Closed
Description
It seems that the longestCommonSubstring algorithm does not handle Unicode characters properly:
longestCommonSubstr('𐌵𐌵**ABC', '𐌵𐌵--ABC') === '𐌵𐌵'
// whereas the longest one should be ABC (in terms of number of code points)
// Number of code points:
[...'𐌵𐌵'].length === 2
[...'ABC'].length === 3
// Number of UTF-16 code units (what .length reports):
'𐌵𐌵'.length === 4
'ABC'.length === 3
It may be worth adding a note to the algorithm about this. The problem can occur whenever the strings contain characters outside the Basic Multilingual Plane (BMP), i.e. code points greater than 0xFFFF, because JavaScript stores each of them as two UTF-16 code units (a surrogate pair).
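One possible way to patch this, sketched below, is to split both strings into code points before running the usual dynamic-programming comparison. This is only an illustration: the function name longestCommonSubstringByCodePoint is hypothetical and the code is not the repository's implementation.

```javascript
// Hypothetical helper, not the repository's implementation: a longest common
// substring that measures length in code points instead of UTF-16 code units.
function longestCommonSubstringByCodePoint(a, b) {
  // Spreading a string iterates by code point, so each character outside the
  // BMP (e.g. '𐌵') becomes one array element instead of two surrogate halves.
  const s1 = [...a];
  const s2 = [...b];

  // prev[j] and curr[j] hold the length of the common suffix ending at
  // s1[i - 1] and s2[j - 1]; only two rows of the table are kept at a time.
  let prev = new Array(s2.length + 1).fill(0);
  let bestLength = 0;
  let bestEnd = 0; // exclusive end index into s1

  for (let i = 1; i <= s1.length; i += 1) {
    const curr = new Array(s2.length + 1).fill(0);
    for (let j = 1; j <= s2.length; j += 1) {
      if (s1[i - 1] === s2[j - 1]) {
        curr[j] = prev[j - 1] + 1;
        if (curr[j] > bestLength) {
          bestLength = curr[j];
          bestEnd = i;
        }
      }
    }
    prev = curr;
  }

  // Re-join the winning slice of code points into a string.
  return s1.slice(bestEnd - bestLength, bestEnd).join('');
}

longestCommonSubstringByCodePoint('𐌵𐌵**ABC', '𐌵𐌵--ABC') === 'ABC'
```

Note that this still compares individual code points, so sequences that render as a single glyph (for example a base letter followed by a combining mark) are counted as several units.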
Feel free to close the issue whenever you want. The aim was just to flag the problem in case you want to patch it in some way.