To expand on my earlier comment, the patienceDiff / patienceDiffPlus algorithm (see https://github.com/jonTrent/PatienceDiff) might be a good fit for your situation, since it is generally good at highlighting the deltas between two strings that are very similar apart from a few minor differences. In your case it can be used as follows, with the first step being to remove the commas and split the sentences into arrays of words...
var str1 = "I like this soup because it is very tasty, like the one that my grandma used to make";
var str2 = "I really lie this soup, it is very tasty, like the one that my grandma use to make";
let a = str1.split( ',' ).join( '' ).split( ' ');
let b = str2.split( ',' ).join( '' ).split( ' ');
let pdp = patienceDiffPlus( a, b )
console.log( pdp );
...results in...
Object
lineCountDeleted: 3
lineCountInserted: 3
lineCountMoved: 0
lines: Array(21)
0: {line: "I", aIndex: 0, bIndex: 0}
1: {line: "like", aIndex: 1, bIndex: -1}
2: {line: "really", aIndex: -1, bIndex: 1}
3: {line: "lie", aIndex: -1, bIndex: 2}
4: {line: "this", aIndex: 2, bIndex: 3}
5: {line: "soup", aIndex: 3, bIndex: 4}
6: {line: "because", aIndex: 4, bIndex: -1}
7: {line: "it", aIndex: 5, bIndex: 5}
8: {line: "is", aIndex: 6, bIndex: 6}
9: {line: "very", aIndex: 7, bIndex: 7}
10: {line: "tasty", aIndex: 8, bIndex: 8}
11: {line: "like", aIndex: 9, bIndex: 9}
12: {line: "the", aIndex: 10, bIndex: 10}
13: {line: "one", aIndex: 11, bIndex: 11}
14: {line: "that", aIndex: 12, bIndex: 12}
15: {line: "my", aIndex: 13, bIndex: 13}
16: {line: "grandma", aIndex: 14, bIndex: 14}
17: {line: "used", aIndex: 15, bIndex: -1}
18: {line: "use", aIndex: -1, bIndex: 15}
19: {line: "to", aIndex: 16, bIndex: 16}
20: {line: "make", aIndex: 17, bIndex: 17}
length: 21
...where:
- If aIndex = -1, then the a array has no value corresponding to this entry of the b array (the word appears only in b).
- If bIndex = -1, then the b array has no value corresponding to this entry of the a array (the word appears only in a).
- If aIndex and bIndex are both non-negative (0 or greater), then a match was found at the corresponding indexes of the two arrays.
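The index rules above can be turned into a small classifier. This is only a minimal sketch, assuming nothing beyond the `{line, aIndex, bIndex}` record shape shown in the output; `classifyLines` is a hypothetical helper name, not part of PatienceDiff, and the sample data is a hand-copied excerpt of the result above.

```javascript
// Hypothetical helper (not part of PatienceDiff): label each result entry
// using the aIndex / bIndex rules described above.
function classifyLines(lines) {
  return lines.map(({ line, aIndex, bIndex }) => {
    let status;
    if (aIndex === -1) status = 'inserted';      // present only in b
    else if (bIndex === -1) status = 'deleted';  // present only in a
    else status = 'unchanged';                   // matched in both arrays
    return { line, status };
  });
}

// Excerpt of the word-level result shown above, copied by hand.
const sampleLines = [
  { line: 'I', aIndex: 0, bIndex: 0 },
  { line: 'like', aIndex: 1, bIndex: -1 },
  { line: 'really', aIndex: -1, bIndex: 1 },
  { line: 'lie', aIndex: -1, bIndex: 2 },
  { line: 'this', aIndex: 2, bIndex: 3 },
];

const classified = classifyLines(sampleLines);
console.log(classified.map(c => `${c.status}: ${c.line}`).join('\n'));
```

In real use you would pass `pdp.lines` instead of `sampleLines`.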
Also note that if you perform a patienceDiff character-by-character, that is, splitting the sentences into arrays of characters...
let a = str1.split( '' );
let b = str2.split( '' );
let pdp = patienceDiff( a, b );
console.log( pdp );
...then the result will be...
0: {line: "I", aIndex: 0, bIndex: 0}
1: {line: " ", aIndex: 1, bIndex: 1}
2: {line: "r", aIndex: -1, bIndex: 2}
3: {line: "e", aIndex: -1, bIndex: 3}
4: {line: "a", aIndex: -1, bIndex: 4}
5: {line: "l", aIndex: -1, bIndex: 5}
6: {line: "l", aIndex: -1, bIndex: 6}
7: {line: "y", aIndex: -1, bIndex: 7}
8: {line: " ", aIndex: -1, bIndex: 8}
9: {line: "l", aIndex: 2, bIndex: 9}
10: {line: "i", aIndex: 3, bIndex: 10}
11: {line: "k", aIndex: 4, bIndex: -1}
12: {line: "e", aIndex: 5, bIndex: 11}
13: {line: " ", aIndex: 6, bIndex: 12}
14: {line: "t", aIndex: 7, bIndex: 13}
15: {line: "h", aIndex: 8, bIndex: 14}
16: {line: "i", aIndex: 9, bIndex: 15}
17: {line: "s", aIndex: 10, bIndex: 16}
18: {line: " ", aIndex: 11, bIndex: 17}
o
o
o
84: {line: " ", aIndex: 76, bIndex: 74}
85: {line: "t", aIndex: 77, bIndex: 75}
86: {line: "o", aIndex: 78, bIndex: 76}
87: {line: " ", aIndex: 79, bIndex: 77}
88: {line: "m", aIndex: 80, bIndex: 78}
89: {line: "a", aIndex: 81, bIndex: 79}
90: {line: "k", aIndex: 82, bIndex: 80}
91: {line: "e", aIndex: 83, bIndex: 81}
...which shows the addition of the word 'really' in the b array, and also that the 'k' is missing from the word 'like' in the b array. Employing the patienceDiff algorithm character-by-character might suit your needs better, depending on the level at which you wish to match the words.
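To make a character-level result easier to read, consecutive entries with the same status can be merged into runs. The following is a rough sketch rather than anything PatienceDiff provides: `renderInline` and the `{+ +}` / `[- -]` markers are my own illustrative choices, and the sample data is the first 19 entries of the character-level output above, copied by hand.

```javascript
// Hypothetical helper (not part of PatienceDiff): merge consecutive
// characters with the same status and mark insertions and deletions inline.
function renderInline(lines) {
  const runs = [];
  for (const { line, aIndex, bIndex } of lines) {
    const status = aIndex === -1 ? 'ins' : bIndex === -1 ? 'del' : 'same';
    const last = runs[runs.length - 1];
    if (last && last.status === status) last.text += line; // extend current run
    else runs.push({ status, text: line });                // start a new run
  }
  return runs
    .map(r => (r.status === 'ins' ? `{+${r.text}+}`
             : r.status === 'del' ? `[-${r.text}-]`
             : r.text))
    .join('');
}

// Entries 0 to 18 of the character-level result, as [line, aIndex, bIndex].
const sampleChars = [
  ['I', 0, 0], [' ', 1, 1],
  ['r', -1, 2], ['e', -1, 3], ['a', -1, 4], ['l', -1, 5],
  ['l', -1, 6], ['y', -1, 7], [' ', -1, 8],
  ['l', 2, 9], ['i', 3, 10], ['k', 4, -1], ['e', 5, 11], [' ', 6, 12],
  ['t', 7, 13], ['h', 8, 14], ['i', 9, 15], ['s', 10, 16], [' ', 11, 17],
].map(([line, aIndex, bIndex]) => ({ line, aIndex, bIndex }));

const inline = renderInline(sampleChars);
console.log(inline); // I {+really +}li[-k-]e this 
```

The inserted 'really ' and the deleted 'k' each show up as a single marked run, which may be closer to what you want to display than the raw per-character list.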