So, let's say I have this code:
x = 'My name is James Bond'
y = 'My name is James Bond and I am an MI-6 agent stationed in London, UK'
from difflib import SequenceMatcher as sm
sm(None, x, y).ratio()
Now, the ratio returned is 0.47191011235955055, which is fair, since y is much longer than x.
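For reference, here is a minimal sketch of where that number comes from. Per the difflib documentation, ratio() is 2.0*M/T, where M is the number of matched elements and T is the combined length of both sequences. The variable names below are only illustrative.

```python
from difflib import SequenceMatcher

x = 'My name is James Bond'
y = 'My name is James Bond and I am an MI-6 agent stationed in London, UK'

matcher = SequenceMatcher(None, x, y)

# M: total number of characters covered by all matching blocks
matched = sum(block.size for block in matcher.get_matching_blocks())

# T: combined length of both strings
total = len(x) + len(y)

# Same formula that ratio() uses: 2.0 * M / T
manual_ratio = 2.0 * matched / total

print(matched, total, manual_ratio)
```

All 21 characters of x are matched, but T is 89 because y is long, so the score stays below 0.5 even though x is fully contained in y.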
My problem is that x is present in its entirety in y, so I was hoping to get a fairly high match. Looking at it another way, I am basically looking for some sort of plagiarism detection.
UPDATE: To be more specific, in the above example I'd expect a match of 100%, since x is present in y in its entirety. However, it may not be such a clear-cut case in every example.
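To make the expected behaviour concrete, here is a rough sketch of one possible "containment" score: the matched characters divided by the length of x alone, rather than by the combined length. The helper name `containment` is hypothetical (difflib has no such function), and this is only one way to define such a score.

```python
from difflib import SequenceMatcher

def containment(needle, haystack):
    """Hypothetical helper: fraction of `needle` that SequenceMatcher aligns with `haystack`."""
    if not needle:
        return 0.0
    # autojunk=False turns off the "popular element" heuristic used for long sequences
    matcher = SequenceMatcher(None, needle, haystack, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    # Normalise by the needle length only, so extra text in haystack is not penalised
    return matched / len(needle)

x = 'My name is James Bond'
y = 'My name is James Bond and I am an MI-6 agent stationed in London, UK'

print(containment(x, y))
```

With this definition the first example scores 1.0, because every character of x is matched somewhere in y.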
Another example:
x = "My name is James Herbert Bond"
Here x has an extra word, so I'd want the matching method to give a somewhat lower percentage (say 90%), since only one word, "Herbert", is in x but not in y.
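As an illustration of the word-level view described here, the sketch below compares lists of whitespace-separated words instead of characters and divides the matched words by the number of words in x. The name `word_containment` is hypothetical, and splitting on whitespace is an assumption (punctuation stays attached to words).

```python
from difflib import SequenceMatcher

def word_containment(needle, haystack):
    """Hypothetical helper: fraction of the words in `needle` matched, in order, in `haystack`."""
    needle_words = needle.split()
    haystack_words = haystack.split()
    if not needle_words:
        return 0.0
    # SequenceMatcher works on any sequence of hashable items, including lists of words
    matcher = SequenceMatcher(None, needle_words, haystack_words, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(needle_words)

x = "My name is James Herbert Bond"
y = 'My name is James Bond and I am an MI-6 agent stationed in London, UK'

print(word_containment(x, y))
```

Here five of the six words in x are matched, so the score is 5/6 (about 0.83) rather than the 90% guessed above. The exact figure depends on how the score is defined.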