I was trying out Python's difflib module and came across SequenceMatcher. I tried the following examples but couldn't understand what is happening.
>>> from difflib import SequenceMatcher
>>> SequenceMatcher(None,"abc","a").ratio()
0.5
>>> SequenceMatcher(None,"aabc","a").ratio()
0.4
>>> SequenceMatcher(None,"aabc","aa").ratio()
0.6666666666666666
Now, according to the documentation for ratio():
Return a measure of the sequences' similarity as a float in the range [0, 1]. Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T.
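To make the quoted formula concrete, here is a minimal sketch that recomputes 2.0*M / T by hand and compares it with ratio(). It assumes M is the sum of the sizes of the blocks returned by get_matching_blocks(); the helper name ratio_by_hand is made up for this illustration and is not part of difflib.

```python
from difflib import SequenceMatcher


def ratio_by_hand(a, b):
    """Hypothetical helper: return (M, T, 2.0*M/T) for sequences a and b."""
    matcher = SequenceMatcher(None, a, b)
    # Each block is a named tuple (a, b, size); the final block is a
    # zero-length sentinel, so it does not change the sum.
    m = sum(block.size for block in matcher.get_matching_blocks())
    t = len(a) + len(b)
    # Assumption for this sketch: treat two empty sequences as identical.
    score = 2.0 * m / t if t else 1.0
    return m, t, score


for a, b in [("abc", "a"), ("aabc", "a"), ("aabc", "aa")]:
    m, t, score = ratio_by_hand(a, b)
    print(a, b, "M =", m, "T =", t, "by hand =", score,
          "ratio() =", SequenceMatcher(None, a, b).ratio())
```

Under that assumption, each element of the shorter string can be matched only once, so for "aabc" against "a" this gives M = 1 rather than 2.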
So, for my cases:
T=4 and M=1, so ratio = 2*1/4 = 0.5
T=5 and M=2, so ratio = 2*2/5 = 0.8
T=6 and M=1, so ratio = 2*1/6.0 = 0.33
According to my understanding, T = len("aabc") + len("a") and M=2, because "a" occurs twice in "aabc".
So, where am I going wrong? What am I missing?
Here is the source code of SequenceMatcher.ratio()