I have n strings (each with its own length), and the letters are drawn from a finite set S (~120 letters).
I want to calculate the LCS (longest common subsequence) length between every pair of strings, and I want all the results to be normalized.
That is, I want to normalize the LCS result between string i and string j so that it does not depend on the lengths of the two strings.
Example:
LCS("shpin","shdek")=2
because ("[sh]pin","[sh]dek") = "sh"
but
LCS("shpxaaaaaaaaaan","shaaaaaaaaaadek")=12
because ("[sh]px[aaaaaaaaaa]n","[shaaaaaaaaaa]dek") = "shaaaaaaaaaa"
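To make the two examples reproducible, here is a minimal sketch of the LCS length computation, assuming the standard dynamic-programming algorithm for the longest common subsequence. The function name `lcs_length` is only illustrative.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b.

    Standard O(len(a) * len(b)) dynamic programming, keeping one row at a time.
    """
    prev = [0] * (len(b) + 1)
    for item_a in a:
        curr = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                # Extend the best subsequence that ends before both items.
                curr.append(prev[j - 1] + 1)
            else:
                # Drop the last item of either a or b, whichever is better.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


print(lcs_length("shpin", "shdek"))                      # 2
print(lcs_length("shpxaaaaaaaaaan", "shaaaaaaaaaadek"))  # 12
```

The second pair scores much higher mostly because both strings are longer, which is the effect the normalization is meant to remove.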
I was thinking about dividing each result by its expected value (EV), i.e. the expected LCS length of two random strings with the same lengths, but I don't know how to calculate the EV.
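One way the expected value could be approximated is by simulation. The sketch below is only an illustration under a strong assumption: letters are drawn uniformly and independently from an alphabet of 120 symbols, which may not match the real letter distribution. The names `estimate_expected_lcs` and `normalized_lcs`, and the parameters `trials` and `seed`, are hypothetical.

```python
import random


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (standard DP)."""
    prev = [0] * (len(b) + 1)
    for item_a in a:
        curr = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def estimate_expected_lcs(len_a, len_b, alphabet_size=120, trials=200, seed=0):
    """Monte Carlo estimate of the mean LCS length of two random strings.

    ASSUMPTION: letters are i.i.d. uniform over `alphabet_size` symbols.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        a = [rng.randrange(alphabet_size) for _ in range(len_a)]
        b = [rng.randrange(alphabet_size) for _ in range(len_b)]
        total += lcs_length(a, b)
    return total / trials


def normalized_lcs(a, b, expected):
    """Observed LCS length divided by the estimated expected LCS length."""
    if expected <= 0:
        raise ValueError("expected LCS estimate must be positive")
    return lcs_length(a, b) / expected


expected = estimate_expected_lcs(15, 15)
print(normalized_lcs("shpxaaaaaaaaaan", "shaaaaaaaaaadek", expected))
```

With a large alphabet and short strings the estimate can be very small or even zero, so more trials may be needed, or letters could be sampled from their empirical frequencies instead of uniformly.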
Does anyone have a solution, or maybe another way to get a good enough approximation? :(
Thanks