I have sequences built from 0s and 1s, and I want to measure their distance from a target string. The problem is that the target string is incomplete.
Here is an example of my data, where x is the target string and [0] means at least one '0' occurs at that position:
x =11[0]1111[0]1111111[0]1[0]
The length of x is fixed and equal to the length of y.
y1=11110111111000000101010110101010111
y2=01101000011100001101010101101010010
All y's have the same length.
It's easy to see that x can be interpreted as a set of strings, but this set could be very large. Maybe I simply need to sample from that set and take the average of the minimum edit distances, but again that seems like too big a computational problem.
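To make the "set of strings" reading concrete, here is a minimal sketch of how x could be expanded, either exhaustively or by random sampling, and compared against one y. It rests on assumptions that are mine, not part of the problem statement: only the lengths of the [0] runs vary, every expansion has exactly the length of y, the pattern contains at least one [0], and "edit distance" means Levenshtein distance. The names `expansions`, `random_expansion`, `levenshtein` and `min_edit_distance` are hypothetical helpers made up for this illustration.

```python
import random
from itertools import combinations


def _split(x, length):
    """Return (pieces of ones, number of [0] gaps, zeros left to distribute)."""
    pieces = x.split("[0]")
    zeros = length - sum(len(p) for p in pieces)
    return pieces, len(pieces) - 1, zeros


def _build(pieces, cuts, zeros):
    """Turn sorted cut points into gap lengths and fill each gap with zeros."""
    bounds = (0, *cuts, zeros)
    gaps = [b - a for a, b in zip(bounds, bounds[1:])]
    return "".join(p + "0" * g for p, g in zip(pieces, gaps)) + pieces[-1]


def expansions(x, length):
    """Yield every concrete string of the given length that matches x."""
    pieces, k, zeros = _split(x, length)
    if k == 0 or zeros < k:  # assumption: at least one gap, each gap >= 1 zero
        return
    for cuts in combinations(range(1, zeros), k - 1):
        yield _build(pieces, cuts, zeros)


def random_expansion(x, length, rng=random):
    """Draw one matching string uniformly at random."""
    pieces, k, zeros = _split(x, length)
    if k == 0 or zeros < k:
        raise ValueError("pattern cannot be expanded to this length")
    cuts = sorted(rng.sample(range(1, zeros), k - 1))
    return _build(pieces, cuts, zeros)


def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def min_edit_distance(x, y):
    """Smallest edit distance between y and any expansion of x."""
    return min(levenshtein(e, y) for e in expansions(x, len(y)))


x = "11[0]1111[0]1111111[0]1[0]"
y1 = "11110111111000000101010110101010111"
print(sum(1 for _ in expansions(x, len(y1))))  # number of expansions of x
print(min_edit_distance(x, y1))
```

Under this reading, the example x has 14 ones and 4 zero runs, so at length 35 there are 21 zeros to split into 4 non-empty runs, which gives C(20,3) = 1140 expansions. That is small enough to enumerate; sampling with `random_expansion` would only matter for longer patterns with many more [0] groups.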
I've tried to figure out an algorithm, but I'm stuck. Its steps look like this:
x - the target string (the fuzzy one),
y - the second string (fixed),
Cx1, Cy1 - the numbers of ones in x and y,
Gx1, Gy1 - lists of vectors; the length of each list equals the number of groups of ones in the given sequence,
Gx1[i] - the i-th vector,
Gx1[i]=(position of the first one in the i-th group of ones, length of the i-th group of ones)
If the lengths of Gx1 and Gy1 are the same, then we know how many ones to add to or remove from each group. But there's a problem: I don't know whether simply adding and removing ones gives the minimum distance.
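For reference, the quantities defined above can be computed roughly like this. It is only a sketch: positions are 0-based (my assumption), and the helper names `one_groups` and `pattern_one_lengths` are made up for the illustration. Note that for x only the group lengths are determined; the start positions depend on how long each [0] run turns out to be.

```python
from itertools import groupby


def one_groups(s):
    """Return [(start index, length), ...] for each maximal run of '1' in s."""
    groups, pos = [], 0
    for ch, run in groupby(s):
        n = len(list(run))
        if ch == "1":
            groups.append((pos, n))
        pos += n
    return groups


def pattern_one_lengths(x):
    """Lengths of the groups of ones in a pattern such as '11[0]1[0]'."""
    return [len(p) for p in x.split("[0]") if p]


y1 = "11110111111000000101010110101010111"
Gy1 = one_groups(y1)                  # groups of ones in y1
Cy1 = sum(n for _, n in Gy1)          # number of ones in y1

x = "11[0]1111[0]1111111[0]1[0]"
Gx1_lengths = pattern_one_lengths(x)  # [2, 4, 7, 1]
Cx1 = sum(Gx1_lengths)                # 14

print(len(Gy1), Cy1, Gx1_lengths, Cx1)
```

For this example the list lengths already differ (4 groups in x, 10 in y1), so the "same number of groups" case does not apply directly.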