I would like to compare two eye-tracking scanpaths. Eye tracking produces a sequence of labels that the observer looks at, given a division of the image into labeled tiles (rectangular regions). From the eye-tracking data we also know at what time, and for how long, the eye looks at tile N.
The Levenshtein (string edit) distance works fine as long as the timing of the fixations is not taken into account. For example, if user 1 looks at the tiles "AKPLA" and user 2 looks at the tiles "ATPLB", the string edit distance will be 2, but user 2 might look at "P" for a much longer time than user 1.
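For reference, this is the plain (timing-unaware) Levenshtein distance I'm currently using, sketched here in Python; it works on any sequences of comparable labels, not just strings:

```python
def levenshtein(a, b):
    """Edit distance between two sequences of labels (strings, lists of ints, ...)."""
    # prev[j] holds the edit distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (x != y)))   # substitution (free on match)
        prev = cur
    return prev[-1]

print(levenshtein("AKPLA", "ATPLB"))  # -> 2
```

This confirms the distance of 2 for the example above (substitute K→T and A→B), but it treats every fixation identically regardless of its duration.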
Any ideas on how to improve the distance measure so that it accounts for timing differences as well? (Note that the algorithm is not restricted to character strings; it works equally well with arrays of integers.)