
Let us say I want to find the vector most similar to a = [0 0 2 0 0 0 0 0 0].

I have two candidates:

  • b1 = [0 0 0 2 0 0 0 0 0], where the "feature" is just 1 position away
  • b2 = [0 0 0 0 0 0 0 2 0], where the "feature" is 5 positions away

Euclidean distance for (a, b1) is the same as for (a, b2). What I want is for b1 to get a higher "similarity" score. Is there a well-known method (name it, please) to deal with such problems? Some kind of fuzzy Euclidean distance?
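Both candidates differ from a in exactly two positions, each by 2, so both distances are √(2² + 2²) = √8 ≈ 2.828. A quick check in Python, as a minimal sketch using the standard library's `math.dist` (available since Python 3.8):

```python
import math

a  = [0, 0, 2, 0, 0, 0, 0, 0, 0]
b1 = [0, 0, 0, 2, 0, 0, 0, 0, 0]
b2 = [0, 0, 0, 0, 0, 0, 0, 2, 0]

# Euclidean distance only compares values position by position, so how far the
# "feature" moved does not matter: both pairs differ in two positions by 2.
d1 = math.dist(a, b1)
d2 = math.dist(a, b2)
print(d1, d2)  # both sqrt(8), about 2.828
```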

One possible solution I can come up with is to calculate the Euclidean distance for (a, b1) with the whole of b1 shifted by 1 position left, then by 2 positions left, then by 3 positions left, etc., and then do the same for shifting right. Each time, I adjust the calculated Euclidean distance by a weight that decreases as the shifting distance increases. The same procedure is then repeated for b2, and the results are compared to find the better match.
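The shift-and-weight procedure described above can be sketched roughly as follows. Several details are assumptions not fixed by the question: positions shifted in from outside are filled with zeros (no wrap-around), each shift's distance d is turned into a similarity 1 / (1 + d) before weighting (so that a shrinking weight lowers the score rather than the distance), the weight is `decay ** |shift|`, and the best weighted score over all shifts is kept. `shift`, `shifted_similarity`, `max_shift` and `decay` are hypothetical names chosen for illustration.

```python
import math

def shift(v, s):
    """Shift v by s positions (s > 0 moves right, s < 0 moves left).
    Positions shifted in from outside are filled with 0 (assumed; no wrap-around)."""
    n = len(v)
    out = [0] * n
    for i, x in enumerate(v):
        if 0 <= i + s < n:
            out[i + s] = x
    return out

def shifted_similarity(a, b, max_shift=5, decay=0.5):
    """Best weighted similarity between a and b shifted by -max_shift..max_shift.
    Returns 1.0 for identical vectors; larger means more similar."""
    best = 0.0
    for s in range(-max_shift, max_shift + 1):
        d = math.dist(a, shift(b, s))   # Euclidean distance for this shift
        weight = decay ** abs(s)        # weight shrinks as the shift grows
        best = max(best, weight / (1.0 + d))
    return best

a  = [0, 0, 2, 0, 0, 0, 0, 0, 0]
b1 = [0, 0, 0, 2, 0, 0, 0, 0, 0]
b2 = [0, 0, 0, 0, 0, 0, 0, 2, 0]

print(shifted_similarity(a, b1))            # 0.5 (exact match after a 1-step shift)
print(round(shifted_similarity(a, b2), 3))  # 0.261 (the unshifted score wins)
```

With these settings b1 scores higher than b2, and `decay` controls how quickly a displaced match loses credit. Summing the weighted scores instead of taking the maximum would be another reasonable way to combine the shifts.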

iloo
If you can define what you mean by *similar*, that will determine what your calculation needs to be. As it stands, it's too vague. I'm not aware of a formal, mathematical definition of *similar vectors*. Perhaps more examples would help. For example, are `[0,0,0,2]` and `[0,0,0,4]` more similar than `[0,0,2,0]` and `[0,0,0,2]`? If you can answer such questions but are having trouble quantifying them, perhaps show a wider variety of examples to make it clear. – lurker Feb 20 '15 at 12:13

1 Answer


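Look at Levenshtein distance. It operates on strings and measures similarity as an edit distance (the minimum number of insertions, deletions and substitutions needed to turn one string into the other), but it can be adapted to vectors, and a suitably modified version can give b1 a higher similarity than b2. It could also be modified to compare the actual values, rather than just whether two elements match or mismatch.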
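The answer does not say which modification it has in mind, so here is a minimal sketch of one option, assuming adjacent transpositions count as a single edit (the optimal string alignment variant of Damerau-Levenshtein distance). With plain unit-cost Levenshtein, a is two edits away from both b1 and b2 (two substitutions each), so the plain version alone does not separate them; with adjacent transpositions allowed, b1 becomes one edit away while b2 stays at two. `edit_distance` and its `allow_transpositions` flag are illustrative names, not a library API.

```python
def edit_distance(s, t, allow_transpositions=True):
    """Edit distance between two sequences (strings, lists of numbers, ...).

    With allow_transpositions=False this is plain Levenshtein distance
    (insert, delete, substitute, each costing 1). With True it also allows
    swapping two adjacent elements at cost 1 (optimal string alignment,
    a restricted form of Damerau-Levenshtein distance).
    """
    n, m = len(s), len(t)
    # d[i][j] = distance between the first i elements of s and the first j of t
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all i elements
    for j in range(m + 1):
        d[0][j] = j  # insert all j elements
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution or match
            if (allow_transpositions and i > 1 and j > 1
                    and s[i - 1] == t[j - 2] and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[n][m]

a  = [0, 0, 2, 0, 0, 0, 0, 0, 0]
b1 = [0, 0, 0, 2, 0, 0, 0, 0, 0]
b2 = [0, 0, 0, 0, 0, 0, 0, 2, 0]

print(edit_distance(a, b1, allow_transpositions=False),
      edit_distance(a, b2, allow_transpositions=False))  # 2 2
print(edit_distance(a, b1), edit_distance(a, b2))        # 1 2
```

Only a one-position shift gets this discount: a feature two positions away still costs two edits, the same as two substitutions, so larger shifts would need further changes to the cost scheme.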

lopisan