
Let's say I have sampled some signals and constructed a vector of samples for each. What is the most efficient way to calculate the (dis)similarity of those vectors? Note that the sampling offset must not count: for instance, sample vectors of sine and cosine signals should be considered similar, since as sequences they are exactly the same.

There is a simple way of doing this: "roll" the elements of one vector, calculate the Euclidean distance at each roll position, and finally choose the best match (smallest distance). This solution works fine, since my only goal is to find the most similar sample vector for an input signal from a pool of vectors.
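To make the rolling approach concrete, here is a minimal sketch in plain Python. The function names `rolled_distance` and `best_match` and the test signals are made up for illustration; the roll is assumed to be circular (the vector wraps around).

```python
import math

def rolled_distance(x, y):
    """Smallest Euclidean distance between x and any circular shift of y."""
    n = len(x)
    if len(y) != n:
        raise ValueError("vectors must have the same length")
    best = math.inf
    for shift in range(n):
        # Compare x[i] with y[(i + shift) % n], i.e. y rolled left by `shift`.
        d2 = sum((x[i] - y[(i + shift) % n]) ** 2 for i in range(n))
        best = min(best, d2)
    return math.sqrt(best)

def best_match(signal, pool):
    """Index of the pool vector with the smallest rolled distance to signal."""
    return min(range(len(pool)), key=lambda k: rolled_distance(signal, pool[k]))

# Hypothetical test signals: one period of sine and cosine, plus a ramp.
n = 16
sin_wave = [math.sin(2 * math.pi * i / n) for i in range(n)]
cos_wave = [math.cos(2 * math.pi * i / n) for i in range(n)]
ramp = [i / n for i in range(n)]

closest = best_match(sin_wave, [ramp, cos_wave])  # picks the cosine
```

Each call to `rolled_distance` does N distance computations of N terms each, so the cost per comparison is O(N^2).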

However, the solution above becomes very inefficient as the dimension of the vectors grows. Compared to "non-sequential vector matching" of N-dimensional vectors, the sequential version requires N times as many vector distance calculations.

Is there a better mathematical approach or algorithm for comparing two sequences with differing offsets?

The use case for this would be sequence similarity visualization with a self-organizing map (SOM).

EDIT: How about comparing each vector's integral and entropy? Both of them are "sequence-safe" (= time-invariant?) and very fast to calculate, but I doubt they alone are enough to distinguish all possible signals from each other. Is there something else that could be used in addition to these?
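A small sketch of why these two features may not be enough on their own. It assumes "integral" means the plain sum of the samples and "entropy" means the Shannon entropy of a histogram of the sample values; both definitions, the bin count, and the example vectors are assumptions made for illustration.

```python
import math
from collections import Counter

def integral(x):
    """Discrete integral: plain sum of the samples (unit spacing assumed)."""
    return sum(x)

def value_entropy(x, bins=8):
    """Shannon entropy (bits) of a histogram of the sample values."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 for constant signals
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in x)
    n = len(x)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

x = [0, 1, 2, 3, 4, 5, 6, 7]
rolled = x[3:] + x[:3]                 # same sequence, different offset
scrambled = [7, 0, 5, 2, 3, 6, 1, 4]   # different sequence, same values
```

Under these definitions all three vectors get the same integral and the same entropy, because both features depend only on which values occur and not on their order. They are offset-invariant, but they also cannot tell `x` from `scrambled`.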

EDIT2: Victor Zamanian's reply isn't directly the answer, but it gave me an idea that might be. The solution might be to sample the original signals by calculating their Fourier transform coefficients and inserting those into the sample vectors. The first element (X_0) is the mean or "level" of the signal, and the following ones (X_n) can be used directly to compare similarity with some other sample vector. The smaller n is, the more weight it should have in the similarity calculation, since the more coefficients are calculated with the FT, the more accurate the representation of the signal becomes. This brings up a bonus question:
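A minimal sketch of this idea, with one assumption worth stating loudly: only the magnitudes |X_n| of the coefficients are kept. A circular shift of the samples changes the phase of each DFT coefficient but not its magnitude, so the raw complex coefficients would still depend on the offset. The function name `dft_magnitudes` is made up, and the naive DFT below is for illustration only; in practice a library FFT would be used. Note also that the unnormalized X_0 is N times the mean, not the mean itself.

```python
import cmath
import math

def dft_magnitudes(x, n_coeffs):
    """Magnitudes |X_0| .. |X_{n_coeffs-1}| of the DFT of x (naive, not an FFT)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n_coeffs)
    ]

# One period of sine and cosine: same sequence up to an offset.
n = 32
sin_wave = [math.sin(2 * math.pi * t / n) for t in range(n)]
cos_wave = [math.cos(2 * math.pi * t / n) for t in range(n)]

sin_features = dft_magnitudes(sin_wave, 6)
cos_features = dft_magnitudes(cos_wave, 6)
```

The two feature vectors agree up to rounding error, so an ordinary vector distance on them ignores the offset. The trade-off is that discarding phase also makes some genuinely different signals look identical.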

Let's say we have two sample vectors of six FT coefficients each (the values are made up):

  • X = {4, 15, 10, 8, 11, 7}
  • Y = {4, 16, 9, 15, 62, 7}

The dissimilarity of these vectors could MAYBE be calculated like this: |16 - 15| + (|10 - 9| / 2) + (|8 - 15| / 3) + (|11 - 62| / 4) + (|7 - 7| / 5)

The denominators (2, 3, 4, 5; originally shown in bold) are the bonus question. Are there coefficients, or some other way, to know how much each FT coefficient should affect the similarity relative to the other coefficients?
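For reference, the proposed sum worked out in Python. The 1/n weights are only the guess from the question, not an established weighting, and the function name `weighted_l1` is made up for illustration.

```python
def weighted_l1(x, y):
    """Sum of |x_n - y_n| / n for n >= 1; coefficient 0 (the level) is skipped."""
    return sum(abs(a - b) / n for n, (a, b) in enumerate(zip(x, y)) if n >= 1)

X = [4, 15, 10, 8, 11, 7]
Y = [4, 16, 9, 15, 62, 7]

# 1 + 1/2 + 7/3 + 51/4 + 0 = 199/12
score = weighted_l1(X, Y)
```

With these weights the large difference at n = 4 (51 / 4 = 12.75) still dominates the total of roughly 16.58.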

Simo Erkinheimo
  • I may have misunderstood the last part but, in my opinion, those emboldened denominators do not instinctively represent a dissimilarity between the two signals in the Frequency Domain. FT-coefficient |C_k| is merely the amplitude of the frequency at k. Perhaps you could take the difference between the frequency amplitudes themselves in the two DFTs? And store that in a new vector? Maybe even summing the values in that vector? It may or may not depend on what kind of dissimilarities you are interested in. But I am certainly no expert! Don't take my word as truth! :-) – Victor Zamanian Dec 31 '12 at 00:00

2 Answers


If I understand your question correctly, maybe you would be interested in some type of cross-correlation implementation? I'm not sure if it's the most efficient thing to do or fits the purpose, but I thought I would mention it since it seems relevant.

Edit: Maybe a Fast Fourier Transform (FFT) could be an option? Fourier transforms are great for distinguishing signals from each other, and I believe they are helpful for finding similar signals too. E.g. a sine and a cosine wave of the same frequency have identical magnitude spectra and differ only in phase. FFTs can be done in O(N log N).
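The two suggestions can be combined: the circular cross-correlation for all N shifts at once can be computed with FFTs, and the smallest rolled Euclidean distance follows from its maximum, since ||x - roll(y, s)||^2 = ||x||^2 + ||y||^2 - 2 c[s]. Below is a rough sketch under some assumptions: real-valued vectors of equal length, the length a power of two, and circular shifts. The small `fft` helper and the name `min_rolled_distance` are written here for illustration only; a library FFT would normally replace the helper.

```python
import cmath
import math

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two. No 1/N scaling."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out

def min_rolled_distance(x, y):
    """Minimum over circular shifts s of ||x - roll(y, s)||, in O(N log N)."""
    n = len(x)
    fx, fy = fft(x), fft(y)
    # Correlation theorem for real x: c[s] = sum_i x[i] * y[(i + s) % n]
    # is the inverse DFT of conj(X[k]) * Y[k].
    prod = [a.conjugate() * b for a, b in zip(fx, fy)]
    corr = [v.real / n for v in fft(prod, inverse=True)]
    d2 = sum(v * v for v in x) + sum(v * v for v in y) - 2 * max(corr)
    return math.sqrt(max(d2, 0.0))  # clamp tiny negative rounding error

# Made-up example: y is x shifted by two places, with one sample changed.
x = [1, 2, 3, 4, 0, 0, 0, 0]
y = [0, 0, 1, 2, 3, 5, 0, 0]
d = min_rolled_distance(x, y)  # the best alignment differs only in 4 vs 5
```

This gives the same result as rolling and comparing at every offset, but with three transforms per pair instead of N separate distance calculations.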

Victor Zamanian
  • Thank you for the reply! Taken from Wikipedia: "The formula essentially slides the g function along the x-axis, calculating the integral of their product at each position." So this method does the same "sliding" as my solution, but with even more complex calculations. So this approach is not yet a better solution. – Simo Erkinheimo Dec 30 '12 at 00:25
  • Oh okay. Figured it was worth a shot. :) I did see some results for stuff like "fast cross-correlation in matlab" and similar things. So maybe there are optimized implementations available. Sorry I couldn't be of more help! – Victor Zamanian Dec 30 '12 at 03:51
  • THANK YOU! Now I got it! I can sample all signals by finding their Fourier coefficients. The resulting sample vectors will automatically be time-invariant and their similarity will be trivial to calculate! This is exactly the kind of "Heureka" I was looking for! Thank you! – Simo Erkinheimo Dec 30 '12 at 11:51

Google "translation invariant signal classification" and you'll find things like these.

davin
  • Thank you! Darvishi's method also calculates the similarity at each shift, so it's not what I'm looking for. Xiong et al.'s seems quite complex, but I'll look into it. Nevertheless, I got some nice terminology to google with :) – Simo Erkinheimo Dec 30 '12 at 01:18