
I have a list of floats that I want to compare to other lists to get a similarity ratio, in Python.

The list that I want to compare:

[0.0000,0.0003,-0.0001,0.0002, 0.0001,0.0003,0.0000,0.0000, -0.0002,0.0002,-0.0002,0.0002, 0.0000,0.0000,-0.0002,0.0000, 0.0000,0.0000,-0.0002,-0.0001]

One of the other lists:

[0.0000,0.0002,0.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000]

I tried converting them to strings and using the fuzzywuzzy library, python-Levenshtein and difflib to compare the strings and get a ratio, but this does not give me the results I want, and all of them are very slow. I searched and couldn't find anything about this.

What is the best way to compare two lists of floats?

I am asking to know whether there is a native way to compare float lists for similarity or a library that does the job, like the many examples of string comparison.

Elyes Lounissi
  • What is the expected output in this specific case? Also, when are two numbers considered similar? How do you measure similarity? – Riccardo Bucco May 31 '21 at 15:12
  • The expected output is a number between 0 and 100, or between 0 and 1. 100 means identical and 0 means completely different. – Elyes Lounissi May 31 '21 at 15:13
  • For example, in an element-by-element comparison, 0.0001 and 0.0002 are more similar than 0.0001 and 0.0005. All elements need to be compared this way, and a score needs to be output. I am sure there is a library or a way to check whether one list of floats is similar to another, but I can't find anything. – Elyes Lounissi May 31 '21 at 15:16
  • You need to specify what 0% and 100% difference mean. For example: what is the difference between 0.1 and 0.2 in percent for you? What about 0.1 and 100? In what case would the difference be 0%? What if one of the numbers tends to infinity? – Andreas May 31 '21 at 15:17
  • [0.0001,0.0004] and [0.0001,0.0004] are 100% similar. [0.0001,0.0004] and [0.0001,0.006] are not 100% similar, they are maybe 70% similar, while [0.0001,0.0004] and [0.0001,0.0009] are less similar. I don't know exactly how to calculate the difference, but someone must have made a library for comparing float lists, similar to what you can do with strings. – Elyes Lounissi May 31 '21 at 15:20
  • I don't know why someone downvoted my question, I am here to just see if there is a way or library to natively compare float lists. This is a valid question that has no answer. – Elyes Lounissi May 31 '21 at 15:21
  • The most likely reason why your question was downvoted is that you have not clearly defined your problem statement. You would need to provide a numerical metric for what "similar" means in your case, since "similar" isn't a well-defined mathematical concept here. – C Hecht May 31 '21 at 15:30
  • Thank you for clarifying, @CHecht. I am not very good at mathematics or statistics. I saw that there are many algorithms for calculating similarity between strings that output numbers between 0 and 1. I am looking for something similar, but for sequences of floats. – Elyes Lounissi May 31 '21 at 15:34
  • I believe one needs to have a clear definition of the metric before searching for a library that does 'this'. We cannot help you find this library, because we don't know what it should do. Probably there are lots of libraries that compare floats in different ways. Which one you want is the thing we can't know for sure. – MatBBastos May 31 '21 at 15:35
  • For someone 'to have made a library for this', most certainly there is a reference to the mathematical basis in order to achieve such comparison, and its meaning, if any. This is what needs definition here, I believe – MatBBastos May 31 '21 at 15:36
  • @MatheusB.Bastos is right, that is why you are usually asked to provide an [mre] which also includes an expected output. The problem is that you can't provide an expected output for the above input, because you yourself are not sure what you are looking for. – Andreas May 31 '21 at 15:37
  • Take a look at [What is Correlation](https://machinelearningmastery.com/how-to-use-correlation-to-understand-the-relationship-between-variables/) as an example of finding the similarity between two datasets. – DarrylG May 31 '21 at 16:17
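
The correlation suggested in the last comment can be computed directly with NumPy. The following is only a sketch using the two lists from the question; the variable name `shifted` is illustrative. One caveat, which may or may not matter for this use case: Pearson correlation measures whether the values rise and fall together, so it ranges from -1 to 1 and ignores constant offsets and scaling between the two lists.

```python
import numpy as np

l1 = np.array([0.0000, 0.0003, -0.0001, 0.0002, 0.0001, 0.0003, 0.0000, 0.0000,
               -0.0002, 0.0002, -0.0002, 0.0002, 0.0000, 0.0000, -0.0002, 0.0000,
               0.0000, 0.0000, -0.0002, -0.0001])
l2 = np.array([0.0000, 0.0002, 0.0000, 0.0001, 0.0003, 0.0005, 0.0000, 0.0000,
               0.0001, 0.0003, -0.0001, 0.0002, 0.0002, 0.0003, -0.0001, 0.0002,
               0.0002, 0.0005, -0.0010, 0.0000])

# np.corrcoef returns a 2x2 correlation matrix; the off-diagonal
# entry is the Pearson coefficient between the two inputs.
r = np.corrcoef(l1, l2)[0, 1]
print(r)

# Adding a constant to every element does not change the correlation,
# so a shifted copy of l1 still correlates (almost exactly) perfectly with l1.
shifted = l1 + 0.5
r_shifted = np.corrcoef(l1, shifted)[0, 1]
print(r_shifted)
```

If absolute differences between the values matter, an error-based metric is probably a better fit than correlation.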

1 Answer


The question is not entirely clear in my opinion. Nevertheless, you could see whether the following approach helps:

import numpy as np

# Reference list and a candidate that differs from it only slightly
l1 = np.array([0.0000,0.0003,-0.0001,0.0002, 0.0001,0.0003,0.0000,0.0000, -0.0002,0.0002,-0.0002,0.0002, 0.0000,0.0000,-0.0002,0.0000, 0.0000,0.0000,-0.0002,-0.0001])
l2 = np.array([0.0000,0.0002,0.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000])

# Mean squared error: the average of the squared element-wise differences
mse1 = ((l1 - l2)**2).mean()
# Out[180]: 6.699999999999999e-08

# Same reference list, but a candidate whose first three values are far off
l2 = np.array([1.0000,1.0002,1.0000,0.0001, 0.0003,0.0005,0.0000,0.0000, 0.0001,0.0003,-0.0001,0.0002, 0.0002,0.0003,-0.0001,0.0002, 0.0002,0.0005,-0.0010,0.0000])

mse2 = ((l1 - l2)**2).mean()
# Out[180]: 0.15000006700000001

# The closer candidate has the smaller error
mse1 < mse2
# Out[187]: True

You won't get a value between 0 and 1, but you can compare the results: the more similar two lists are, the closer the value gets to 0, and identical lists give exactly 0. MSE stands for mean squared error. There are many other metrics that could be relevant to you, such as MSLE (mean squared logarithmic error) and MAE (mean absolute error).
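
If a score between 0 and 1 is needed, one possible approach is to compute a distance such as MAE and squash it with 1 / (1 + d). This is only a sketch: the helper names `mae` and `similarity`, the `scale` parameter and its default value are illustrative assumptions, and the formula is one arbitrary choice among many, not a standard definition of similarity.

```python
import numpy as np

def mae(a, b):
    """Mean absolute error between two equal-length sequences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.abs(a - b).mean())

def similarity(a, b, scale=1e-4):
    """Map the MAE to a score in (0, 1].

    `scale` is a hypothetical tuning parameter: it is the MAE at which
    the score drops to 0.5, so it should match the size of a "typical"
    difference in the data.
    """
    return 1.0 / (1.0 + mae(a, b) / scale)

reference = [0.0000, 0.0003, -0.0001, 0.0002, 0.0001, 0.0003, 0.0000, 0.0000,
             -0.0002, 0.0002, -0.0002, 0.0002, 0.0000, 0.0000, -0.0002, 0.0000,
             0.0000, 0.0000, -0.0002, -0.0001]
close = [0.0000, 0.0002, 0.0000, 0.0001, 0.0003, 0.0005, 0.0000, 0.0000,
         0.0001, 0.0003, -0.0001, 0.0002, 0.0002, 0.0003, -0.0001, 0.0002,
         0.0002, 0.0005, -0.0010, 0.0000]
far = [1.0000, 1.0002, 1.0000, 0.0001, 0.0003, 0.0005, 0.0000, 0.0000,
       0.0001, 0.0003, -0.0001, 0.0002, 0.0002, 0.0003, -0.0001, 0.0002,
       0.0002, 0.0005, -0.0010, 0.0000]

print(similarity(reference, reference))  # identical lists give 1.0
print(similarity(reference, close))      # somewhere between 0 and 1
print(similarity(reference, far))        # much closer to 0
```

The score depends heavily on `scale`, which is why the metric has to be chosen to fit the data rather than taken from a library as-is.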

Andreas
  • Thank you, I hope this helps other people as well. String comparison is a well-explained topic, but working with lists of numbers is not explained very well for people who are not math-oriented. – Elyes Lounissi May 31 '21 at 15:48
  • @ElyesLounissi, glad the answer was of some help. Try providing an expected output next time; this will increase your chances of getting more answers. If you like, please feel free to also upvote the answer. Otherwise: happy coding! – Andreas May 31 '21 at 15:57