Python: Find list that most closely matches input list value by value

Question

I have a given list of values and a collection of lists (lists A, B, and C) with similar values. I'm trying to find a way to return the list that most closely matches the given list. I'd like to use a least squares fit as the distance metric.

given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 5.1]
B = [-0.1, 0.9, 2.1, 3.1, 3.9, 5]
C = [0, 1.1, 2, 2.9, 4, 5.1]

So in this case, it would return C as the closest match to given.

I thought I could incorporate something like:

match = [min([val[idx] for val in [A,B,C]], key=lambda x: abs(x-given[idx])) for idx in range(len(given))]

But that only returns the closest value for each list element. I'm not sure how to then identify list C as the closest point-by-point match.

Also, if the lists are different lengths, I really don't know what to do if I'm not comparing them index by index. For example:

given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 2, 5.1, 3, 6.8, 7.1, 8.2, 9]
B = [-0.1, 0.9, 2.1, 3.1, 3.9]
C = [-1.7, -1, 0, 1.1, 2, 2.9, 4, 5.1, 6, 7.1, 8]

would still return C as the closest match.

I'm also using Numpy but haven't found anything useful. Any help would be greatly appreciated!

I think you should begin by formalizing the required distance metric. In other words, what is it *exactly* that makes `given` closer to `C` than to `A` or `B`? Without this, the question is too vague to be answerable. — NPE, Nov 26 '12 at 15:25
See this question on SO: http://stackoverflow.com/questions/9365184/computing-similarity-between-two-lists — asthasr, Nov 26 '12 at 15:26
@NPE is right. I agree, some distance metric should be selected. — crow16384, Nov 26 '12 at 15:27

score 1 · Answer 1 · answered Nov 26 '12 at 15:28

A pure-Python solution isn't the most efficient, but here's one implementation that uses the sum of squared differences as the distance metric.

def distance(x, y):
    # Sum of squared element-wise differences; zip stops at the shorter input.
    return sum((a - b) ** 2 for a, b in zip(x, y))

given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 5.1]
B = [-0.1, 0.9, 2.1, 3.1, 3.9, 5]
C = [0, 1.1, 2, 2.9, 4, 5.1]

# Pick the candidate with the smallest distance to given (C for the data above).
closest = min((A, B, C), key=lambda x: distance(x, given))

If x and y are np.ndarrays of the same shape, distance could be written as:

def distance(x,y):
    return ((x-y)**2).sum()
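
As a minimal sketch of how the NumPy version might be applied to all candidates at once, assuming every list has the same length as `given`: the names `candidates`, `stacked`, `scores`, and `best_name` below are only illustrative and are not part of the answer above.

```python
import numpy as np

given = np.array([0, 1, 2, 3, 4, 5], dtype=float)

# Hypothetical container for the candidate lists (all the same length as given).
candidates = {
    "A": [0.1, 0.9, 2, 3.3, 3.6, 5.1],
    "B": [-0.1, 0.9, 2.1, 3.1, 3.9, 5],
    "C": [0, 1.1, 2, 2.9, 4, 5.1],
}

# Stack into a (3, 6) array; broadcasting subtracts given from every row.
stacked = np.array(list(candidates.values()), dtype=float)
scores = ((stacked - given) ** 2).sum(axis=1)

# argmin gives the row index of the smallest sum of squared differences.
best_name = list(candidates)[int(np.argmin(scores))]
print(best_name)
```

Keeping the lists in a dict makes it possible to report the name of the winning list rather than only its contents.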

mgilson


@JoeFlip -- Yes. It starts from the beginning of the list. I really don't know how you want to handle sequences which have unequal lengths, but `itertools.izip_longest` (`itertools.zip_longest` in Python 3) *might* be useful for that case (instead of my `zip` above). – mgilson Nov 26 '12 at 15:39
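
One way the `zip_longest` idea could look in Python 3 is sketched below. How to score positions that exist in only one sequence is left open in the comment above, so the fixed `penalty` per unmatched element is purely an assumption, and `padded_distance` is a hypothetical name.

```python
from itertools import zip_longest


def padded_distance(x, y, penalty=1.0):
    """Sum of squared differences over paired elements.

    Assumption: every position present in only one sequence adds a
    fixed `penalty` instead of a squared difference.
    """
    total = 0.0
    for a, b in zip_longest(x, y, fillvalue=None):
        if a is None or b is None:
            total += penalty  # unmatched trailing element
        else:
            total += (a - b) ** 2
    return total


print(padded_distance([0, 1, 2], [0, 1]))
```

Both sequences are still aligned from index 0 here, so a list that matches `given` only after an offset would not score well under this sketch.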

score 1 · Accepted Answer · answered Nov 26 '12 at 15:42

You can use the sum of the squared errors. I made a quick example:

from copy import copy

def squaredError(a, b):
    r = copy(a)

    for i in range(len(a)):
        r[i] -= b[i]
        r[i] *= r[i]

    return sum(r)

given = [0, 1, 2, 3, 4, 5]
A = [0.1, 0.9, 2, 3.3, 3.6, 5.1]
B = [-0.1, 0.9, 2.1, 3.1, 3.9, 5]
C = [0, 1.1, 2, 2.9, 4, 5.1]

print(squaredError(given, A))
print(squaredError(given, B))
print(squaredError(given, C))

# Pair each score with its list so min() returns the list with the lowest score.
match = min(map(lambda x: (squaredError(given, x), x), [A,B,C]))[1]
print(match)
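
Note that `squaredError` indexes `b` over `range(len(a))`, so it raises an `IndexError` when `b` is shorter than `a` and ignores any extra trailing elements when `b` is longer. For the different-length example in the question, one possible approach is to slide the shorter sequence along the longer one and keep the best-scoring window. The sketch below assumes non-empty lists and uses the mean (not the sum) of squared differences so that overlaps of different lengths stay comparable; that choice, and the names `mean_squared_error`, `best_window_error`, `candidates`, and `best_name`, are assumptions rather than part of the answer.

```python
def mean_squared_error(x, y):
    # x and y are expected to have the same, non-zero length.
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def best_window_error(given, candidate):
    # Slide the shorter sequence over the longer one; keep the lowest error.
    short, long_ = sorted((given, candidate), key=len)
    n = len(short)
    return min(
        mean_squared_error(short, long_[start:start + n])
        for start in range(len(long_) - n + 1)
    )


given = [0, 1, 2, 3, 4, 5]
candidates = {
    "A": [0.1, 0.9, 2, 3.3, 3.6, 2, 5.1, 3, 6.8, 7.1, 8.2, 9],
    "B": [-0.1, 0.9, 2.1, 3.1, 3.9],
    "C": [-1.7, -1, 0, 1.1, 2, 2.9, 4, 5.1, 6, 7.1, 8],
}

best_name = min(candidates, key=lambda name: best_window_error(given, candidates[name]))
print(best_name)
```

With these inputs the best window for C starts at index 2, which matches the result the question expects.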

Perfect! This works for lists of different lengths as well. Thanks so much! — Joe Flip, Nov 26 '12 at 16:14
