Given a list of tuples [(x1, y1), (x2, y2), ..., (xm, ym)], such as [(1, 2), (3, 7), (5, 9)], I would like to write a function that fills in each missing integer x with the average of its neighbors' values, f(x - 1) and f(x + 1).

In this case, we would get:

[(1, 2), (2, ave(2, 7)), (3, 7), (4, ave(7, 9)), (5, 9)]

import numpy as np

# calculating nearest neighbor averages
def nearest(x, y):

    # define the min and max for our line
    min = np.amin(x)
    max = np.amax(x)

    # fill in the gaps
    numsteps = max - min + 1

    # an empty vessel
    new_df = []

    # an empty vessel for our xs
    xs = np.linspace(min, max, numsteps)

    for i, item in enumerate(xs):
        if(xs[i] in x):
            idx = x.index(xs[i])
            new_df.insert(i, (xs[i], y[idx]))
        else:
            idx = x.index(xs[i] - 1)
            idx2 = x.index(xs[i] + 1)
            avg = (y[idx] + y[idx2])/2.0
            new_df.insert(i, (xs[i], avg))

    print new_df


nearest([1, 3, 5], [6, 7, 8])

# [(1.0, 6), (2.0, 6.5), (3.0, 7), (4.0, 7.5), (5.0, 8)]

This quickly fails, however, with an array such as xs = [1, 4, 7], since neighboring values are more than two apart and a missing x no longer has an existing neighbor on both sides. In that case, given the ys from the first example, [2, 7, 9], we would expect the answer to be either:

[(1, 2), (2, ave(2, 7)), (3, ave(2,7)), (4, 7) ... ]

or something a bit more complicated:

[(1, 2), (2, ave(prev, next_that_exists)), (3, ave(just_created, next_that_exists)), ...]

How can I implement this so that it finds the elements just below and just above each missing one and computes their average?

Also, is this different from a moving average?

m00am
compguy24
  • Your indentation is not correct. Please help us by fixing it. – Hai Vu Sep 12 '15 at 17:07
  • If xs = [1, 4, 7] what do you want the answer to be? – saulspatz Sep 12 '15 at 17:14
  • https://en.wikipedia.org/wiki/Moving_average – saulspatz Sep 12 '15 at 17:15
  • This is actually pretty interesting. In the case of a gap of 2, you basically have `2, a, b, 7` where `a = (2 + b)/2` and `b = (a+7)/2` - two equations with two unknowns. With a gap of 3 you have `2, a, b, c, 7`, with `a = (2+b)/2`, etc... - three equations with three unknowns. I'm trying to come up with an elegant way to solve this, but nothing yet. – Claudiu Sep 12 '15 at 17:55

2 Answers

Here is my approach: from the input, build a dictionary that maps each x to its y. Then write a function, get_value(), that returns the stored value for a key, or calculates it from the neighbors when the key is missing.

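def get_value(pairs, key):
    # Return the stored value, or the average of the two neighbors if missing.
    # Note: the recursion never terminates when two or more consecutive
    # keys are missing.
    try:
        return pairs[key]
    except KeyError:
        previous_value = get_value(pairs, key - 1)
        next_value = get_value(pairs, key + 1)
        return (previous_value + next_value) / 2.0

def nearest(x, y):
    pairs = dict(zip(x, y))
    # start at min(x) so get_value is never asked for a key below the data
    for i in range(min(x), max(x) + 1):
        yield i, get_value(pairs, i)

print list(nearest([1, 3, 5], [6, 7, 8]))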
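
The recursive lookup only works when at most one key is missing between known values; with a wider gap, such as xs = [1, 4, 7], it keeps recursing until Python's recursion limit is hit. A minimal sketch of one alternative, written for Python 3, is to average the nearest known values on either side, which gives the first of the two outputs the question lists. The name nearest_existing is made up for this illustration, and the code assumes x is a non-empty, sorted list of distinct integers.

```python
from bisect import bisect_left

def nearest_existing(x, y):
    """Fill each missing integer with the mean of the closest known values.

    Assumption: x is non-empty, sorted, and holds distinct integers.
    """
    result = []
    for target in range(x[0], x[-1] + 1):
        pos = bisect_left(x, target)  # index of the first known x >= target
        if x[pos] == target:
            # target is one of the known points, keep its value
            result.append((target, y[pos]))
        else:
            # x[pos - 1] < target < x[pos]: these are the known neighbors
            result.append((target, (y[pos - 1] + y[pos]) / 2.0))
    return result

print(nearest_existing([1, 4, 7], [2, 7, 9]))
```

Every missing x inside one gap gets the same value here, so the result is a step between known points rather than a ramp.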

Update

I now have a chance to revisit this question. Based on your description, you want to interpolate the missing values. Since you already have numpy installed, why not use it?

import numpy as np

def nearest(x, y):
    all_x = range(min(x), max(x) + 1)
    return zip(all_x, np.interp(all_x, x, y))

print nearest([1, 3, 5], [6, 7, 8])
print nearest([1, 4, 7], [6, 7, 8])

Output:

[(1, 6.0), (2, 6.5), (3, 7.0), (4, 7.5), (5, 8.0)]
[(1, 6.0), (2, 6.333333333333333), (3, 6.666666666666667), (4, 7.0), (5, 7.333333333333333), (6, 7.666666666666667), (7, 8.0)]

numpy.interp does all the heavy lifting; the nearest function only needs to build the list of all x values.
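
The code above is Python 2. On Python 3, zip returns an iterator and print is a function, so a rough equivalent might look like the sketch below. The name interpolate_missing is hypothetical, and the code assumes x holds increasing integers, since np.interp expects increasing sample points.

```python
import numpy as np

def interpolate_missing(x, y):
    # Assumption: x is a list of increasing integers.
    all_x = list(range(min(x), max(x) + 1))
    # np.interp(points, known_x, known_y) interpolates linearly;
    # .tolist() turns the NumPy floats into plain Python floats.
    filled = np.interp(all_x, x, y).tolist()
    return list(zip(all_x, filled))

print(interpolate_missing([1, 3, 5], [6, 7, 8]))
```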

Hai Vu
  • This gives an infinite loop on `nearest([1, 4, 7], [3, 4, 5])` – Claudiu Sep 12 '15 at 17:29
  • Garbage in, garbage out. There is a gap between 1 and 4, and the original poster does not specify how to fill that gap. Now that the OP has posted an update to the requirements, I am working on updating my solution. – Hai Vu Sep 12 '15 at 17:33
  • He did underspecify it, but he was asking for code to specifically solve this case - it seems like his code already handles the non-gap cases. – Claudiu Sep 12 '15 at 17:39

This should work:

def nearest(x, y):
    assert len(x) == len(y)

    res = []
    for i in xrange(len(x)-1):
        res.append((x[i], y[i]))
        gap = x[i+1] - x[i]
        for j in xrange(1, gap):
            res.append((x[i]+j, y[i] + j * (y[i+1]-y[i]) / float(gap)))
    res.append((x[-1], y[-1]))

    return res

Sample output:

print nearest([1, 3, 5], [2, 7, 9])
print nearest([1, 4, 7], [2, 7, 9])

Gives:

[(1, 2), (2, 4.5), (3, 7), (4, 8.0), (5, 9)]
[(1, 2), (2, 3.666666666666667), (3, 5.333333333333334), (4, 7), (5, 7.666666666666667), (6, 8.333333333333334), (7, 9)]

Explanation:

I solved the [1, 4], [2, 7] case by hand, noting that the values we want are 2, x, y, 7 where

x = (2 + y) / 2
y = (x + 7) / 2

I got x = 11/3 and y = 16/3, yielding:

6/3, 11/3, 16/3, 21/3

Note that the gap between each of these is 5/3, or (7-2) / (4-1). That's when I realized that by wanting to fill in with the average of the neighbor values across larger gaps, you basically want a linear interpolation from one value to the next over the given number of steps. That is, for example, given you want to go from 2 to 7 in 3 steps, you add 5/3 to 2 repeatedly until you get to 7.
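
As a quick check of that claim, the sketch below (Python 3) builds the evenly spaced values with exact fractions and confirms that each interior value is the average of its two neighbors. The helper name interpolate_exact is made up for this illustration.

```python
from fractions import Fraction

def interpolate_exact(y_left, y_right, gap):
    """Values from y_left to y_right in `gap` equal steps, as exact fractions."""
    step = Fraction(y_right - y_left, gap)
    return [y_left + j * step for j in range(gap + 1)]

values = interpolate_exact(2, 7, 3)  # the [1, 4], [2, 7] case

# Each interior value should equal the average of its two neighbors.
is_average = all(
    values[k] == (values[k - 1] + values[k + 1]) / 2
    for k in range(1, len(values) - 1)
)
print(values, is_average)
```

The values come out as 2, 11/3, 16/3, 7, matching the hand calculation.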

Claudiu