
I am trying to implement a knn 1D estimate:

# nearest neighbors estimate
def nearest_n(x, k, data):
    # Order dataset
    #data = np.sort(data, kind='mergesort')
    nnb = []
    # iterate over all data and get k nearest neighbours around x
    for n in data:
        if nnb.__len__()<k:
            nnb.append(n)
        else:
            for nb in np.arange(0,k):
                if np.abs(x-n) < np.abs(x-nnb[nb]):
                    nnb[nb] = n
                    break

    nnb = np.array(nnb)
    # get volume(distance) v of k nearest neighbours around x
    v = nnb.max() - nnb.min()
    v = k/(data.__len__()*v)

    return v

interval = np.arange(-4.0, 8.0, 0.1)
plt.figure()
for k in (2,8,35):
    plt.plot(interval, nearest_n(interval, k, train_data), label=str(k))
plt.legend()
plt.show()

Which throws:

  File "x", line 55, in nearest_n
    if np.abs(x-n) < np.abs(x-nnb[nb]):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I know the error comes from the array input in plot(), but I am not sure how to avoid this in a function with operators >/==/<

'data' comes from a 1D txt file containing floats.

I tried using vectorize:

nearest_n = np.vectorize(nearest_n)

which results in:

line 50, in nearest_n
    for n in data:
TypeError: 'numpy.float64' object is not iterable

Here is an example, let's say:

data = [0.5,1.7,2.3,1.2,0.2,2.2]
k = 2

nearest_n(1.5, k, data) should then lead to

nnb = [1.2, 1.7]
v = 0.5

and return 2/(6*0.5) = 2/3

The function runs with a scalar x: for example, nearest_n(2.0, 4, data) gives 0.0741586011463

nik.yan
  • Could you include the expected output (if you have to do it by hand, you probably want to use a smaller input). :) – MSeifert Jun 03 '17 at 15:15
  • Output would be 3 different probability density plots (k=2,8,35), such that every value from the interval [-4,8] gets mapped to a probability in [0,1] – nik.yan Jun 03 '17 at 15:24
  • No, I meant the literal result of a call to `nearest_n`. For example what should `nearest_n(np.arange(-4.0, 8.0, 0.1), 2, np.array([1, 2, 3]))` return? I've chosen the values more or less randomly, insert more appropriate ones if needed (or easier to calculate by hand if you have no reference implementation). – MSeifert Jun 03 '17 at 15:27
  • ok just did that :) – nik.yan Jun 03 '17 at 15:37

2 Answers


You're passing in np.arange(-4.0, 8.0, 0.1) as your x, which is an array of values. So x - n is an array of the same length as x, in this case 120 elements, since subtracting a scalar from an array is done element-wise. The same holds for x - nnb[nb]. So the result of your comparison is a 120-element boolean array whose entries depend on whether each element of np.abs(x-n) is less than the corresponding element of np.abs(x-nnb[nb]). This can't be used directly as a conditional; you would need to reduce these values to a single boolean (using all() or any()), or simply rethink your code.
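A minimal sketch of the behaviour described above, using the question's interval for x. The two scalars n and nb are illustrative values chosen here, not taken from the original data file:

```python
import numpy as np

x = np.arange(-4.0, 8.0, 0.1)   # array of query points, as in the question
n, nb = 1.7, 1.2                # two scalar data points (illustrative values)

# Element-wise comparison: one boolean per element of x
mask = np.abs(x - n) < np.abs(x - nb)
print(mask.shape, mask.dtype)

try:
    if mask:                    # ambiguous: which of the booleans should decide?
        pass
except ValueError as err:
    print("ValueError:", err)

# Explicit reductions give a single boolean
print(mask.any(), mask.all())
```

Whether any() or all() is appropriate depends on what the condition is meant to express. For a per-point estimate like this one, calling the function once per scalar x is probably the simpler fix.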

spruceb
0
plt.figure()
X = np.arange(-4.0,8.0,0.1)
for k in [2,8,35]:
    Y = []
    for n in X:
        Y.append(nearest_n(n,k,train_data))
    plt.plot(X,Y,label=str(k))
plt.show()

This works fine. I thought pyplot.plot would already do exactly this for me, but I guess it does not...
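The explicit loop could also be expressed with np.vectorize, provided that k and data are excluded from vectorization so that only x is broadcast. The sketch below is one possible version, not the original function: knn_density is a hypothetical helper that assumes the intent is to pick the k points closest to x (by sorting absolute distances) and to use their span as the volume, as in the worked example from the question.

```python
import numpy as np

def knn_density(x, k, data):
    """k-NN density estimate at a single scalar x (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    # Indices of the k data points closest to x
    idx = np.argsort(np.abs(data - x))[:k]
    nearest = data[idx]
    # "Volume" as in the question: span of the k nearest neighbours.
    # Note: this is zero (division by zero) if all k neighbours coincide.
    v = nearest.max() - nearest.min()
    return k / (len(data) * v)

# Exclude positional arguments 1 (k) and 2 (data) so only x is vectorized
knn_density_vec = np.vectorize(knn_density, excluded={1, 2})

data = [0.5, 1.7, 2.3, 1.2, 0.2, 2.2]   # example data from the question
print(knn_density(1.5, 2, data))        # scalar call

X = np.arange(-4.0, 8.0, 0.1)
Y = knn_density_vec(X, 2, data)         # one estimate per element of X
print(Y.shape)
```

The resulting Y can then be passed to plt.plot(X, Y, label=str(k)) in place of the list built in the loop. np.vectorize is still a Python-level loop internally, so this is a convenience rather than a speed-up.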

nik.yan
  • It's not a matter of `pyplot`, and I'm not sure why you think it could be. You wrote `nearest_n` to take a scalar for the `x` argument, so you can't pass in a vector without rewriting your code. Here you're looping through a vector and passing a scalar into your function each time. – spruceb Jun 03 '17 at 16:23
  • I thought pyplot would handle vector inputs just like this but I was wrong – nik.yan Jun 03 '17 at 16:33
  • I just want to clarify, because I'm not sure you understood the source of the problem. The error was not thrown in the `plt.plot` function and wasn't because of your inputs to `pyplot`; the error was thrown in `nearest_n` and was due to the arguments passed to that function. – spruceb Jun 03 '17 at 17:00
  • @spruceb You can be pretty sure that OP did not understand the problem, also see [his previous question](https://stackoverflow.com/questions/44344542/pyplot-of-array-with-an-operator-or), which had two answers before he asked this one. – ImportanceOfBeingErnest Jun 03 '17 at 18:10