I am using a custom metric with scikit-learn's KNeighborsClassifier. Here's my code:

def chi_squared(x,y):
    return np.divide(np.square(np.subtract(x,y)), np.sum(x,y))

The function above is my implementation of the chi-squared distance. I used NumPy functions because, according to the scikit-learn docs, a metric function takes two one-dimensional NumPy arrays.

I have passed the chi_squared function as an argument to KNeighborsClassifier().

knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)

However, I keep getting the following error:

TypeError                                 Traceback (most recent call last)
<ipython-input-29-d2a365ebb538> in <module>
      4 
      5 knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)
----> 6 knn.fit(X_train, Y_train)
      7 predictions = knn.predict(X_test)
      8 print(accuracy_score(Y_test, predictions))

~/.local/lib/python3.8/site-packages/sklearn/neighbors/_classification.py in fit(self, X, y)
    177             The fitted k-nearest neighbors classifier.
    178         """
--> 179         return self._fit(X, y)
    180 
    181     def predict(self, X):

~/.local/lib/python3.8/site-packages/sklearn/neighbors/_base.py in _fit(self, X, y)
    497 
    498         if self._fit_method == 'ball_tree':
--> 499             self._tree = BallTree(X, self.leaf_size,
    500                                   metric=self.effective_metric_,
    501                                   **self.effective_metric_params_)

sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree.__init__()

sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree._recursive_build()

sklearn/neighbors/_ball_tree.pyx in sklearn.neighbors._ball_tree.init_node()

sklearn/neighbors/_binary_tree.pxi in sklearn.neighbors._ball_tree.BinaryTree.rdist()

sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.DistanceMetric.rdist()

sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.PyFuncDistance.dist()

sklearn/neighbors/_dist_metrics.pyx in sklearn.neighbors._dist_metrics.PyFuncDistance._dist()

<ipython-input-29-d2a365ebb538> in chi_squared(x, y)
      1 def chi_squared(x,y):
----> 2     return np.divide(np.square(np.subtract(x,y)), np.sum(x,y))
      3 
      4 
      5 knn = KNeighborsClassifier(algorithm='ball_tree', metric=chi_squared)

<__array_function__ internals> in sum(*args, **kwargs)

~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in sum(a, axis, dtype, out, keepdims, initial, where)
   2239         return res
   2240 
-> 2241     return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
   2242                           initial=initial, where=where)
   2243 

~/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py in _wrapreduction(obj, ufunc, method, axis, dtype, out, **kwargs)
     85                 return reduction(axis=axis, out=out, **passkwargs)
     86 
---> 87     return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
     88 
     89 

TypeError: only integer scalar arrays can be converted to a scalar index

   

1 Answer

I can reproduce your error message with:

In [173]: x=np.arange(3); y=np.array([2,3,4])
In [174]: np.sum(x,y)
Traceback (most recent call last):
  File "<ipython-input-174-1a1a267ebd82>", line 1, in <module>
    np.sum(x,y)
  File "<__array_function__ internals>", line 5, in sum
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 2247, in sum
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
  File "/usr/local/lib/python3.8/dist-packages/numpy/core/fromnumeric.py", line 87, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
TypeError: only integer scalar arrays can be converted to a scalar index

Correct use(s) of np.sum:

In [175]: np.sum(x)
Out[175]: 3
In [177]: np.sum(np.arange(6).reshape(2,3), axis=0)
Out[177]: array([3, 5, 7])
In [178]: np.sum(np.arange(6).reshape(2,3), 0)
Out[178]: array([3, 5, 7])

(re)read the np.sum docs if necessary!

Using np.add instead of np.sum:

In [179]: np.add(x,y)
Out[179]: array([2, 4, 6])
In [180]: x+y
Out[180]: array([2, 4, 6])

The following should be equivalent:

np.divide(np.square(np.subtract(x,y)), np.add(x,y))

(x-y)**2/(x+y)
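
Note that both expressions above return an array of per-element terms, while scikit-learn expects a callable metric to return a single value for each pair of vectors. Here is a minimal sketch of a scalar version, assuming the intended chi-squared distance is the sum of those terms and that the features are non-negative (histogram-like). The handling of zero denominators is my own assumption and is not part of the original question:

```python
import numpy as np

def chi_squared(x, y):
    """Chi-squared distance between two 1-D arrays (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    num = np.square(x - y)   # element-wise squared difference
    den = x + y              # element-wise sum (np.add, not np.sum)
    # Assumption: positions where x + y == 0 contribute 0 instead of nan.
    terms = np.divide(num, den, out=np.zeros_like(num), where=den != 0)
    # Reduce to one number, since a metric must return a scalar.
    return float(np.sum(terms))

x = np.arange(3)
y = np.array([2, 3, 4])
print(chi_squared(x, y))
```

A function like this can be passed as metric=chi_squared in the same way as in the question. Whether 'ball_tree' is an appropriate algorithm for this distance is a separate matter that I have not checked here.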
hpaulj
  • x and y are here one dimensional arrays. Can I use np.add(x,y) to add the respective values of both arrays? – Mushahid Hussain Mar 27 '21 at 02:34
  • Have you tried? have you looked up the docs? See my edit? – hpaulj Mar 27 '21 at 02:37
  • Yes, I did read the np.sum docs, and it seems to accept only one NumPy array and calculate the sum of all of its elements. However, I want to add the respective elements of x and y (as required by the chi-squared distance formula). But I get [this error](https://pastebin.pl/view/8efa28cc). I tried converting the result to float, but the error won't go away – Mushahid Hussain Mar 27 '21 at 02:44
  • `np.sum` takes ONE array and sums up the elements. A second argument is the `axis` (or axes) on which to do this. The error is caused by `y` not qualifying as this axis parameter. – hpaulj Mar 27 '21 at 03:25
  • I can't load that error message. Look again at my edits. Also, notice the In/Out lines in my answer? I copied those from an interactive `numpy` session (`ipython`). You should have a session like that open when you write and test code. If you test code pieces there first, it's much more likely that the bigger script(s) will work. – hpaulj Mar 27 '21 at 03:44