
I'm trying to do the first scikit-learn exercise, but even when I run their solution code (shown below) I get the error in the traceback immediately following. Does anyone know why this is happening, and how I can resolve it?

The predict method also fails on this dataset, yet it works fine for the iris dataset using the code at the very bottom of the question. Sorry if I am missing something obvious; I am not an actual programmer.

Traceback (most recent call last):
  File "C:\Users\user2491873\Desktop\scikit_exercise.py", line 30, in <module>
    print(knn.fit(X_train, y_train).score(X_test, y_test))
  File "C:\Python33\lib\site-packages\sklearn\base.py", line 279, in score
    return accuracy_score(y, self.predict(X))
  File "C:\Python33\lib\site-packages\sklearn\neighbors\classification.py", line 131, in predict
    neigh_dist, neigh_ind = self.kneighbors(X)
  File "C:\Python33\lib\site-packages\sklearn\neighbors\base.py", line 254, in kneighbors
    warn_equidistant()
  File "C:\Python33\lib\site-packages\sklearn\neighbors\base.py", line 33, in warn_equidistant
    warnings.warn(msg, NeighborsWarning, stacklevel=3)
  File "C:\Python33\lib\idlelib\PyShell.py", line 59, in idle_showwarning
    file.write(warnings.formatwarning(message, category, filename,
AttributeError: 'NoneType' object has no attribute 'write'

Here is the code:

"""
================================
Digits Classification Exercise
================================

This exercise is used in the :ref:`clf_tut` part of the
:ref:`supervised_learning_tut` section of the
:ref:`stat_learn_tut_index`.
"""

from sklearn import datasets, neighbors, linear_model

digits = datasets.load_digits()
X_digits = digits.data
y_digits = digits.target

n_samples = len(X_digits)

# Slice indices must be integers, so convert the 90% cut-off explicitly
n_train = int(.9 * n_samples)
X_train = X_digits[:n_train]
y_train = y_digits[:n_train]
X_test = X_digits[n_train:]
y_test = y_digits[n_train:]

knn = neighbors.KNeighborsClassifier()
logistic = linear_model.LogisticRegression()

print('KNN score: %f' % knn.fit(X_train, y_train).score(X_test, y_test))
print('LogisticRegression score: %f'
      % logistic.fit(X_train, y_train).score(X_test, y_test))

This is the code for the Iris dataset, which seems to work fine:

>>> import numpy as np
>>> from sklearn import datasets
>>> iris = datasets.load_iris()
>>> iris_X = iris.data
>>> iris_y = iris.target
>>> np.unique(iris_y)
array([0, 1, 2])

>>> # Split iris data in train and test data
>>> # A random permutation, to split the data randomly
>>> np.random.seed(0)
>>> indices = np.random.permutation(len(iris_X))
>>> iris_X_train = iris_X[indices[:-10]]
>>> iris_y_train = iris_y[indices[:-10]]
>>> iris_X_test  = iris_X[indices[-10:]]
>>> iris_y_test  = iris_y[indices[-10:]]
>>> # Create and fit a nearest-neighbor classifier
>>> from sklearn.neighbors import KNeighborsClassifier
>>> knn = KNeighborsClassifier()
>>> knn.fit(iris_X_train, iris_y_train)
KNeighborsClassifier(algorithm='auto', leaf_size=30, n_neighbors=5, p=2,
           warn_on_equidistant=True, weights='uniform')
>>> knn.predict(iris_X_test)
array([1, 2, 1, 0, 0, 0, 2, 1, 2, 0])
>>> iris_y_test
array([1, 1, 1, 0, 0, 0, 2, 1, 2, 0])    
user2491873

1 Answer


The traceback says that the variable file in the expression file.write(warnings.formatwarning(message, category, filename, ...) is None instead of the expected output channel (for instance the program's standard output or a buffer in the user interface).
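To see the failure in isolation, here is a minimal sketch that reproduces the same AttributeError without IDLE or scikit-learn. The helper show_warning_to and the example arguments are made up for illustration; only warnings.formatwarning is a real standard-library call.

```python
import warnings


def show_warning_to(file, message, category, filename, lineno):
    # Hypothetical helper that mimics the failing line from the traceback:
    # it assumes `file` is a writable stream and never checks for None.
    file.write(warnings.formatwarning(message, category, filename, lineno))


try:
    # Passing None stands in for a missing output stream.
    show_warning_to(None, "example warning", UserWarning, "example.py", 1)
except AttributeError as exc:
    error_text = str(exc)
    print(error_text)
```

The warning itself is harmless here; the crash comes only from trying to write it to a stream that does not exist.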

This is probably a bug in IDLE. Googling the error message leads to:

http://bugs.python.org/issue18030

which in turn points to:

http://bugs.python.org/issue13582

So this bug is indeed not related to scikit-learn. I would suggest that you:

  • either launch IDLE from the cmd console by typing python -m idlelib.idle

  • or use a different Python IDE / environment.

ogrisel
  • Thanks ogrisel, launching the IDLE from cmd solved the problem. I faced this problem when trying to plot a graph using matplotlib and pandas. Thanks again! – Mohammed Jan 04 '15 at 16:33