
I have a simple Python script (the digit recognition exercise from Kaggle) that runs fine when I execute it from the command line (I use Windows 8.1 64-bit with Enthought Canopy 1.4.1):

import numpy
from sklearn.ensemble import RandomForestClassifier
from sklearn import cross_validation

print "\nreading training data..."
dataFilename = "D:\\Kaggle\\Digit Recognizer\\Data\\train.csv"
dataFile = open(dataFilename, 'r')
data = numpy.array([map(int, line.replace('\n', '').split(',')) for line in dataFile.readlines()[1:]])
dataFile.close()

print "\nseparating training data to features and targets..."
# use all data to train the algorithm
trainingSet_Y = data[:, 0]
trainingSet_X = data[:, 1:]

print "\ntraining a classifier..."
classify_RF = RandomForestClassifier(n_estimators = 100, n_jobs = -1)
classify_RF.fit(trainingSet_X, trainingSet_Y)

print "\ncalculating cross-validation score..."
scores = cross_validation.cross_val_score(classify_RF, trainingSet_X, trainingSet_Y, cv = 5, n_jobs = -1)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), numpy.std(scores) * 2))
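NumPy can also do the parsing that the `data = numpy.array(...)` line performs by hand. The sketch below assumes the file keeps the layout the code above relies on: one header row, the label in the first column, and integer pixel values. `load_digit_csv` is a hypothetical helper name.

```python
import numpy as np


def load_digit_csv(path):
    """Return (features, labels) from a CSV with one header row and the
    label in column 0, as in the training file read above."""
    # skiprows=1 drops the header; ndmin=2 keeps a 2-D array even if the
    # file holds a single data row.
    data = np.loadtxt(path, delimiter=",", skiprows=1, dtype=int, ndmin=2)
    return data[:, 1:], data[:, 0]
```

With this helper, `trainingSet_X, trainingSet_Y = load_digit_csv(dataFilename)` would replace the reading and splitting steps. Because of `ndmin=2`, the column slicing also works on a one-row file.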

I decided to migrate all my development work to Visual Studio, so I installed Python Tools for Visual Studio 2.1 to code and run directly from within VS Community 2013. (Note: order of installation: (1) Canopy, (2) VS2013, and (3) PTVS.)

However, the same code behaves extremely strangely when executed from within VS2013. It runs until the cross-validation step, and then it starts looping over the code, rerunning everything over and over and sometimes printing error messages along the way, as shown in the screenshot below:
[Screenshot: PTVS_VS2013_loopOverCode_error]

As you can see, once it reaches the cross-validation step, it starts over from the beginning and then seems to jump around the code at random, executing only certain parts of it!

Any ideas?

Asked by darXider; edited by cchamberlain.
  • OK, I figured out that the error is due to the `n_jobs = -1` parameter in the cross-validation step. I have removed it, and it's working fine now. – darXider Jul 12 '15 at 04:55
  • Can you still file a bug on https://github.com/Microsoft/ptvs/ for this? We are prioritizing numpy, scipy, pandas and scikit-learn for the next release, and this kind of thing should really just work with no fiddling. – Pavel Minaev Jul 12 '15 at 05:00
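One documented cause of this symptom on Windows is the way the worker processes behind `n_jobs` get started. Windows has no fork(), so Python's multiprocessing starts each worker as a fresh interpreter. The joblib versions bundled with scikit-learn at the time used multiprocessing for this. Each new worker re-imports the main script and therefore re-runs any top-level code that is not protected by an `if __name__ == "__main__":` guard. According to the joblib documentation, its newer default loky backend (joblib 0.12 and later) no longer imposes the guard.

If that is what happens here under PTVS, the minimal sketch below keeps `n_jobs = -1` and puts the guard in place. It is written for Python 3 and current scikit-learn, where `cross_val_score` lives in `sklearn.model_selection`; the old `sklearn.cross_validation` module was deprecated in 0.18 and removed in 0.20. The file name `digits_cv.py` and the helpers `evaluate` and `main` are hypothetical.

```python
import sys

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def evaluate(X, y, n_jobs=-1):
    """Return 5-fold cross-validation accuracies for a 100-tree forest."""
    # The forest itself stays single-threaded; parallelism happens across
    # folds only, so one worker pool is not started inside another.
    classifier = RandomForestClassifier(n_estimators=100, n_jobs=1)
    # cross_val_score clones and refits the classifier on every fold, so no
    # separate fit() call is needed before scoring.
    return cross_val_score(classifier, X, y, cv=5, n_jobs=n_jobs)


def main(args):
    if len(args) != 1:
        print("usage: python digits_cv.py path/to/train.csv")
        return
    # Assumed layout: one header row, label in column 0, pixels after it.
    data = np.loadtxt(args[0], delimiter=",", skiprows=1, dtype=int, ndmin=2)
    scores = evaluate(data[:, 1:], data[:, 0])
    print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))


# Processes started with multiprocessing's spawn method (the only method on
# Windows) re-import the main module. Keeping the real work behind this guard
# stops every worker from re-reading the data and re-running the script.
if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as `python digits_cv.py "D:\Kaggle\Digit Recognizer\Data\train.csv"`. Running the folds in parallel while keeping the forest single-threaded is a design choice in this sketch; the original code sets `n_jobs = -1` on both.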

0 Answers