
I am going to use scikit-learn for SVM classification.

My feature values are 0/1. I have saved the features in one txt file and the labels in a separate txt file.

My question is: how can I load this external data set for the training and test phases using scikit-learn?

Stateless
    Check out the docs of numpy or pandas. Both have functions for reading CSV files. If your files are not really CSV-like, you will have to parse them yourself. You won't get much more help, as all the details are missing. – sascha Jan 30 '17 at 15:57

1 Answer


Saving vectorized and especially compressed (sparse) data in a TXT/CSV file is not the best approach, as you may run into problems when reading it back: you lose dtypes, compression ("sparseness"), and so on. The data may not even fit in memory once it is read back from the TXT/CSV file.

Here you can see an example in which converting a sparse matrix to a regular (dense NumPy) one ends with a MemoryError. The same can happen if you save a sparse (compressed) matrix to CSV and then try to read it back uncompressed.
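To get a feel for the difference in size, here is a minimal sketch comparing the memory used by a sparse 0/1 matrix with what its dense equivalent would need. The matrix shape and density are made-up values for illustration, and the dense size is only computed, never allocated.

```python
import numpy as np
from scipy import sparse

# Hypothetical dimensions and density, chosen only for illustration.
n_rows, n_cols = 2_000, 5_000

# Random sparse matrix in CSR format with about 0.1% non-zero entries.
X_sparse = sparse.random(n_rows, n_cols, density=0.001,
                         format="csr", dtype=np.float64, random_state=0)
X_sparse.data[:] = 1.0  # turn the stored values into 1s, so features are 0/1

# CSR keeps three arrays: the values, their column indices, and row pointers.
sparse_bytes = (X_sparse.data.nbytes
                + X_sparse.indices.nbytes
                + X_sparse.indptr.nbytes)

# Size the same matrix would need as a dense float64 array (not allocated here).
dense_bytes = n_rows * n_cols * np.dtype(np.float64).itemsize

print(f"sparse: {sparse_bytes / 1e6:.2f} MB, dense: {dense_bytes / 1e6:.2f} MB")
```

With a larger matrix the dense size can easily exceed the available RAM, which is where the MemoryError comes from.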

So I would recommend pickling:

saving / serializing your data:

# sklearn.externals.joblib was deprecated in scikit-learn 0.21 and removed in 0.23;
# on current versions import the standalone joblib package instead
import joblib
joblib.dump(clf, 'filename.pkl')

where clf is your trained model or another sparse/compressed data structure

reading it back from disk:

import joblib
clf = joblib.load('filename.pkl')
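If the data already lives in plain text files, as in the question, a rough end-to-end sketch could look like the one below. It assumes whitespace-separated 0/1 features with one sample per line and one label per line. The in-memory `io.StringIO` objects stand in for the two files, and the file names, the tiny data set, the train/test split, and the linear kernel are all hypothetical choices for illustration.

```python
import io
import os
import tempfile

import joblib
import numpy as np
from sklearn.svm import SVC

# Stand-ins for the two text files; with real files, pass the paths
# (e.g. "features.txt" and "labels.txt", hypothetical names) to np.loadtxt.
features_txt = io.StringIO("0 1 1\n1 0 0\n0 1 0\n1 0 1\n0 1 1\n1 0 0\n")
labels_txt = io.StringIO("1\n0\n1\n0\n1\n0\n")

X = np.loadtxt(features_txt, dtype=np.int8)  # shape (n_samples, n_features)
y = np.loadtxt(labels_txt, dtype=np.int8)    # shape (n_samples,)

# Hold out the last two rows as a tiny test set.
X_train, y_train = X[:4], y[:4]
X_test, y_test = X[4:], y[4:]

# Train the classifier on the training rows.
clf = SVC(kernel="linear").fit(X_train, y_train)

# Persist the trained model and read it back from disk.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "svm_model.pkl")
    joblib.dump(clf, path)
    restored = joblib.load(path)

# The restored model should give the same predictions as the original one.
same_predictions = np.array_equal(clf.predict(X_test), restored.predict(X_test))
print("restored model agrees with original:", same_predictions)
```

For a real data set, `sklearn.model_selection.train_test_split` is usually a better way to split than slicing by hand.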
MaxU - stand with Ukraine