How to load a data set stored in txt files in scikit-learn

Question

I am going to use scikit-learn libraries for my SVM implementation for classification.

My feature values are 0/1, and I have saved them in one txt file, with the labels in a separate txt file.

My problem is: how can I load this external data set for the training and test phases using scikit-learn?

Check out the docs of numpy or pandas. Both have functions for reading CSV files. If your files are not really CSV-like, you have to parse them yourself. You won't get much more help, as all the details are missing. – sascha, Jan 30 '17 at 15:57
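
As a minimal sketch of the numpy route suggested in the comment above: the file names, the space-separated one-sample-per-line layout, and the tiny sample data written below are all assumptions made for illustration, since the question does not describe the actual file format.

```python
import os
import tempfile

import numpy as np
from sklearn.svm import SVC

# Hypothetical layout: one sample per line with space-separated 0/1 values,
# and one integer label per line in a second file.
tmp_dir = tempfile.mkdtemp()
features_path = os.path.join(tmp_dir, "features.txt")
labels_path = os.path.join(tmp_dir, "labels.txt")

# Write a tiny made-up data set so the example runs on its own.
with open(features_path, "w") as f:
    f.write("0 1 1\n1 0 0\n0 1 0\n1 0 1\n0 1 1\n1 0 0\n")
with open(labels_path, "w") as f:
    f.write("1\n0\n1\n0\n1\n0\n")

# Load the features as a 2-D array and the labels as a 1-D array.
X = np.loadtxt(features_path, dtype=np.uint8, ndmin=2)
y = np.loadtxt(labels_path, dtype=int)

# Use the first four rows for training and the last two for testing.
clf = SVC(kernel="linear")
clf.fit(X[:4], y[:4])
predictions = clf.predict(X[4:])
print(predictions)
```

If the real files use commas or another separator, the `delimiter` argument of `np.loadtxt` would need to be set accordingly.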

score 2 · Accepted Answer · edited May 23 '17 at 12:18

Saving vectorized and especially compressed (sparse) data in a TXT/CSV file is not the best approach, as you might have problems when reading it back: you will lose dtypes, compression/"sparseness", etc. You may even encounter cases where you cannot read your TXT/CSV file into memory.

For example, converting a sparse matrix to a normal (dense NumPy) one can end with a MemoryError. The same may happen to you if you save your sparse (compressed) matrix to CSV and then try to read it back (uncompressed).
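
To give a rough feel for the size gap, here is a small sketch that compares the memory footprint of a mostly-zero 0/1 matrix in dense and CSR form. The shape and the 1% density are made-up values, and the example does not reproduce the MemoryError itself; it only shows why the dense form grows so much faster.

```python
import numpy as np
from scipy import sparse

# Hypothetical 0/1 feature matrix in which about 1% of the entries are 1.
rng = np.random.default_rng(0)
dense = (rng.random((1000, 2000)) < 0.01).astype(np.uint8)

# CSR stores only the non-zero values plus their column indices and row offsets.
X_sparse = sparse.csr_matrix(dense)

dense_bytes = dense.nbytes
sparse_bytes = (
    X_sparse.data.nbytes + X_sparse.indices.nbytes + X_sparse.indptr.nbytes
)

print("dense bytes: ", dense_bytes)
print("sparse bytes:", sparse_bytes)
```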

So I would recommend that you use pickling:

saving / serializing your data:

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; joblib is a standalone package
joblib.dump(clf, 'filename.pkl')

where clf is your trained model or another sparse/compressed data structure

reading it back from disk:

import joblib  # standalone package; sklearn.externals.joblib no longer exists
clf = joblib.load('filename.pkl')
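
The same dump/load pair can be sketched for a sparse feature matrix rather than a model. This assumes the standalone joblib package is installed; the matrix contents and the file name are made up for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from scipy import sparse

# Made-up 0/1 feature matrix stored in compressed sparse row (CSR) form.
X = sparse.csr_matrix(
    np.array([[0, 1, 0], [1, 0, 0], [0, 0, 1]], dtype=np.uint8)
)

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "features.pkl")

joblib.dump(X, path)         # serialize to disk
X_loaded = joblib.load(path)  # read it back

# The dtype and the sparse format survive the round trip.
print(X_loaded.dtype, X_loaded.format)
```

Like any pickle-based format, such a file should only be loaded when it comes from a trusted source.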

answered Jan 30 '17 at 16:10

MaxU - stand with Ukraine

Thanks for your answer and your illustration. I am going to try your instructions. – Stateless Jan 30 '17 at 16:20
@Shahrooz, did it help? – MaxU - stand with Ukraine Jan 31 '17 at 23:30
Yes, and thanks for your help. I just do not know how I can set the gamma and C parameters in my classifier. Must I set these parameters manually for every run, or will they be set automatically? If they are set automatically, how can I find their values? – Stateless Feb 01 '17 at 16:05
