Not able to write the Count Vectorizer vocabulary

Question

I want to save and load the count vectorizer vocabulary. This is my code:

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 1500)
Cv_vec = cv.fit(X['review'])
X_cv=Cv_vec.transform(X['review']).toarray()
dictionary_filepath='CV_dict'
pickle.dump(Cv_vec.vocabulary_, open(dictionary_filepath, 'w'))

It shows me this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-407-3a9b06f969a9> in <module>()
      1 dictionary_filepath='CV_dict'
----> 2 pickle.dump(Cv_vec.vocabulary_, open(dictionary_filepath, 'w'))

TypeError: write() argument must be str, not bytes

I want to save the vocabulary of the count vectorizer and load it again later. Can anyone help me with this, please?

Possible duplicate of [Using pickle.dump - TypeError: must be str, not bytes](https://stackoverflow.com/questions/13906623/using-pickle-dump-typeerror-must-be-str-not-bytes) — petezurich, Aug 23 '18 at 06:52
try: `pickle.dump(Cv_vec.vocabulary_, open(dictionary_filepath, 'wb'))` — petezurich, Aug 23 '18 at 06:53

score 0 · Answer 1 · answered Aug 23 '18 at 10:30

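Open the file in binary mode ('wb') when pickling an object: pickle writes bytes, and a file opened in text mode ('w') only accepts str, which is what the TypeError is telling you. Also use a context manager so the file is closed automatically, i.e.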

import pickle

from sklearn.feature_extraction.text import CountVectorizer

cv = CountVectorizer(max_features=1500)
Cv_vec = cv.fit(X['review'])
X_cv = Cv_vec.transform(X['review']).toarray()
dictionary_filepath = 'CV_dict.pkl'

# 'wb' opens the file in binary mode, which pickle requires
with open(dictionary_filepath, 'wb') as fout:
    pickle.dump(Cv_vec.vocabulary_, fout)
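
Since the question also asks about loading, here is a minimal sketch of the full round trip. It uses only the standard library, with a small hand-written dict standing in for Cv_vec.vocabulary_. The helper names save_vocabulary and load_vocabulary, the sample terms, and the temp-directory path are assumptions made for illustration and are not part of the original code.

```python
import os
import pickle
import tempfile


def save_vocabulary(vocabulary, path):
    """Pickle a vocabulary dict to `path` (hypothetical helper)."""
    # Binary write mode: pickle produces bytes, not str.
    with open(path, 'wb') as fout:
        pickle.dump(vocabulary, fout)


def load_vocabulary(path):
    """Read a pickled vocabulary dict back from `path` (hypothetical helper)."""
    # Binary read mode, mirroring the 'wb' used when saving.
    with open(path, 'rb') as fin:
        return pickle.load(fin)


# Stand-in for Cv_vec.vocabulary_, which maps each term to a column index.
vocabulary = {'good': 0, 'bad': 1, 'movie': 2}

# Assumed location for the example; any writable path works.
dictionary_filepath = os.path.join(tempfile.gettempdir(), 'CV_dict.pkl')

save_vocabulary(vocabulary, dictionary_filepath)
loaded_vocabulary = load_vocabulary(dictionary_filepath)
```

The loaded dict can then be handed to a new vectorizer through the vocabulary parameter, e.g. CountVectorizer(vocabulary=loaded_vocabulary), which should let you call transform with the saved term-to-column mapping without fitting again.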
