
I'm using text classification to identify dialects. I'm using sklearn's CountVectorizer, and I want to train a naive Bayes classifier on both character n-grams and a fixed word vocabulary. I have set up two CountVectorizers as follows:

c=CountVectorizer(analyzer='char', ngram_range=(2,3))
c.fit_transform(X_train)

v=CountVectorizer(vocabulary=vocabs)
v.fit_transform(X_train)

Where do I go from there? I tried this:

numpy.concatenate([c,v])

as was suggested in this post: How to use bigrams + trigrams + word-marks vocabulary in countVectorizer?

but it didn't work.
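For reference, here is a minimal sketch of one way the two feature sets might be combined. It assumes the problem is that `numpy.concatenate` was given the vectorizer objects rather than the sparse matrices returned by `fit_transform`. The toy sentences, the labels, and the `vocabs` list below are hypothetical placeholders, not the real data:

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the real X_train, y_train and vocabs.
X_train = [
    "the colour of the lorry",
    "the color of the truck",
    "my favourite flat",
    "my favorite apartment",
]
y_train = ["uk", "us", "uk", "us"]
vocabs = ["lorry", "truck", "flat", "apartment"]

c = CountVectorizer(analyzer='char', ngram_range=(2, 3))
v = CountVectorizer(vocabulary=vocabs)

# fit_transform returns a sparse document-term matrix; keep the result.
X_char = c.fit_transform(X_train)
X_word = v.fit_transform(X_train)

# Stack the two matrices column-wise so each row holds both feature sets.
X_combined = hstack([X_char, X_word]).tocsr()

clf = MultinomialNB().fit(X_combined, y_train)

# New text must go through transform (not fit_transform) on both
# vectorizers, stacked in the same column order as during training.
X_test = ["the colour of the lorry"]
X_test_combined = hstack([c.transform(X_test), v.transform(X_test)]).tocsr()
pred = clf.predict(X_test_combined)
```

`scipy.sparse.hstack` is used here because the matrices are sparse. `sklearn.pipeline.FeatureUnion` is another option that wraps both vectorizers in a single transformer.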

John Sall
