I'm using text classification to identify dialects. I'm using sklearn's CountVectorizer, and I want to train a Naive Bayes classifier on both character n-grams and a fixed word vocabulary. So I have the following settings on two CountVectorizers:
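from sklearn.feature_extraction.text import CountVectorizer
c = CountVectorizer(analyzer='char', ngram_range=(2, 3))
X_char = c.fit_transform(X_train)    # sparse matrix of character 2- and 3-gram counts
v = CountVectorizer(vocabulary=vocabs)
X_vocab = v.fit_transform(X_train)   # sparse matrix of counts for the words in vocabs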
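A minimal sketch of one common next step, assuming scikit-learn and SciPy are installed: keep the matrices that `fit_transform` returns and join them column-wise with `scipy.sparse.hstack`, then fit `MultinomialNB` on the result. The training texts, the labels, and the contents of `vocabs` below are hypothetical placeholders, not real dialect data.

```python
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data; replace with the real training texts and dialect labels.
X_train = ["ana raye7 el beit", "ana rayi7 al bayt", "shlonak yaba", "kifak ya zalame"]
y_train = ["egy", "lev", "irq", "lev"]
# Hypothetical dialect-marker words standing in for the real `vocabs`.
vocabs = ["raye7", "rayi7", "shlonak", "kifak", "zalame"]

c = CountVectorizer(analyzer="char", ngram_range=(2, 3))
v = CountVectorizer(vocabulary=vocabs)

# fit_transform returns SciPy sparse matrices; keep them instead of discarding them.
X_char = c.fit_transform(X_train)
X_vocab = v.fit_transform(X_train)

# Same rows (documents), columns placed side by side: char n-grams, then vocabulary words.
X_combined = hstack([X_char, X_vocab]).tocsr()

clf = MultinomialNB()
clf.fit(X_combined, y_train)

# For new text, reuse the fitted vectorizers with transform(), not fit_transform().
X_test = ["ana raye7"]
X_test_combined = hstack([c.transform(X_test), v.transform(X_test)]).tocsr()
pred = clf.predict(X_test_combined)
```

The test features must be built with the same two vectorizers and stacked in the same order, otherwise the columns no longer line up with what the classifier was trained on.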
Where do I go from there? I tried this:
numpy.concatenate([c,v])
as was suggested in this post: How to use bigrams + trigrams + word-marks vocabulary in countVectorizer?
but it didn't work.
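One likely reason `numpy.concatenate([c,v])` fails: `c` and `v` are the vectorizer objects themselves, not the matrices they produce, and `fit_transform` returns SciPy sparse matrices, which `numpy.concatenate` is not designed to join. The sketch below is an alternative, assuming scikit-learn's `FeatureUnion` and `Pipeline`: it fits both vectorizers side by side and feeds the combined features to `MultinomialNB`. The data, labels, and vocabulary are hypothetical placeholders.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline

# Hypothetical toy data; replace with the real texts, labels and vocabulary.
X_train = ["ana raye7 el beit", "ana rayi7 al bayt", "shlonak yaba", "kifak ya zalame"]
y_train = ["egy", "lev", "irq", "lev"]
vocabs = ["raye7", "rayi7", "shlonak", "kifak", "zalame"]

# FeatureUnion runs each vectorizer on the same raw text and stacks the outputs column-wise.
features = FeatureUnion([
    ("char_ngrams", CountVectorizer(analyzer="char", ngram_range=(2, 3))),
    ("vocab", CountVectorizer(vocabulary=vocabs)),
])

# The pipeline fits the vectorizers and the classifier in one call.
model = Pipeline([
    ("features", features),
    ("nb", MultinomialNB()),
])

model.fit(X_train, y_train)

# predict() applies the already-fitted vectorizers to new text automatically.
pred = model.predict(["kifak ya zalame"])
X_features = model.named_steps["features"].transform(X_train)
```

The design advantage over stacking by hand is that the pipeline guarantees the test text goes through exactly the same fitted vectorizers, and the whole model can be passed to tools such as cross-validation as a single estimator.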