CountVectorizer Error using pyLDAvis. Any thoughts on how to resolve?

Question

I am trying to get this snippet of code working.

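from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
import pyLDAvis
import pyLDAvis.sklearn

vectorizer = CountVectorizer(analyzer='word',
                             min_df=3,                         # ignore words found in fewer than 3 documents
                             stop_words='english',             # remove stop words
                             lowercase=True,                   # convert all words to lowercase
                             token_pattern='[a-zA-Z0-9]{3,}',  # tokens of 3 or more alphanumeric chars
                             max_features=3000,                # keep at most the 3000 most frequent words
                            )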

data_vectorized = vectorizer.fit_transform(df['sentence'])

lda_model = LatentDirichletAllocation(n_components=40, # Number of topics
                                      learning_method='online',
                                      random_state=0,       
                                      n_jobs = -1  # Use all available CPUs
                                     )
lda_output = lda_model.fit_transform(data_vectorized)

pyLDAvis.enable_notebook()
pyLDAvis.sklearn.prepare(lda_model, data_vectorized, vectorizer, mds='tsne')

I took the approach from the article linked below.

https://towardsdatascience.com/when-topic-modeling-is-part-of-the-text-pre-processing-294b58d35514

There are several other examples of pyLDAvis.sklearn.prepare online. I have tried several of them, and I always get this error:

AttributeError                            Traceback (most recent call last)
Cell In[6], line 24
     20 lda_output = lda_model.fit_transform(data_vectorized)
     23 pyLDAvis.enable_notebook()
---> 24 pyLDAvis.sklearn.prepare(lda_model, data_vectorized, vectorizer, mds='tsne')

File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:94, in prepare(lda_model, dtm, vectorizer, **kwargs)
     62 def prepare(lda_model, dtm, vectorizer, **kwargs):
     63     """Create Prepared Data from sklearn's LatentDirichletAllocation and CountVectorizer.
     64 
     65     Parameters
   (...)
     92     See `pyLDAvis.prepare` for **kwargs.
     93     """
---> 94     opts = fp.merge(_extract_data(lda_model, dtm, vectorizer), kwargs)
     95     return pyLDAvis.prepare(**opts)

File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:38, in _extract_data(lda_model, dtm, vectorizer)
     37 def _extract_data(lda_model, dtm, vectorizer):
---> 38     vocab = _get_vocab(vectorizer)
     39     doc_lengths = _get_doc_lengths(dtm)
     40     term_freqs = _get_term_freqs(dtm)

File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:20, in _get_vocab(vectorizer)
     19 def _get_vocab(vectorizer):
---> 20     return vectorizer.get_feature_names()

AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'

I also tried pinning an older scikit-learn release:

pip install scikit-learn==0.22.2.post1

That fails with:

× Encountered error while trying to install package.
╰─> scikit-learn
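
For context, the traceback points at vectorizer.get_feature_names(), which scikit-learn deprecated in 1.0 and removed in 1.2 in favour of get_feature_names_out(). One possible workaround, sketched here and not verified against this exact environment, is to give the fitted vectorizer the old method name before calling pyLDAvis.sklearn.prepare. The helper name ensure_get_feature_names and the stand-in class below are hypothetical and exist only for illustration; the stand-in lets the sketch run without scikit-learn installed.

```python
def ensure_get_feature_names(vectorizer):
    """Attach a get_feature_names() method to `vectorizer` if it is missing.

    Hypothetical helper: it assumes the object exposes get_feature_names_out(),
    as scikit-learn vectorizers do from version 1.0 onwards.
    """
    if not hasattr(vectorizer, "get_feature_names"):
        # The old method returned a plain list; the new one returns an array,
        # so convert the result to keep the old return type.
        vectorizer.get_feature_names = lambda: list(vectorizer.get_feature_names_out())
    return vectorizer


class _NewStyleVectorizer:
    """Stand-in for a fitted CountVectorizer that only has the new method."""

    def get_feature_names_out(self):
        return ["apple", "banana", "cherry"]


demo = ensure_get_feature_names(_NewStyleVectorizer())
print(demo.get_feature_names())
```

With the real objects from the question, the call would be ensure_get_feature_names(vectorizer) after fit_transform and before pyLDAvis.sklearn.prepare(...). This only addresses the failing get_feature_names() call shown in the traceback; an older pyLDAvis may still hit other incompatibilities with newer libraries. Upgrading pyLDAvis may be the cleaner route: as far as I know, recent releases expose the same helper as pyLDAvis.lda_model instead of pyLDAvis.sklearn, but check the release notes for the installed version.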

Hi, I've run into the same issue. Did you resolve yours? – userrr, Jul 10 '23 at 13:18
