I am trying to get this snippet of code working.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
import pyLDAvis
import pyLDAvis.sklearn

# df (not shown) holds the raw text in its 'sentence' column
vectorizer = CountVectorizer(
    analyzer='word',
    min_df=3,                         # ignore words found in fewer than 3 documents
    stop_words='english',             # remove English stop words
    lowercase=True,                   # convert all words to lowercase
    token_pattern='[a-zA-Z0-9]{3,}',  # keep tokens of 3 or more characters
    max_features=3000,                # keep at most 3000 unique words
)
data_vectorized = vectorizer.fit_transform(df['sentence'])

lda_model = LatentDirichletAllocation(
    n_components=40,           # number of topics
    learning_method='online',
    random_state=0,
    n_jobs=-1,                 # use all available CPUs
)
lda_output = lda_model.fit_transform(data_vectorized)

pyLDAvis.enable_notebook()
pyLDAvis.sklearn.prepare(lda_model, data_vectorized, vectorizer, mds='tsne')
I found the approach in the article linked below.
https://towardsdatascience.com/when-topic-modeling-is-part-of-the-text-pre-processing-294b58d35514
There are several other examples of pyLDAvis.sklearn.prepare online. Every one I have tried fails with the same error:
AttributeError Traceback (most recent call last)
Cell In[6], line 24
20 lda_output = lda_model.fit_transform(data_vectorized)
23 pyLDAvis.enable_notebook()
---> 24 pyLDAvis.sklearn.prepare(lda_model, data_vectorized, vectorizer, mds='tsne')
File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:94, in prepare(lda_model, dtm, vectorizer, **kwargs)
62 def prepare(lda_model, dtm, vectorizer, **kwargs):
63 """Create Prepared Data from sklearn's LatentDirichletAllocation and CountVectorizer.
64
65 Parameters
(...)
92 See `pyLDAvis.prepare` for **kwargs.
93 """
---> 94 opts = fp.merge(_extract_data(lda_model, dtm, vectorizer), kwargs)
95 return pyLDAvis.prepare(**opts)
File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:38, in _extract_data(lda_model, dtm, vectorizer)
37 def _extract_data(lda_model, dtm, vectorizer):
---> 38 vocab = _get_vocab(vectorizer)
39 doc_lengths = _get_doc_lengths(dtm)
40 term_freqs = _get_term_freqs(dtm)
File ~\anaconda3\lib\site-packages\pyLDAvis\sklearn.py:20, in _get_vocab(vectorizer)
19 def _get_vocab(vectorizer):
---> 20 return vectorizer.get_feature_names()
AttributeError: 'CountVectorizer' object has no attribute 'get_feature_names'
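As far as I can tell, the failing call is vectorizer.get_feature_names(): scikit-learn deprecated that method in 1.0 and removed it in 1.2 in favour of get_feature_names_out(), while this pyLDAvis version still calls the old name. Below is a minimal sketch of one possible workaround, assuming it is acceptable to patch the fitted vectorizer before handing it to pyLDAvis.sklearn.prepare. The helper name ensure_get_feature_names and the NewStyleVectorizer stand-in are hypothetical; the stand-in is only there so the block runs without scikit-learn installed.

```python
def ensure_get_feature_names(vectorizer):
    """Add a get_feature_names() method if only get_feature_names_out() exists."""
    if not hasattr(vectorizer, "get_feature_names"):
        # The old method returned a plain list, so convert the new method's result.
        vectorizer.get_feature_names = lambda: list(vectorizer.get_feature_names_out())
    return vectorizer


# Hypothetical stand-in for a fitted CountVectorizer from a newer scikit-learn.
class NewStyleVectorizer:
    def get_feature_names_out(self):
        return ["data", "model", "topic"]


patched = ensure_get_feature_names(NewStyleVectorizer())
print(patched.get_feature_names())  # ['data', 'model', 'topic']
```

With a real fitted CountVectorizer, the idea would be to call ensure_get_feature_names(vectorizer) just before the prepare call. I have not confirmed that this covers everything pyLDAvis relies on.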
I also tried downgrading scikit-learn with: pip install scikit-learn==0.22.2.post1
The install fails with this:
× Encountered error while trying to install package.
╰─> scikit-learn
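To pin down which versions are involved, it may help to print what is actually installed. This is a small sketch using importlib.metadata from the standard library (Python 3.8+); the helper name installed_version is my own.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the two packages involved in the traceback.
for name in ("scikit-learn", "pyLDAvis"):
    print(name, installed_version(name) or "not installed")
```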