I have used BERTopic together with KeyBERT to extract some topics from a set of documents:
from bertopic import BERTopic
topic_model = BERTopic(
    nr_topics="auto",
    verbose=True,
    n_gram_range=(1, 4),
    calculate_probabilities=True,
    embedding_model="paraphrase-MiniLM-L3-v2",
    min_topic_size=3,
)
topics, probs = topic_model.fit_transform(docs)
Now I can access the topic names:
freq = topic_model.get_topic_info()
print(f"Number of topics: {len(freq)}")
freq.head(30)
   Topic  Count  Name
0     -1      1  -1_default_greenbone_gmp_manager
1      0     14  0_http_tls_ssl tls_ssl
2      1      8  1_jboss_console_web_application
and inspect the words of each topic (topic 0 and topic 1 below):
[('http', 0.0855701486234524),
('tls', 0.061977919455444744),
('ssl tls', 0.061977919455444744),
('ssl', 0.061977919455444744),
('tcp', 0.04551718585531556),
('number', 0.04551718585531556)]
[('jboss', 0.14014705432060262),
('console', 0.09285308122803233),
('web', 0.07323749337563096),
('application', 0.0622930523123512),
('management', 0.0622930523123512),
('apache', 0.05032395169459188)]
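Both lists have the shape BERTopic uses for a single topic: a list of `(word, score)` pairs. `topic_model.get_topics()` returns all topics at once as a dict keyed by topic id, so dropping the scores is one comprehension. This is a minimal sketch that hard-codes the output above (scores shortened) in place of a fitted model, an assumption made only so it runs on its own.

```python
# Stand-in for topic_model.get_topics(): {topic_id: [(word, score), ...]}
# Values copied from the output above, scores shortened.
topics = {
    0: [("http", 0.0856), ("tls", 0.0620), ("ssl tls", 0.0620),
        ("ssl", 0.0620), ("tcp", 0.0455), ("number", 0.0455)],
    1: [("jboss", 0.1401), ("console", 0.0929), ("web", 0.0732),
        ("application", 0.0623), ("management", 0.0623), ("apache", 0.0503)],
}

# Keep only the words, preserving BERTopic's ranking order
topic_words = {tid: [word for word, _ in pairs] for tid, pairs in topics.items()}
print(topic_words[1])
```

With a real model, replace the hard-coded dict with `topics = topic_model.get_topics()`; note that it also contains the outlier topic `-1`.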
What I want is a final dataframe that has the topic name in one column and the elements (words) of that topic in another column.
Expected outcome:

   class                          entities
0  http_tls_ssl tls_ssl           http, tls, etc.
1  jboss_console_web_application  jboss, console, etc.
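One way to get this first layout is to start from the `get_topic_info()` dataframe, drop the outlier topic `-1`, strip the `"<id>_"` prefix from `Name`, and map each topic id to its words. A sketch under the assumption that `freq` and `topics` are stand-ins for `topic_model.get_topic_info()` and `topic_model.get_topics()`, hard-coded from the output above; the joined-string format of `entities` is a choice, not something BERTopic produces.

```python
import pandas as pd

# Stand-in for topic_model.get_topic_info()
freq = pd.DataFrame({
    "Topic": [-1, 0, 1],
    "Count": [1, 14, 8],
    "Name": ["-1_default_greenbone_gmp_manager",
             "0_http_tls_ssl tls_ssl",
             "1_jboss_console_web_application"],
})
# Stand-in for topic_model.get_topics() (scores shortened)
topics = {
    0: [("http", 0.0856), ("tls", 0.0620), ("ssl tls", 0.0620), ("ssl", 0.0620)],
    1: [("jboss", 0.1401), ("console", 0.0929), ("web", 0.0732), ("application", 0.0623)],
}

# Drop the outlier topic (-1); work on a copy to avoid chained-assignment warnings
result = freq[freq["Topic"] != -1].copy()
# "0_http_tls_ssl tls_ssl" -> "http_tls_ssl tls_ssl" (split only on the first "_")
result["class"] = result["Name"].str.split("_", n=1).str[1]
# Join each topic's words into one comma-separated string
result["entities"] = result["Topic"].map(
    lambda tid: ", ".join(word for word, _ in topics[tid])
)
result = result[["class", "entities"]].reset_index(drop=True)
print(result)
```

If you would rather keep the words as a Python list per row, drop the `", ".join(...)` and return the list comprehension instead.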
and a second dataframe with each topic name as its own column and that topic's words listed underneath:

   http_tls_ssl tls_ssl  jboss_console_web_application
0  http                  jboss
1  tls                   console
2  etc.                  etc.
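The wide layout is a `pandas.DataFrame` built from a dict whose keys become the column names. Wrapping each word list in a `pd.Series` lets topics with different numbers of words coexist (shorter columns are padded with `NaN`) instead of raising an error. A sketch assuming the same hard-coded stand-in for `topic_model.get_topics()` as before; the `names` mapping is hypothetical here and would normally come from the `Name` column of `get_topic_info()` with the numeric prefix removed.

```python
import pandas as pd

# Stand-in for topic_model.get_topics() (scores shortened)
topics = {
    0: [("http", 0.0856), ("tls", 0.0620), ("ssl tls", 0.0620),
        ("ssl", 0.0620), ("tcp", 0.0455), ("number", 0.0455)],
    1: [("jboss", 0.1401), ("console", 0.0929), ("web", 0.0732),
        ("application", 0.0623), ("management", 0.0623), ("apache", 0.0503)],
}
# Hypothetical id -> name mapping (Name column without the "<id>_" prefix)
names = {0: "http_tls_ssl tls_ssl", 1: "jboss_console_web_application"}

# One column per topic, words ranked from top to bottom.
# get_topics() also returns the outlier topic -1, which is skipped here.
wide = pd.DataFrame({
    names[tid]: pd.Series([word for word, _ in pairs])
    for tid, pairs in topics.items()
    if tid != -1
})
print(wide)
```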
I could not figure out how to do this. Is there a way?