Answered in comment - How do I find the topic of a news article using an already trained Top2Vec model?

Question

I'm trying to build a news recommendation system for myself using Top2Vec topic modeling. Given the amazing news datasets, it isn't too difficult to actually train the model, but I'm unsure of how to categorize a novel article.

Top2Vec has the following capabilities:

Get number of detected topics.
Get topics.
Get topic sizes.
Get hierarchical topics.
Search topics by keywords.
Search documents by topic.
Search documents by keywords.
Find similar words.
Find similar documents.
Expose model with RESTful-Top2Vec.

I was thinking of taking the article and comparing it by term frequency with the pre-existing groups, but I'm unsure if there is a faster/easier way to do it. The best solution, in my eyes, is if I could simply embed the new article in the pre-existing model without having to retrain the whole thing.

I would appreciate any advice on how I should best relate a new article to the Top2Vec model.

Thank you!

I got it: you can vectorize new articles. If your trained model is named "model", calling model.model.infer_vector([text]) returns a vector for "text". Note that infer_vector takes a list as its argument, so you can't pass the raw text directly. Furthermore, the text must be a string rather than an object. - AjS, Mar 27 '23 at 18:14
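
To make the comment concrete, here is a minimal sketch of how an inferred vector might be matched to a topic. It assumes the model was trained with Top2Vec's doc2vec embedding, so that model.model is the underlying gensim Doc2Vec model. gensim's infer_vector expects a list of word tokens, so the sketch tokenizes the article first instead of wrapping the raw string in a one-element list. The helper names (cosine_similarity, nearest_topic, classify_article) are hypothetical, and using simple_preprocess as the tokenizer is an assumption that should be aligned with whatever tokenizer the model was trained with.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length numeric sequences."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def nearest_topic(doc_vector, topic_vectors):
    """Return (topic_index, similarity) for the most similar topic vector."""
    scores = [cosine_similarity(doc_vector, t) for t in topic_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def classify_article(model, text):
    """Hypothetical helper: infer a vector for `text` and match it to a topic.

    Assumes `model` is a Top2Vec model trained with the doc2vec embedding,
    so that `model.model` is the underlying gensim Doc2Vec model.
    """
    from gensim.utils import simple_preprocess  # lazy import; requires gensim

    tokens = simple_preprocess(text, deacc=True)  # assumed tokenizer
    doc_vector = model.model.infer_vector(tokens)
    # model.topic_vectors holds one vector per detected topic.
    return nearest_topic(doc_vector, model.topic_vectors)
```

With a trained model, topic_idx, score = classify_article(model, article_text) would give the index of the closest topic, which should line up with the topic numbers returned by model.get_topics(). Inferred vectors vary slightly between calls, so articles that sit between two topics may not always land on the same one.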

