I'm trying to build a news recommendation system for myself using Top2Vec topic modeling. With the news datasets that are available, training the model isn't difficult, but I'm unsure how to categorize a new article that the model has never seen.
Top2Vec has the following capabilities:
Get number of detected topics.
Get topics.
Get topic sizes.
Get hierarchical topics.
Search topics by keywords.
Search documents by topic.
Search documents by keywords.
Find similar words.
Find similar documents.
Expose model with RESTful-Top2Vec.
I was thinking of taking the article and comparing its term frequencies with those of the pre-existing topic groups, but I'm unsure whether there is a faster or easier way to do it. The ideal solution, in my view, would be to embed the new article in the pre-existing model without having to retrain the whole thing.
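To make the embedding idea concrete, here is a minimal sketch of what I have in mind, in plain Python. It assumes I can somehow obtain one vector per topic and a vector for the new article in the same embedding space; how to get those out of Top2Vec is exactly the part I'm unsure about. The vectors below are made-up toy values, and `nearest_topic` is a hypothetical helper of mine, not part of Top2Vec.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_topic(article_vector, topic_vectors):
    """Return (topic_index, similarity) for the most similar topic vector."""
    scores = [cosine_similarity(article_vector, t) for t in topic_vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Toy 3-dimensional vectors, invented for illustration only.
# Real topic and document embeddings would have hundreds of dimensions.
topic_vectors = [
    [0.9, 0.1, 0.0],  # topic 0
    [0.0, 0.8, 0.6],  # topic 1
]
new_article_vector = [0.1, 0.7, 0.7]

topic, score = nearest_topic(new_article_vector, topic_vectors)
print(topic, round(score, 3))
```

If something like this is possible, assigning a new article would only cost one embedding step plus a similarity lookup, rather than a full retrain.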
I would appreciate any advice on how I should best relate a new article to the Top2Vec model.
Thank you!