
I'm trying to build a news recommendation system for myself using Top2Vec topic modeling. With the excellent news datasets available, training the model isn't too difficult, but I'm unsure how to categorize a new, unseen article.

Top2Vec has the following capabilities:

  • Get number of detected topics.

  • Get topics.

  • Get topic sizes.

  • Get hierarchical topics.

  • Search topics by keywords.

  • Search documents by topic.

  • Search documents by keywords.

  • Find similar words.

  • Find similar documents.

  • Expose model with RESTful-Top2Vec.

I was thinking of comparing the article's term frequencies with those of the existing topics, but I'm unsure whether there is a faster or easier way. The ideal solution, in my view, would be to embed the new article in the existing model without retraining the whole thing.
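For reference, the term-frequency idea could look roughly like the sketch below. It assumes the topic keywords have already been pulled out of the trained model into a plain dict (for example from the words returned by get_topics()); the topic_words dict and its sample keywords are hypothetical. It scores raw word counts with cosine similarity, so it ignores synonyms and word order, which is exactly what an embedding-based approach would handle better.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Crude tokenizer: lowercase and keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_topic_by_terms(article: str, topic_words: dict[int, list[str]]) -> tuple[int, float]:
    # Compare the article's term counts against each topic's keyword list
    # and return (topic_id, score) for the highest-scoring topic.
    article_tf = Counter(tokenize(article))
    scores = {tid: cosine(article_tf, Counter(words)) for tid, words in topic_words.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical topics standing in for the model's topic words.
topics = {
    0: ["election", "vote", "senate", "campaign"],
    1: ["match", "goal", "league", "season"],
}
print(best_topic_by_terms("The senate vote came after a long campaign.", topics))
```

This only needs the topic keywords, not the model itself, but it will miss articles that discuss a topic without using its exact keywords.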

I would appreciate any advice on how best to relate a new article to the Top2Vec model.

Thank you!

AjS
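  • I got it: you can vectorize new articles. Say your trained model is named "model". Calling model.model.infer_vector(tokens) gives you a vector for the article, where model.model is the underlying gensim Doc2Vec model (this applies with the default doc2vec embedding). Note that infer_vector takes a list of word tokens as its argument, so you can't pass in the raw text; split it into words first, because wrapping the whole article as [text] would be treated as a one-word document. Furthermore, each token must be a plain string rather than some other object. – AjS Mar 27 '23 at 18:14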
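Once the new article has a vector, one way to categorize it is to pick the topic vector with the highest cosine similarity. Topic vectors live in the same space as document vectors, so this roughly mirrors how Top2Vec relates its training documents to topics. This is a sketch, not Top2Vec's own code: tokenize is a hypothetical stand-in for whatever tokenizer was used at training time, and the commented usage assumes the trained model exposes its topic vectors as model.topic_vectors.

```python
from __future__ import annotations

import math
import re


def tokenize(text: str) -> list[str]:
    # Hypothetical tokenizer: infer_vector expects a list of word tokens,
    # so split the article into lowercase words instead of passing [text].
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(u, v) -> float:
    # Cosine similarity between two dense vectors of equal length.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)


def nearest_topics(doc_vector, topic_vectors, k: int = 1) -> list[tuple[int, float]]:
    # Rank every topic by similarity to the article's vector and
    # return the top k (topic_index, score) pairs, best first.
    scored = [(i, cosine(doc_vector, tv)) for i, tv in enumerate(topic_vectors)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# With a trained Top2Vec model (assumed attribute names, not runnable here):
#   tokens = tokenize(str(article_text))
#   doc_vector = model.model.infer_vector(tokens)
#   print(nearest_topics(doc_vector, model.topic_vectors, k=3))

# Toy 2-D vectors so the ranking can be checked by hand.
toy_topics = [[1.0, 0.0], [0.0, 1.0], [0.7071, 0.7071]]
print(nearest_topics([0.9, 0.1], toy_topics, k=2))
```

Returning the top k topics rather than a single label is handy for recommendations, since a news article often straddles more than one topic.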
