
I've got a database of hundreds of thousands of forum posts, and would like to tag them in an unsupervised way.

I noticed that StackOverflow's tag system suggests tags as I go. How does this algorithm work?

I also found this paper, which implies that it is SVM-based. Is that official? http://dl.acm.org/citation.cfm?id=2660970&dl=ACM&coll=DL&CFID=522960920&CFTOKEN=15091676
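If the suggestions were SVM-based, the simplest version would be one binary linear SVM per tag (one-vs-rest) over bag-of-words features. The sketch below assumes exactly that and is not StackOverflow's system or the method from the linked paper. It trains a single tag's classifier with Pegasos stochastic sub-gradient steps (hinge loss plus L2 penalty, no bias term). `train_pegasos`, `svm_score` and the toy posts are hypothetical.

```python
import random
from collections import Counter, defaultdict

def train_pegasos(docs, labels, lam=0.01, epochs=50, seed=0):
    """Binary linear SVM (hinge loss + L2 penalty) trained with Pegasos steps.

    docs: list of token lists; labels: +1 if the post carries the tag, else -1.
    No bias term, as in the basic Pegasos formulation.
    """
    rng = random.Random(seed)
    w = defaultdict(float)
    order = list(range(len(docs)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)            # Pegasos step size
            x = Counter(docs[i])             # bag-of-words feature counts
            margin = labels[i] * sum(w.get(f, 0.0) * v for f, v in x.items())
            for f in w:                      # L2 shrink: w <- (1 - eta*lam) * w
                w[f] *= 1.0 - eta * lam
            if margin < 1:                   # hinge loss active: step toward y*x
                for f, v in x.items():
                    w[f] += eta * labels[i] * v
    return w

def svm_score(w, tokens):
    # A positive score means "suggest this tag".
    return sum(w.get(f, 0.0) * v for f, v in Counter(tokens).items())

# Hypothetical toy data: does a post deserve the "pandas" tag?
posts = [
    "how to merge pandas dataframes".split(),
    "pandas groupby sum".split(),
    "react state not updating".split(),
    "react hooks useeffect loop".split(),
]
has_pandas_tag = [1, 1, -1, -1]
w = train_pegasos(posts, has_pandas_tag)
print(svm_score(w, ["pandas", "merge"]) > 0)
print(svm_score(w, ["react", "hooks"]) < 0)
```

With hundreds of thousands of posts and many tags you would train one such model per tag and suggest every tag whose score is positive (or the top few).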

cjm2671
  • I don't know the exact answer, but I'm almost sure that they use a recommender engine for tagging. Tagging is one of the tasks recommender engines are actually designed for. I believe they use a recommender with an item-based approach. You could read more in Mahout in Action (I personally think it is one of the best books on the subject). – Maksim Khaitovich Jun 25 '15 at 16:44
  • I am surprised no one has answered this question. This is definitely worth knowing. – Alexander Popov Jul 28 '15 at 21:09
  • Agreed. Did you ever get more info on the subject? – Mike Purcell Nov 16 '16 at 19:09
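A minimal sketch of the item-based idea from the comment above, assuming some posts already carry tags (for example, the first tag a user types): treat each tag as an item, compute tag-to-tag cosine similarity from how often tags co-occur on posts, and suggest the tags most similar to the ones already on the post. This is not Mahout's API or StackOverflow's actual algorithm; `tag_similarities`, `suggest_tags` and the toy data are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def tag_similarities(tagged_posts):
    """Cosine similarity between tags, each tag being a binary vector over posts."""
    tag_count = defaultdict(int)   # number of posts carrying each tag
    co_count = defaultdict(int)    # number of posts carrying both tags
    for tags in tagged_posts:
        unique = sorted(set(tags))
        for t in unique:
            tag_count[t] += 1
        for a, b in combinations(unique, 2):
            co_count[(a, b)] += 1
            co_count[(b, a)] += 1
    sims = defaultdict(dict)
    for (a, b), n in co_count.items():
        sims[a][b] = n / math.sqrt(tag_count[a] * tag_count[b])
    return sims

def suggest_tags(current_tags, sims, k=3):
    """Rank candidate tags by summed similarity to the tags already on the post."""
    scores = defaultdict(float)
    for t in current_tags:
        for other, s in sims.get(t, {}).items():
            if other not in current_tags:
                scores[other] += s
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tag for tag, _ in ranked[:k]]

# Hypothetical toy data: tag sets of already-tagged posts.
posts = [
    ["python", "pandas", "dataframe"],
    ["python", "pandas"],
    ["python", "numpy"],
    ["javascript", "react"],
    ["javascript", "node.js"],
]
sims = tag_similarities(posts)
print(suggest_tags(["pandas"], sims, k=2))  # → ['python', 'dataframe']
```

Item-based similarities change slowly, so in a system like this they can be precomputed offline and looked up cheaply while the user types.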

1 Answer


You could also try a shallow inverse regression (its authors call it "deep", though) using Gensim and word embeddings for document classification. Ideally, by using both the titles and the bodies of the forum posts, you should be able to build a fairly decent classification system. Follow along in this notebook and paper.
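As I understand it, the inversion approach fits a separate language model for each tag (word2vec models in the Gensim version) and classifies a new post with Bayes' rule: score each sentence under every tag's model, turn the scores into p(tag | sentence), and average over the post's sentences. The sketch below keeps that structure but, as an assumption for brevity, swaps the per-tag word2vec models for Laplace-smoothed unigram models, which makes it close to multinomial Naive Bayes. The class name and toy posts are hypothetical.

```python
import math
from collections import Counter

class UnigramInversionClassifier:
    """Inversion-style classifier: one language model per tag, Bayes' rule to predict.

    ASSUMPTION: a Laplace-smoothed unigram model stands in for the per-tag
    word2vec models used in the Gensim approach.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, docs, labels):
        # docs: list of token lists; labels: one tag per doc
        self.tags = sorted(set(labels))
        self.counts = {t: Counter() for t in self.tags}
        for tokens, tag in zip(docs, labels):
            self.counts[tag].update(tokens)
        self.vocab = set()
        for c in self.counts.values():
            self.vocab.update(c)
        self.totals = {t: sum(c.values()) for t, c in self.counts.items()}
        n_docs = Counter(labels)
        self.log_prior = {t: math.log(n_docs[t] / len(labels)) for t in self.tags}
        return self

    def _sentence_loglik(self, tokens, tag):
        v = len(self.vocab) + 1  # +1 slot shared by all unseen words
        denom = self.totals[tag] + self.alpha * v
        return sum(math.log((self.counts[tag][w] + self.alpha) / denom) for w in tokens)

    def predict_proba(self, sentences):
        """Average the per-sentence posteriors p(tag | sentence) over a non-empty post."""
        avg = {t: 0.0 for t in self.tags}
        for tokens in sentences:
            scores = {t: self.log_prior[t] + self._sentence_loglik(tokens, t) for t in self.tags}
            m = max(scores.values())  # log-sum-exp trick for numerical stability
            z = sum(math.exp(s - m) for s in scores.values())
            for t in self.tags:
                avg[t] += math.exp(scores[t] - m) / z / len(sentences)
        return avg

# Hypothetical toy posts (title and body tokens could simply be concatenated).
docs = [
    "how do i merge two pandas dataframes".split(),
    "pandas groupby returns wrong index".split(),
    "react component does not rerender on state change".split(),
    "usestate hook in react not updating".split(),
]
labels = ["pandas", "pandas", "reactjs", "reactjs"]
clf = UnigramInversionClassifier().fit(docs, labels)
probs = clf.predict_proba([["pandas", "merge", "drops", "rows"], ["index", "is", "wrong"]])
print(max(probs, key=probs.get))  # → pandas
```

Averaging sentence-level posteriors, rather than multiplying everything into one document likelihood, mirrors the per-sentence scoring of the inversion approach and keeps one long sentence from dominating a post.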