I am working on topic modeling and I am curious what exactly counts as short text in this context. For example, would a research paper's title and abstract be considered short text?
1 Answer
I am working on topic modeling and I am curious what exactly would be short text under this context?
The recent survey paper on short text topic modeling (by Qiang et al.) mentions several datasets on which such models are evaluated: search snippets, StackOverflow question titles, tweets, and some others. Depending on the dataset, the documents have 5-14 words on average and 14-37 words at most.
For example, if there is a research paper, would the research paper's title and abstract be considered as short text?
Paper abstracts are usually longer than that. An abstract typically has 200 or 300 words, or even more.
A second point worth mentioning is that some short text topic modeling techniques assume that each text has exactly one topic (for example, the model in the paper by Yin & Wang). I think an abstract may well cover several topics, so models that assume one topic per document may perform badly on paper abstracts.
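To see where your own corpus falls relative to the lengths quoted above, you can compute the same two statistics yourself. This is only a rough sketch: the function name `length_stats` and the toy documents are made up for illustration, and it counts whitespace-separated tokens, which may differ from the preprocessing used in the survey.

```python
def length_stats(docs):
    """Return (mean, max) number of whitespace-separated tokens per document."""
    if not docs:
        raise ValueError("docs must contain at least one document")
    counts = [len(doc.split()) for doc in docs]
    return sum(counts) / len(counts), max(counts)


# Hypothetical toy corpus; replace with your own titles or abstracts.
docs = [
    "how to parse json in python",
    "topic models for short documents",
]

mean_len, max_len = length_stats(docs)
print(f"mean = {mean_len:.1f} words, max = {max_len} words")
```

If the mean is in the hundreds rather than around 5-14, as I would expect for abstracts, the corpus looks more like regular documents than like the short-text datasets mentioned above.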
