Questions tagged [topic-modeling]

Topic models describe how frequently topics occur in documents and other texts. A "topic" is a group of words that tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, and "cat" and "meow" will appear in documents about cats (source: Wikipedia).

Generative models (i.e., the statistical models used for topic modelling):

  • Latent Dirichlet Allocation (LDA)
  • Hierarchical Dirichlet process (HDP)
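To make the LDA entry concrete, here is a minimal collapsed Gibbs sampler for LDA in pure Python. It is a teaching sketch, not a library implementation. The hyperparameters `alpha` and `beta`, the iteration count, and the toy dog/cat corpus are illustrative assumptions. Real projects would normally use an established library.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of word tokens.
    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})                      # vocabulary size
    doc_topic = [[0] * n_topics for _ in docs]                 # n(d, k)
    topic_word = [defaultdict(int) for _ in range(n_topics)]   # n(k, w)
    topic_total = [0] * n_topics                               # n(k)
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Take the current token out of the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized full conditional p(z = t | all other tokens).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word

def top_words(topic_word, n=3):
    """The n most frequent words of each topic."""
    return [sorted(tw, key=tw.get, reverse=True)[:n] for tw in topic_word]

docs = [
    ["dog", "bone", "bark", "dog"],
    ["bone", "dog", "bark"],
    ["cat", "meow", "purr", "cat"],
    ["meow", "cat", "purr"],
]
doc_topic, topic_word = lda_gibbs(docs, n_topics=2)
print(doc_topic)
print(top_words(topic_word))
```

Each row of `doc_topic` is a document's (unnormalized) topic mixture. Each `topic_word` count table, read through `top_words`, is what people usually inspect as "the topic".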

Software / Libraries


980 questions
-1 votes, 1 answer

Would it be possible to add some code to this topic model script so that colleagues without R training can use it easily?

I have a topic model script that I would like to make available as a Shiny app. I have no experience with Shiny, and this will be my first attempt. The input will always be an Excel sheet with the same format, and I want the output to be an…
jam.gunn
-1 votes, 1 answer

How can I get the topic coherence scores of two models and use them for comparison?

I want to get the topic coherence for an LDA model. Say I have two LDA models, one trained on a bag of words and the other on a bag of phrases. How can I get the coherence for these two models and then compare them on the basis of coherence?
user3778289
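One way to compare the two models is to score each model's topics with a coherence measure and compare the averages. The sketch below implements the UMass coherence measure in pure Python from document co-occurrence counts. The toy documents and topics are made up, and so is the choice of how many top words to score. gensim's `CoherenceModel` (with `coherence='u_mass'` or `'c_v'`) offers ready-made versions for trained models.

```python
import math

def umass_coherence(topic_words, documents):
    """UMass coherence of one topic.

    topic_words: the topic's top-N words, most probable first.
    documents:   reference corpus as a list of token lists, tokenized the
                 same way as the model's training data.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(topic_words)):
        for j in range(i):
            # log((D(w_i, w_j) + 1) / D(w_j)); every topic word must occur
            # in at least one reference document, or D(w_j) is zero.
            score += math.log(
                (doc_freq(topic_words[i], topic_words[j]) + 1)
                / doc_freq(topic_words[j])
            )
    return score

def model_coherence(topics, documents):
    """Mean UMass coherence over a model's topics (higher is better)."""
    return sum(umass_coherence(t, documents) for t in topics) / len(topics)

# Toy example with made-up documents and topics.
word_docs = [["stock", "market", "price"], ["stock", "price"], ["market", "open"]]
bow_topics = [["stock", "price"], ["market", "open"]]
print(model_coherence(bow_topics, word_docs))
```

To compare a bag-of-words model with a bag-of-phrases model, score each one's top words against the corpus tokenized the way that model was trained. Use the same number of top words for both. The two tokenizations yield different vocabularies, so treat the comparison as indicative rather than exact.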
-1 votes, 1 answer

How to extract categories out of short text documents?

My data contains the answers to the open-ended question: what are the reasons for recommending the organization you work for? I want to use an algorithm / technique that, using this data, learns the categories (i.e. the reasons) that occur most…
Marjolijn
-1 votes, 1 answer

LDA: Assign more than one topic to a document

I'm new to LDA and am doing some experiments with Python + LDA on some sample datasets. I have already got some very interesting results, and now I have a question I haven't been able to answer so far. Since I worked with customer reviews/ratings of a…
Nicson
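LDA already represents each document as a mixture of topics. Assigning several topics usually means keeping every topic whose share of the document clears a threshold. A minimal sketch, assuming the document's distribution is a list of `(topic_id, probability)` pairs, which is the shape gensim's `LdaModel.get_document_topics` returns. The 0.2 threshold, the optional `max_topics` cap, and the example probabilities are arbitrary illustrative values.

```python
def assign_topics(doc_topics, threshold=0.2, max_topics=None):
    """Return the topic ids whose probability is at least `threshold`,
    most probable first, optionally capped at `max_topics`."""
    kept = [(t, p) for t, p in doc_topics if p >= threshold]
    kept.sort(key=lambda tp: tp[1], reverse=True)
    if max_topics is not None:
        kept = kept[:max_topics]
    return [t for t, _ in kept]

# Made-up distribution for one review: mostly topics 3 and 0.
doc_topics = [(0, 0.35), (1, 0.05), (2, 0.10), (3, 0.50)]
print(assign_topics(doc_topics))  # [3, 0]
```

A document with no topic above the threshold gets an empty list. That can be useful for flagging reviews the model does not describe well.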
-1 votes, 1 answer

Text Cleaning Issues

I'm learning text cleaning in Python online. I have removed some stop words and lowercased the text, but when I execute this code, it doesn't show anything, and I don't know why. # we add some words to the stop word list texts, article = [], [] for w…
WillZhang
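The snippet in the question is truncated, so the cause of the empty output cannot be pinned down here. One thing worth checking is that the cleaned tokens are actually returned or printed. A minimal sketch of lowercasing, tokenizing, and stop-word removal in plain Python follows. The small stop-word set and the extra words added to it are hypothetical stand-ins for whatever list the original code uses (for example NLTK's).

```python
import re

# Hypothetical stop-word list; in practice it might come from NLTK or spaCy.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "is", "in"}
# Add some domain-specific words to the stop-word list.
STOP_WORDS |= {"said", "also"}

def clean(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

articles = [
    "The dog said it also likes the bone.",
    "A cat is in the garden.",
]
texts = [clean(a) for a in articles]
print(texts)  # [['dog', 'it', 'likes', 'bone'], ['cat', 'garden']]
```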
-1 votes, 3 answers

How to search for a specific paragraph in a text?

I am looking for ways to extract specific paragraphs out of strings. I have a lot of documents that I want to use for topic modeling, but they contain tables, figures, headers, etc. I only want to use the summary, which a document usually contains.…
nlp_noob
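If the documents are already plain text and the summary sits under a recognizable heading, a regular expression can cut out just that section. The sketch assumes a particular layout: the heading is a line reading only `Summary` or `Abstract` (any case, optional colon), and the summary is the single paragraph after it, ending at the next blank line. Real documents (PDFs, Word files) would first need text extraction, and their layouts may differ.

```python
import re

# Assumed layout: a heading line "Summary" or "Abstract" (optional colon),
# then the summary paragraph, ending at a blank line or end of text.
SUMMARY_RE = re.compile(
    r"^(?:summary|abstract):?[ \t]*\n(.+?)(?:\n[ \t]*\n|\Z)",
    re.IGNORECASE | re.MULTILINE | re.DOTALL,
)

def extract_summary(text):
    """Return the summary paragraph, or None if no heading is found."""
    match = SUMMARY_RE.search(text)
    return match.group(1).strip() if match else None

doc = (
    "Report title\n\n"
    "Table 1: numbers\n\n"
    "Summary\n"
    "This study looks at dogs.\n"
    "It finds they like bones.\n\n"
    "Figure 2: bones\n"
)
print(extract_summary(doc))
```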
-1 votes, 1 answer

'float' object has no attribute 'encode'

This code indexes and sorts each cluster to identify the top n words (I chose n=6) nearest to the cluster centroid. However, I get this error: 'float' object has no attribute 'encode'. Can anyone help…
Edoardo
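This error means a value expected to be a string is actually a float. In text data this is most often a missing value: pandas stores `NaN` as a float, and `float("nan").encode(...)` raises exactly this `AttributeError`. A sketch of guarding against it before calling `.encode()` follows. The sample list is made up, and mapping missing values to an empty string is one choice among several (dropping them is another).

```python
import math

def to_bytes(value, encoding="utf-8"):
    """Encode strings; map missing values (NaN/None) to b"" and convert
    other non-strings to their string form before encoding."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return b""
    if not isinstance(value, str):
        value = str(value)
    return value.encode(encoding)

terms = ["dog", float("nan"), "café", 42]
print([to_bytes(t) for t in terms])
# [b'dog', b'', b'caf\xc3\xa9', b'42']
```

With pandas, a common equivalent is to clean the column before the loop, for example `df["col"].fillna("").astype(str)`.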
-1 votes, 2 answers

How to convert a list of tuples from a text file into columns

I have a text file which contains a list of tuples. I want to convert this list into columns. The file contains the following data: [(0, u'0.025*"minimalism" + 0.018*"diwali" + 0.018*"sunday" + 0.018*"minimalistics" + 0.018*"plant" +…
aneeket
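The file content looks like a printed Python list of `(topic_id, 'weight*"word" + ...')` tuples, so the standard library can parse it. `ast.literal_eval` reads the list safely, and a regular expression splits each term string into weights and words. The sketch writes one row per (topic, word, weight) in "long" format, which is one interpretation of "columns". `topics.txt` and `topics.csv` are hypothetical file names, and the input string is a shortened, made-up sample in the same format.

```python
import ast
import csv
import io
import re

# Matches terms like 0.025*"minimalism" -> ("0.025", "minimalism").
TERM_RE = re.compile(r'([0-9.]+)\*"([^"]+)"')

def topics_to_rows(raw):
    """Turn the printed list of (topic_id, 'w*"word" + ...') tuples into
    (topic_id, word, weight) rows."""
    rows = []
    for topic_id, terms in ast.literal_eval(raw):
        for weight, word in TERM_RE.findall(terms):
            rows.append((topic_id, word, float(weight)))
    return rows

def write_csv(rows, fileobj):
    """Write the rows with a header as CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(["topic", "word", "weight"])
    writer.writerows(rows)

# In practice: raw = open("topics.txt", encoding="utf-8").read()
raw = """[(0, u'0.025*"minimalism" + 0.018*"diwali"'), (1, u'0.030*"plant" + 0.020*"sunday"')]"""
rows = topics_to_rows(raw)
buf = io.StringIO()  # or open("topics.csv", "w", newline="") for a real file
write_csv(rows, buf)
print(buf.getvalue())
```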
-1 votes, 1 answer

Python - IndexError: list index out of range (topic modeling)

I've come across a lot of similar questions, but the answers provided did not seem to help me. I'm trying to run a topic modeling analysis on about 8,000 media articles, but I'm getting this error: Traceback (most recent call last): …
-1 votes, 1 answer

LDA Topic assignment

I have a corpus of ~1,400 documents. I did all the text cleaning using the tm package; my last step was creating the DTM (document-term matrix). I am trying to train the LDA model based on 200 documents that were examined by a human and assigned topics (categories).…
Sir Oliver
-1 votes, 1 answer

How can I perform LDA (latent Dirichlet allocation) on Noun Phrases in R instead of words?

I want to generate topics from my text at the level of phrases, rather than at the level of words using LDA (latent Dirichlet allocation). How can I do that in R? LDA interprets the documents as bag-of-words and produces topics with constituting…
carora3
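LDA does not care what a token is. A common approach is to rewrite each document so that every multi-word phrase becomes a single token (e.g. `topic_model`) before building the document-term matrix. The model then learns topics over phrases instead of single words. The sketch shows only that preprocessing step, in Python rather than R, though the idea carries over. It assumes the phrase list is already known, for example from a noun-phrase chunker or collocation detection. The phrases and sentence are hypothetical.

```python
def merge_phrases(tokens, phrases):
    """Replace known multi-word phrases with single underscore-joined
    tokens, scanning left to right and preferring the longest match."""
    phrase_set = {tuple(p.split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)
    out, i = [], 0
    while i < len(tokens):
        # Try the longest possible phrase starting at position i first.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in phrase_set:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here; keep the single word.
            out.append(tokens[i])
            i += 1
    return out

phrases = ["topic model", "latent dirichlet allocation"]
doc = "we fit a topic model with latent dirichlet allocation".split()
print(merge_phrases(doc, phrases))
# ['we', 'fit', 'a', 'topic_model', 'with', 'latent_dirichlet_allocation']
```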
-1 votes, 1 answer

How to plot log.likelihoods for each iteration in R using the lda package?

I want to plot the log.likelihoods gathered from an LDA run in R using the lda package. My code is: K <- 10 ## Num clusters result <- lda.collapsed.gibbs.sampler(cora.documents, K, ## Num…
-1 votes, 1 answer

What is the approach to generating topics from text using a Wikipedia dump?

I'm new to NLP/text processing and am building an application that requires generating topics (Music, Games, Romance, History, etc.) from about two lines of input text. I've decided to use Wikipedia's article base to help me in this process. What…
wolfgang
-2 votes, 1 answer

How to fix the error "too many values to unpack (expected 2)"

I applied an LDA model using TF-IDF, and now I want to evaluate its performance by classifying a sample document with the LDA TF-IDF model. Code: for index, score in sorted(lda_model_tfidf[corpus], key=lambda tup: -1*tup[1]): print("\nScore: {}\t \nTopic:…
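The variable names suggest gensim, though the question doesn't say so. If that assumption holds, applying a model to a whole corpus (`lda_model_tfidf[corpus]`) yields one list of `(topic_id, probability)` pairs per document. Each item being sorted and unpacked is then a whole document's list, not a single pair. Unpacking a list of three or more pairs into `index, score` raises this `ValueError`. The usual fix is to pass a single document, e.g. `lda_model_tfidf[corpus[1]]`. The sketch reproduces the shape problem with plain lists and made-up numbers instead of a real model.

```python
# Stand-in for lda_model_tfidf[corpus]: one list of (topic_id, probability)
# pairs per document (made-up numbers).
corpus_topics = [
    [(0, 0.70), (1, 0.20), (2, 0.10)],
    [(0, 0.05), (1, 0.15), (2, 0.80)],
]

# Buggy pattern: each item is a document's list of three pairs,
# so unpacking it into two names raises ValueError.
message = None
try:
    for index, score in corpus_topics:
        pass
except ValueError as err:
    message = str(err)
print(message)  # a "too many values to unpack" message

# Fix: take one document, then rank its (topic_id, probability) pairs.
doc_topics = corpus_topics[1]  # with gensim: lda_model_tfidf[corpus[1]]
ranked = sorted(doc_topics, key=lambda tup: -tup[1])
for index, score in ranked:
    print(f"Score: {score}\tTopic: {index}")
```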
-2 votes, 2 answers

Replace a list of words with one unique word in R

I am working on a text analysis with R and have a dataset (text corpus) with various sentences about different fruits. For example: "apple", "banana" , "orange", "pear", etc. Since it is not relevant for the analysis whether someone writes about…
lole_emily
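The question is about R, but the technique is the same in any language. Build one regular expression that matches any listed word as a whole word, then substitute the common replacement. A Python sketch follows, using a hypothetical replacement word `fruit`. In R, a similar effect can be had with `gsub()` and a pattern built with `paste(words, collapse = "|")`.

```python
import re

def replace_words(text, words, replacement):
    """Replace every whole-word, case-insensitive occurrence of any word
    in `words` with `replacement`."""
    # Longest entries first so longer overlapping entries win;
    # re.escape guards against regex metacharacters in the word list.
    alternatives = "|".join(
        re.escape(w) for w in sorted(words, key=len, reverse=True)
    )
    pattern = r"\b(?:" + alternatives + r")\b"
    return re.sub(pattern, replacement, text, flags=re.IGNORECASE)

fruits = ["apple", "banana", "orange", "pear"]
print(replace_words("Apple and pear are sweet; a pearl is not.", fruits, "fruit"))
# fruit and fruit are sweet; a pearl is not.
```

The `\b` word boundaries keep words that merely contain a listed word (here "pearl") unchanged.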