Questions tagged [topic-modeling]

Topic models describe the frequency of topics in documents and text. A "topic" is a group of words which tend to occur together.

A topic model is a type of statistical model for discovering the abstract "topics" that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: "dog" and "bone" will appear more often in documents about dogs, while "cat" and "meow" will appear more often in documents about cats (source: Wikipedia).
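As a rough illustration of that intuition, the sketch below scores a short text against two hand-made topics and picks the one under which its words are most probable. Everything here is an assumption made for illustration: the topic names, the word probabilities, the `UNSEEN` floor value, and the helper names `log_likelihood` and `most_likely_topic`. It is a simple unigram scoring scheme, not LDA or HDP, which learn the topics from data instead of taking them as given.

```python
import math
from collections import Counter

# Hypothetical toy topics: each maps words to made-up probabilities.
TOPICS = {
    "dogs": {"dog": 0.4, "bone": 0.3, "walk": 0.2, "meow": 0.1},
    "cats": {"cat": 0.4, "meow": 0.3, "purr": 0.2, "bone": 0.1},
}

# Assumed floor probability for words a topic does not list,
# so that log(0) never occurs.
UNSEEN = 1e-3


def log_likelihood(tokens, word_probs):
    """Sum of log-probabilities of the tokens under one topic."""
    counts = Counter(tokens)
    return sum(n * math.log(word_probs.get(w, UNSEEN)) for w, n in counts.items())


def most_likely_topic(text):
    """Return the best-scoring topic name and the score of every topic."""
    tokens = text.lower().split()
    scores = {name: log_likelihood(tokens, probs) for name, probs in TOPICS.items()}
    return max(scores, key=scores.get), scores


best, scores = most_likely_topic("the dog buried a bone")
print(best, scores)
```

Words such as "the" and "a" are penalized equally under both topics, so the decision is driven by "dog" and "bone", which is the behavior the paragraph above describes.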

Generative models (i.e. the statistical models used for topic modelling)

Latent Dirichlet Allocation (LDA)
Hierarchical Dirichlet process (HDP)

Software / Libraries

Mallet (Java)
Stanford Topic Modeling Toolbox (software)
Gensim – Topic Modelling for Humans

Related Tags:

topicmodels

980 questions

-1

votes

1 answer

Would it be possible to add some code to this topic model script to enable non-R-trained colleagues to use it easily?

I have a topic model script which I would like to make available as a Shiny app. I have no experience using Shiny, and this will be my first effort. The input will always be an Excel sheet with the same format, and I want the output to be an…

r shiny topic-modeling

asked Sep 04 '19 at 13:06

jam.gunn

-1

votes

1 answer

How can I get the topic coherence score of two models and then use it for comparison?

I want to get the topic coherence for an LDA model. Let's say I have two LDA models, one built on a bag of words and the second on a bag of phrases. How can I get the coherence for these two models and then compare them on that basis?

lda topic-modeling

asked Apr 15 '19 at 19:37

user3778289

-1

votes

1 answer

How to extract categories out of short text documents?

My data contains the answers to the open-ended question: what are the reasons for recommending the organization you work for? I want to use an algorithm / technique that, using this data, learns the categories (i.e. the reasons) that occur most…

nlp cluster-analysis text-mining topic-modeling

asked Apr 02 '19 at 08:49

Marjolijn

-1

votes

1 answer

LDA: Assign more than one topic to a document

I'm new to LDA and doing some experiments with Python + LDA and some sample datasets. I already got some very interesting results and now I asked myself a question but couldn't find an answer so far. Since I worked with customer reviews/ratings of a…

python nlp data-science lda topic-modeling

asked Dec 11 '18 at 17:56

Nicson

-1

votes

1 answer

Text Cleaning Issues

I'm learning text cleaning with Python online. I have removed some stop words and lowercased the letters, but when I execute this code, it doesn't show anything. I don't know why. # we add some words to the stop word list texts, article = [], [] for w…

python text nlp topic-modeling

asked Nov 21 '18 at 08:20

WillZhang

-1

votes

3 answers

How to search for a specific paragraph in a text?

I am looking for ways to extract specific paragraphs out of strings. I have a lot of documents which I want to use for topic modeling, but these contain tables, figures, headers, etc. I only want to use the summary which usually is in a document.…

python string nlp topic-modeling

asked Jun 27 '18 at 12:18

nlp_noob

-1

votes

1 answer

'float' object has no attribute 'encode'

This code is for indexing and sorting on each cluster to identify the top n (I chose n=6) words that are nearest to the cluster centroid. However, I get this error: 'float' object has no attribute 'encode'. Can anyone help…

python python-3.x cluster-analysis topic-modeling

asked Dec 19 '17 at 15:30

Edoardo

-1

votes

2 answers

How to convert a list of tuples from a text file into columns

I have a text file which contains a list of tuples. I want to convert this list into columns. The file contains the following data: [(0, u'0.025*"minimalism" + 0.018*"diwali" + 0.018*"sunday" + 0.018*"minimalistics" + 0.018*"plant" +…

python python-2.7 csv nlp topic-modeling

asked Dec 13 '17 at 17:09

aneeket

-1

votes

1 answer

Python - IndexError: list index out of range (topic modeling)

I've come across a lot of similar questions. However, the answers provided did not help me. I'm trying to run a topic modeling analysis on about 8,000 media articles, but I'm getting this error: Traceback (most recent call last): …

python lda topic-modeling

asked Jul 12 '17 at 08:45

M. M. Van Hulle

-1

votes

1 answer

LDA Topic assignment

I have a corpus of ~1,400 documents. I did all the text cleansing using the tm package. My last step was the creation of the document-term matrix (DTM). I am trying to train the LDA model based on 200 documents examined by humans and the topics (categories) that were assigned.…

r topic-modeling

asked Oct 12 '16 at 20:10

Sir Oliver

-1

votes

1 answer

How can I perform LDA (latent Dirichlet allocation) on Noun Phrases in R instead of words?

I want to generate topics from my text at the level of phrases, rather than at the level of words using LDA (latent Dirichlet allocation). How can I do that in R? LDA interprets the documents as bag-of-words and produces topics with constituting…

r lda topic-modeling

asked Jun 22 '15 at 14:20

carora3

-1

votes

1 answer

How to plot log.likelihoods for each iteration in R using LDA package?

My problem is that I want to plot the log.likelihoods gathered from LDA execution in R using the LDA package. My code is: K <- 10 ## Num clusters result <- lda.collapsed.gibbs.sampler(cora.documents, K, ## Num…

r lda topic-modeling

asked May 20 '15 at 18:53

Sidahmed Mokeddem

-1

votes

1 answer

What is the approach to generate topics from text using a Wikipedia dump?

I'm new to NLP/text processing and building an application which requires generating topics (Music, Games, Romance, History, etc.) from about 2 lines of input text. I've decided to use Wikipedia's article base to help me out in this process. What…

python nlp text-processing topic-modeling

asked Apr 10 '15 at 03:52

wolfgang

-2

votes

1 answer

How to fix the error "too many values to unpack (expected 2)"

I applied an LDA model using TF-IDF, and now I want to evaluate its performance by classifying a sample document with the LDA TF-IDF model. Code: for index, score in sorted(lda_model_tfidf[corpus], key=lambda tup: -1*tup[1]): print("\nScore: {}\t \nTopic:…

python lda topic-modeling

asked Jul 18 '21 at 08:35

Rajat Goyal

-2

votes

2 answers

Replace a list of words with one unique word in R

I am working on a text analysis with R and have a dataset (text corpus) with various sentences about different fruits. For example: "apple", "banana", "orange", "pear", etc. Since it is not relevant for the analysis whether someone writes about…

r regex nlp data-cleaning topic-modeling

asked Jun 04 '20 at 20:24

lole_emily
