Questions tagged [text-mining]

Text Mining is a process of deriving high-quality information from unstructured (textual) information.

Possible applications of text mining include analyzing:

  • Comments in survey responses
  • Customer messages, emails, complaints etc.
  • Investigating competitors by crawling their web sites

2607 questions
0 votes, 1 answer

Cosine Similarity Matrix in R

I have a document-term matrix, "mydtm", that I created in R using the 'tm' package. I am trying to depict the similarities between the 557 documents contained in the dtm/corpus. I have been attempting to use a cosine similarity…
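For readers unfamiliar with the measure this question refers to, here is a minimal sketch of pairwise cosine similarity over the rows of a document-term matrix. It is written in plain Python rather than R, does not use the 'tm' package, and the names `cosine_similarity_matrix` and `toy_dtm` are hypothetical, as is the tiny 3-document matrix standing in for the 557-document "mydtm".

```python
import math


def cosine_similarity_matrix(dtm):
    """Return pairwise cosine similarities between the rows (documents) of dtm."""
    # Euclidean length of each document vector.
    norms = [math.sqrt(sum(v * v for v in row)) for row in dtm]
    n = len(dtm)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if norms[i] == 0 or norms[j] == 0:
                # Assumed convention: an empty document is similar to nothing.
                value = 0.0
            else:
                dot = sum(a * b for a, b in zip(dtm[i], dtm[j]))
                value = dot / (norms[i] * norms[j])
            # The measure is symmetric, so fill both triangles at once.
            sim[i][j] = sim[j][i] = value
    return sim


# Hypothetical toy matrix: 3 documents (rows) by 4 terms (columns).
toy_dtm = [
    [2, 1, 0, 0],
    [4, 2, 0, 0],
    [0, 0, 3, 1],
]
similarities = cosine_similarity_matrix(toy_dtm)
```

Documents 1 and 2 use the same terms in the same proportions, so their similarity is 1 up to floating-point error, while document 3 shares no terms with either and scores 0.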
0 votes, 1 answer

How do I generate a word cloud for a large dataset in R?

I'm trying to generate a word cloud for a year's worth of complaint narrative data from the CFPB's public complaint database. There are roughly 100,000 words per year. I've been able to generate clouds using samples of about 1,000 words per year. I…
0ecd3e
0 votes, 3 answers

Inferring topics with mallet, using the saved topic state

I've used the following command to generate a topic model from some documents: bin/mallet train-topics --input topic-input.mallet --num-topics 100 --output-state topic-state.gz I have not, however, used the --output-model option to generate a…
sandesh247
0 votes, 1 answer

Text mining between a data frame column and 2 lists in R

So I created two lists of words: fruits <- c("banana","apple","strawberry") homemade <- c("kitchen","homemade","mom","dad","sister") And here is my dataset (columns: description, isCake): apple cake cooked by mom | YES; pie from the…
katdataecon
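One way to read the matching task in this question is "count how many words of each list occur in a description". The sketch below illustrates that reading in plain Python rather than R. The helper `count_matches` is hypothetical, and the whitespace tokenization and lowercasing are assumptions, not something the question specifies.

```python
fruits = ["banana", "apple", "strawberry"]
homemade = ["kitchen", "homemade", "mom", "dad", "sister"]


def count_matches(description, words):
    """Count the tokens of description that appear in the given word list."""
    # Assumption: lowercase the text and split on whitespace.
    tokens = description.lower().split()
    vocabulary = set(words)
    return sum(1 for token in tokens if token in vocabulary)


# The one complete description visible in the question excerpt.
description = "apple cake cooked by mom"
fruit_hits = count_matches(description, fruits)        # matches "apple"
homemade_hits = count_matches(description, homemade)   # matches "mom"
```

Applying the helper to every row of the description column gives two count columns that can then be compared against isCake.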
0 votes, 0 answers

Complex text mining in R with matching words from 2 lists

Well, I created 2 lists: expensive <- c("wine","watch","book","books","bottles","whisky") g1 <- c(df$gifts) (Of course I have more than 6 words in my "expensive" list; this is just for the example.) My idea is to look at the number of matches to keep only…
katdataecon
0 votes, 1 answer

Can I extract the structure of a PDF in R to check information such as author, date, etc. and store it in, e.g., a data frame?

I am extracting PDFs from a web page and would like to know whether it is possible to extract the XML structure of each of these PDFs, check for information such as the author and the title of each document, and store this information in a data…
ms_aka
0 votes, 1 answer

Python frequency of words using gensim: How to get the word instead of the word id in the corpus

I use gensim to count the frequency of words in a given note. After applying the following code: from gensim import corpora dictionary = corpora.Dictionary(sentences) corpus = [dictionary.doc2bow(text) for text in sentences] I obtain a corpus such…
Agni412
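The id-to-word lookup this question asks about can be illustrated without gensim. The sketch below is a plain-Python stand-in, not gensim's implementation: `token2id`, `id2token`, and the sample `sentences` are hypothetical, and the ids are assigned in first-seen order, which may differ from the ids gensim assigns. With gensim itself, indexing the dictionary by id (`dictionary[word_id]`) returns the token.

```python
from collections import Counter

# Hypothetical tokenized notes standing in for the asker's `sentences`.
sentences = [["text", "mining", "text"], ["mining", "notes"]]

# Map each token to an integer id in first-seen order.
token2id = {}
for text in sentences:
    for token in text:
        token2id.setdefault(token, len(token2id))

# Inverse mapping, used to turn ids back into words.
id2token = {word_id: token for token, word_id in token2id.items()}

# Bag-of-words corpus: one sorted list of (word_id, count) pairs per note.
corpus = [sorted(Counter(token2id[t] for t in text).items()) for text in sentences]

# The same corpus with each id replaced by its word.
readable = [[(id2token[word_id], count) for word_id, count in doc] for doc in corpus]
```

Here `corpus[0]` is `[(0, 2), (1, 1)]` and `readable[0]` is `[("text", 2), ("mining", 1)]`.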
0 votes, 1 answer

Tokenizing a PDF for quantitative analysis

I ran into an issue using the unnest_tokens function on a data_frame. I am working with PDF files that I want to compare. text_path <- "c:/.../text1.pdf" text_raw <- pdf_text("c:/.../text1.pdf") text1df<- data_frame(Zeile = 1:25, …
Maria
0 votes, 1 answer

Counting specific word occurrences between 2 data frames in R with a group_by needed

I have two data frames in R. The first one (named Words) consists of a single column of words, Words: Hello, Building, School, Hospital, Doctors. The second is a big dataset presented like this…
katdataecon
0 votes, 2 answers

R: Convert a "Term Document Matrix" to a "Corpus"

I am using the R programming language. I am trying to follow the instructions from this tutorial over here (https://cran.r-project.org/web/packages/tidytext/vignettes/tidying_casting.html) and learn how to convert a "term document matrix" into a…
stats_noob
0 votes, 0 answers

R Error: Only works with Character Objects

I am using the R programming language. I am trying to replicate a previous Stack Overflow post, "(R) About stopwords in DocumentTermMatrix", for the purpose of "tokenizing" and removing "stop words". Using some publicly available…
stats_noob
0 votes, 1 answer

How do I solve: Input must be a character vector of any length or a list of character vectors, each of which has a length of 1

I am trying to analyze customer reviews. My database consists of one column named ReqSummary, and when I try to start my sentiment analysis I receive the following error message: Error in check_input(x) : Input must be a character vector of…
Narin
0 votes, 1 answer

Dealing with several text columns in a labeled data set while running NLP in R

Hope all of you guys are healthy and well. I am new to the world of NLP and my question may sound stupid, so I apologize in advance. I would like to perform NLP on some labeled text data and run a text mining predictive model. I have four…
Alex
0 votes, 1 answer

Using Anti Join in R

I am a noob in R, and I have been trying to compare two data frames that were derived using text mining. Each has two columns, one with words and the other with counts. Assume they are dataframe1 and dataframe2. I am trying to find out how to write the code…
Mr Pool
0 votes, 1 answer

Entities extraction based on customized list in R

I have a list of texts and I also have a list of entities. The list of texts is typically a vector of strings. The list of entities is a bit more complex. Some entities can be listed out exhaustively, such as the list of main cities of the…
Afiq Johari