Questions tagged [text-mining]

Text Mining is a process of deriving high-quality information from unstructured (textual) information.

Possible applications of text mining include:

Analyzing comments in survey responses
Analyzing customer messages, emails, complaints, etc.
Investigating competitors by crawling their websites

Cosine Similarity Matrix in R

I have a document-term matrix, "mydtm", that I created in R using the 'tm' package. I am attempting to depict the similarities between each of the 557 documents contained within the dtm/corpus. I have been attempting to use a cosine similarity…

r text-mining tm

asked Jun 02 '21 at 19:47

Luke Hansen
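
For the cosine similarity question above, the underlying computation can be sketched independently of the 'tm' package. The snippet below is a minimal Python illustration, not the asker's R code: the function name `cosine_similarity_matrix` and the toy 3-document matrix are assumptions made up for the example, and each row is taken to be one document's term counts.

```python
import math

def cosine_similarity_matrix(dtm):
    """Return the pairwise cosine similarity between the rows of a document-term matrix."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in dtm]
    n = len(dtm)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            if norms[i] == 0 or norms[j] == 0:
                # Assumption: a document with no terms is given similarity 0.
                value = 0.0
            else:
                dot = sum(a * b for a, b in zip(dtm[i], dtm[j]))
                value = dot / (norms[i] * norms[j])
            # The matrix is symmetric, so fill both halves at once.
            sim[i][j] = sim[j][i] = value
    return sim

# Hypothetical term counts: 3 documents (rows) by 4 terms (columns).
toy_dtm = [
    [2, 0, 1, 0],
    [4, 0, 2, 0],
    [0, 3, 0, 1],
]
similarity = cosine_similarity_matrix(toy_dtm)
```

Documents 1 and 2 have proportional counts, so their similarity is 1 up to rounding, while document 3 shares no terms with them and scores 0. For 557 documents the result would be a 557 x 557 symmetric matrix.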

votes

1 answer

How do I generate a word cloud for a large dataset in R?

I'm trying to generate a word cloud for a year's worth of complaint narrative data from the CFPB's public complaint database. There are roughly 100,000 words per year. I've been able to generate clouds using samples of about 1,000 words per year. I…

r text-mining large-data word-cloud

asked Jun 02 '21 at 06:13

0ecd3e

votes

3 answers

Inferring topics with mallet, using the saved topic state

I've used the following command to generate a topic model from some documents: bin/mallet train-topics --input topic-input.mallet --num-topics 100 --output-state topic-state.gz I have not, however, used the --output-model option to generate a…

text-mining topic-modeling mallet

asked Jul 19 '11 at 19:27

sandesh247


votes

1 answer

Text mining between a data frame column and 2 lists in R

So I created two lists composed of words: fruits <- c("banana","apple","strawberry") homemade <- c("kitchen","homemade","mom","dad","sister") And here is my dataset: description isCake apple cake cooked by mom YES pie from the…

r list dataframe text-mining rdata

asked May 13 '21 at 11:44

katdataecon
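
The excerpt above is truncated, so the asker's exact goal is not known. As a rough sketch of one common reading (counting how many words from each list occur in a description), here is a small Python illustration. The helper `count_matches` is hypothetical; the word lists and the sample description are taken from the question.

```python
fruits = ["banana", "apple", "strawberry"]
homemade = ["kitchen", "homemade", "mom", "dad", "sister"]

def count_matches(description, vocabulary):
    """Count the whitespace-separated tokens of `description` that appear in `vocabulary`."""
    vocab = set(vocabulary)
    tokens = description.lower().split()
    return sum(1 for token in tokens if token in vocab)

# Sample description taken from the question's dataset.
description = "apple cake cooked by mom"
fruit_hits = count_matches(description, fruits)
homemade_hits = count_matches(description, homemade)
```

Splitting on whitespace is a simplifying assumption; real descriptions with punctuation would need a proper tokenizer.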

votes

0 answers

Complex text mining in R with matching words from 2 lists

Well, I created 2 lists: expensive <- c("wine","watch","book","books","bottles","whisky") g1 <- c(df$gifts) (Of course, I have more than 6 words in my "expensive" list, but this is just for the example.) My idea is to look at the number of matches to keep only…

r list dataframe text-mining matching

asked May 12 '21 at 14:18

katdataecon

votes

1 answer

Can I extract the structure of a PDF in R to check information such as author, date, etc., and store it in, e.g., a data frame?

I am extracting PDFs from a web page and would like to know whether it is possible to extract the XML structure of each of these PDFs, check for information such as the author and the title of each document, and store this information in a data…

r xml xml-parsing text-mining pdftotext

asked May 12 '21 at 11:44

ms_aka

votes

1 answer

Python frequency of words using gensim: How to get the word instead of id word in corpus

I use gensim to count the frequency of words in a given note. After applying the following code: from gensim import corpora dictionary = corpora.Dictionary(sentences) corpus = [dictionary.doc2bow(text) for text in sentences] I obtain a corpus such…

python text-mining gensim

asked May 06 '21 at 18:35

Agni412
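
The gensim question above comes down to turning (id, count) pairs back into (word, count) pairs. The sketch below mimics that bag-of-words structure in plain Python so it runs without gensim installed. The names `token2id`, `id2token` and `readable` and the toy `sentences` are assumptions for illustration, and the ids are assigned in order of first appearance, which may differ from the ids gensim assigns.

```python
from collections import Counter

# Hypothetical tokenized notes, standing in for the question's `sentences`.
sentences = [["text", "mining", "text"], ["mining", "data"]]

# Give each distinct token an integer id, in order of first appearance.
token2id = {}
for tokens in sentences:
    for token in tokens:
        token2id.setdefault(token, len(token2id))

# Reverse mapping: id -> token.
id2token = {token_id: token for token, token_id in token2id.items()}

# Bag-of-words corpus: per document, a sorted list of (token_id, count) pairs.
corpus = [sorted(Counter(token2id[t] for t in tokens).items()) for tokens in sentences]

# Replace each id with its token to make the counts readable.
readable = [[(id2token[token_id], count) for token_id, count in doc] for doc in corpus]
```

With gensim itself, the `Dictionary` object built in the question should allow the same reverse lookup by indexing it with a token id, e.g. `dictionary[token_id]`.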

votes

1 answer

Tokenizing a PDF for quantitative analysis

I ran into an issue using the unnest_tokens function on a data_frame. I am working with pdf files I want to compare. text_path <- "c:/.../text1.pdf" text_raw <- pdf_text("c:/.../text1.pdf") text1df<- data_frame(Zeile = 1:25, …

r nlp text-mining quanteda

asked May 05 '21 at 18:30

Maria

votes

1 answer

Counting specific word occurrences between 2 data frames in R with a group_by needed

I have two data frames in R. The first one (named Words) consists of a single column of words: Words Hello Building School Hospital Doctors The second is a big dataset presented like this…

r dataframe text-mining matching

asked May 05 '21 at 12:46

katdataecon

votes

2 answers

R: Convert a "Term Document Matrix" to a "Corpus"

I am using the R programming language. I am trying to follow the instructions from this tutorial over here (https://cran.r-project.org/web/packages/tidytext/vignettes/tidying_casting.html) and learn how to convert a "term document matrix" into a…

r text nlp text-mining

asked May 05 '21 at 03:16

stats_noob


votes

0 answers

R Error: Only works with Character Objects

I am using the R programming language. I am trying to replicate a previous Stack Overflow post, "(R) About stopwords in DocumentTermMatrix", for the purpose of "tokenizing" and removing "stop words". Using some publicly available…

r text nlp character text-mining

asked May 03 '21 at 21:29

stats_noob


votes

1 answer

How do I solve: Input must be a character vector of any length or a list of character vectors, each of which has a length of 1

I am trying to analyze customer reviews. My database consists of one column named ReqSummary, and when I try to start my sentiment analysis I receive the following error message: Error in check_input(x) : Input must be a character vector of…

r text-mining sentiment-analysis

asked May 03 '21 at 11:49

Narin

votes

1 answer

Dealing with several text columns in a labeled data set while running NLP in R

Hope all of you guys are healthy and well. I am new to the world of NLP and my question may sound stupid, so I apologize in advance. I would like to perform NLP on some labeled text data and run a text mining predictive model. I have four…

r nlp text-mining data-cleaning tm

asked Apr 29 '21 at 21:11

Alex

votes

1 answer

Using Anti Join in R

I am a noob in R, and I have been trying to compare two data frames that were derived using text mining. Each has two columns, one with words and the other with counts. Assume they are dataframe1 and dataframe2. I am trying to find out how to write the code…

text-mining

asked Apr 23 '21 at 00:50

Mr Pool
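
For the anti join question above: an anti join keeps the rows of the first table whose key has no match in the second. The excerpt is truncated, so what the asker finally wants is not known. The Python sketch below only illustrates the operation; the `anti_join` helper and the sample rows are hypothetical, with `dataframe1` and `dataframe2` named after the question.

```python
def anti_join(left, right, key):
    """Return the rows of `left` whose `key` value does not occur in `right`."""
    right_keys = {row[key] for row in right}
    return [row for row in left if row[key] not in right_keys]

# Hypothetical word-count tables, one dict per row.
dataframe1 = [
    {"word": "apple", "count": 3},
    {"word": "cake", "count": 5},
    {"word": "pie", "count": 1},
]
dataframe2 = [
    {"word": "cake", "count": 2},
]

# Words that appear in dataframe1 but not in dataframe2.
only_in_first = anti_join(dataframe1, dataframe2, "word")
```

Only the key column is compared, so "cake" is dropped even though its counts differ between the two tables.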

votes

1 answer

Entities extraction based on customized list in R

I have a list of texts and I also have a list of entities. The list of texts is typically a vector of strings. The list of entities is a bit more complex. Some entities can be listed out exhaustively, such as the list of main cities of the…

r nlp text-mining r-package named-entity-recognition

asked Apr 22 '21 at 07:55

Afiq Johari

