Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets for converting text to and from tidy formats, and for switching seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem can be used for effective data handling and analysis.
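
As a rough illustration of the tidy text format described above, the sketch below tokenizes a small data frame into one word per row, drops common stop words, and counts what remains. The `docs` data frame and its contents are made up for this example; `unnest_tokens()` and the `stop_words` data set come from tidytext, and the other verbs come from dplyr.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line input; any data frame with a character column works.
docs <- tibble(
  line = 1:2,
  text = c("Text mining with tidy data principles",
           "Tidy tools make text analysis easier")
)

tidy_words <- docs %>%
  unnest_tokens(word, text) %>%            # one lowercased token per row
  anti_join(stop_words, by = "word") %>%   # drop common English stop words
  count(word, sort = TRUE)                 # word frequencies, most common first

tidy_words
```

Because the result is an ordinary data frame, it can be passed straight to other tidyverse tools for further filtering, joining, or plotting.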

294 questions
0 votes, 1 answer

Loading Loughran finance sentiment into Tidytext

I'm using the sentiment tools in Tidytext for the first time, and would like to use the Loughran dictionary. After several attempts, the closest I get is this error: get_sentiments("loughran") Error in get_sentiments("loughran") : could not…
LCC
0 votes, 1 answer

R code suddenly stopped working in tidy text

I am trying to do word analysis on some data in R. I imported one column of data that was text responses from a survey into R using read.csv. I named one of the columns "text". This code was working fine a few days ago and now it suddenly is…
0 votes, 0 answers

R's tm_map is creating non-existing words

I'm using tm package to find associations between words in a text. This is what I did (I'm also using tidytext package) book <- Corpus(VectorSource(c(part1,part2,part3,part4,part5))) book <- tm_map(book, content_transformer(tolower)) book <-…
pachadotdev
-1 votes, 1 answer

Extract proper nouns from text in R?

Is there any better way of extracting proper nouns (e.g. "London", "John Smith", "Gulf of Carpentaria") from free text? That is, a function like proper_nouns <- function(text_input) { # ... } such that it would extract a list of proper nouns from…
stevec
-1 votes, 1 answer

Warning Message: pairwise_count Function

I'm attempting to follow this tutorial on using the pairwise_count function in the widyr package. In particular, consider this line of code, where data is a tibble which includes the columns "word" and "section": data %>% pairwise_count(word,…
dext
-1 votes, 1 answer

How can i convert character object (web-page parsed) to tidy object in R?

Using library(htm2txt) url <- 'https://en.wikipedia.org/wiki/Alan_Turing' clear.text <- gettxt(url) code i'm getting clear.text [1] "Alan Turing\n\nFrom Wikipedia, the free encyclopedia\n\nJump to navigation\tJump to search\n\n\"Turing\" redirects…
kwadratens
-2 votes, 1 answer

R - delete length-one strings and stopwords (using tidytext) in character

If I have a df: Class sentence 1 Yes there is p beaker on the table 2 Yes they t the frown 3 Yes so Z it was asleep How do I remove the length-one strings within "sentence" column to remove things like "t" "p" and "Z", and then do a…
-2 votes, 1 answer

Why are these stop words not being removed from my data?

Tokenization of the data tidy_text <- data %>% unnest_tokens(word, q_content) Removal of stop words data("stop_words") stop_words tidy_text <- tidy_text %>% anti_join(stop_words, by ="word") tidy_text %>% count(word, sort = TRUE) Output…
-2 votes, 1 answer

R Loop over IDs

I'd like to run pairwise_count in a loop and my input looks like the table in the image. Each ID stands for a text and the rows contains the sentences of the text. My idea of a for loop doesn't work. Has someone maybe an idea, how that loop could…