Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets for converting text to and from tidy formats, and for switching seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem can be used for effective data handling and analysis.
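As a rough illustration of the tidy workflow described above, the sketch below converts a small data frame of text into a one-token-per-row table with `unnest_tokens()`, then drops stop words and counts the rest using dplyr. The example sentences and the names `text_df` and `tidy_words` are made up for illustration; only `unnest_tokens()` and the `stop_words` data set come from tidytext.

```r
library(dplyr)
library(tidytext)

# Hypothetical example text; any character column would work the same way
text_df <- tibble(
  line = 1:2,
  text = c("Text mining with tidy data principles.",
           "Tidy tools work on one token per row.")
)

# One token (here, one word) per row; by default unnest_tokens()
# lowercases the tokens and strips punctuation
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Drop stop words using the stop_words data set shipped with tidytext,
# then count the remaining words with ordinary dplyr verbs
tidy_words %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)
```

Because the result is an ordinary tibble, it can be passed straight on to other tidyverse functions for filtering, joining, or plotting.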

294 questions
2
votes
1 answer

List of common first names for text analysis in R?

In analysing text, it can be useful to identify names of people in text data. Objects prepackaged in tidytext include: English negators, modals, and adverbs (nma_words) Parts of Speech (parts_of_speech) Sentiments (sentiments), and Stop Words…
stevec
  • 41,291
  • 27
  • 223
  • 311
2
votes
1 answer

Tokenization in R tidytext, leaving in ampersands

I am currently using the unnest_tokens() function from the tidytext package. It works exactly as I need it to, except that it removes ampersands (&) from the text. I would like it to keep them but leave everything else unchanged. For…
RayVelcoro
  • 524
  • 6
  • 21
2
votes
1 answer

R tidytext Remove word if part of relevant bigrams, but keep if not

Using unnest_tokens, I want to create a tidy text tibble that combines two different kinds of token: single words and bigrams. The reasoning is that sometimes single words are the more reasonable unit to study and sometimes it is rather…
user436994
  • 601
  • 5
  • 15
2
votes
1 answer

Non-zero exit status tidyverse install packages Rstudio

I have been roaming the internet trying to find a solution, but haven't found one yet. My problem is that I can't install tidytext. I also found out I can't re-install tidyverse for some reason. The error code is: install.packages("tidytext") WARNING:…
maria118code
  • 153
  • 1
  • 14
2
votes
1 answer

How can I download "Afinn" and "NRC" lexicon in R?

I'm trying to call get_sentiments("afinn") and get_sentiments("nrc"), but I get this message: Error: The textdata package is required to download the NRC word-emotion association lexicon. Install the textdata package to access this dataset. How can I…
Philip
  • 21
  • 1
  • 3
2
votes
1 answer

Split text into ngrams without overlap in R

I have a dataframe where one column contains a lengthy transcript. I want to use unnest_tokens to split the transcript into ngrams of 50 words. The following code will split the transcripts: content <- data.frame(channel=c("NBC"), program=c("A"),…
James Martherus
  • 1,033
  • 1
  • 9
  • 20
2
votes
2 answers

Preserve Hyphenated words in ngrams analysis with tidytext

I am doing text analysis of bigrams. I want to preserve "complex" words made of many "simple" words linked by hyphens. For example, if I have the following vector: Example<- c("bovine retention-of-placenta sulpha-trimethoprim…
JPV
  • 323
  • 2
  • 10
2
votes
1 answer

Why is Quanteda not removing words?

I am having trouble removing profanities from my n-grams. The getProfanityWords function below correctly creates a character vector. The whole script works in every other way, but the profanities remain. I did wonder whether it was to do with the…
Chris
  • 1,449
  • 1
  • 18
  • 39
2
votes
2 answers

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1

Using the tidytext package, I want to transform my tibble into a one-token-per-document-per-row format. I converted the text column of my tibble from factor to character but I still get the same error. text_df <- tibble(line = 1:3069, text = text) My…
LG3555
  • 41
  • 1
  • 1
  • 3
2
votes
1 answer

How to include select 2-word phrases as tokens in tidytext?

I'm preprocessing some text data for further analysis. I tokenized the text into single words using unnest_tokens() but want to keep certain commonly occurring two-word phrases such as "United States" or "social security." How can I do this using…
Sonya C
  • 33
  • 3
2
votes
1 answer

Unable to use NRC lexicon in tidytext. Error in match.arg(lexicon) : 'arg' should be one of “afinn”, “bing”, “loughran”

I am learning sentiment analysis in R using the tidytext package. However, I am unable to set nrc as the lexicon. Whenever I type get_sentiments("nrc"), the above error is displayed. It says that lexicon could only be "afinn", "bing" or "loughran". I tried…
AhmadAli
  • 21
  • 1
2
votes
2 answers

Installation directory?

I'm trying to install the tidytext package. It seems to me that R is installing the package into my OneDrive. I've been using R and I've not run into this problem before. I've unsynchronized OneDrive and done a variety of things to change my working…
user11386282
  • 21
  • 1
  • 2
2
votes
2 answers

Creating a corpus from multiple txt files

I have multiple txt files and I want to get tidy data from them. To do that, I first create a corpus (I am not sure whether this is the right way to do it). I wrote the following code to build the corpus. folder<-"C:\\Users\\user\\Desktop\\text…
FGH
  • 91
  • 3
  • 8
2
votes
3 answers

R POS tagging and tokenizing in one go

I have a text as below. Section <- c("If an infusion reaction occurs, interrupt the infusion.") df <- data.frame(Section) When I tokenize using tidytext and the code below, AA <- df %>% mutate(tokens = str_extract_all(df$Section,…
Krishna
  • 61
  • 1
  • 5
2
votes
1 answer

How to do bigram topic modeling using tidytext in R?

I tried using the tidytext package to do bigram topic modeling by following the steps on the tidytext website: https://www.tidytextmining.com/ngrams.html. I was able to get to the "word_counts" part, where R calculates each bigram's frequency.…