Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem like dplyr can be used for effective data handling and analysis.
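
As a minimal sketch of what that looks like in practice, text can be tokenized into one word per row with unnest_tokens() and then handled with ordinary dplyr verbs. The sample sentences and the names text_df and tidy_words are made up for illustration.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line input; any data frame with a character column works
text_df <- tibble(
  line = 1:2,
  text = c("Tidy data makes text mining easier.",
           "Text mining with tidy tools is fun.")
)

tidy_words <- text_df %>%
  unnest_tokens(word, text) %>%          # one lowercase word per row
  anti_join(stop_words, by = "word") %>% # drop stop words shipped with tidytext
  count(word, sort = TRUE)               # word frequencies via dplyr

print(tidy_words)
```

The result is a regular tibble, so it can be passed on to other tidyverse tools.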

Other resources

Text Mining with R: A Tidy Approach

Related tags

R's tm, quanteda, dplyr, tidyr, and broom packages

294 questions

1 answer

List of common first names for text analysis in R?

In analysing text, it can be useful to identify names of people in text data. Objects prepackaged in tidytext include: English negators, modals, and adverbs (nma_words), parts of speech (parts_of_speech), sentiments (sentiments), and stop words…

r nlp tidytext

asked Apr 26 '20 at 23:37

stevec

1 answer

Tokenization in r tidytext, leaving in ampersands

I am currently using the unnest_tokens() function from the tidytext package. It works exactly as I need it to; however, it removes ampersands (&) from the text. I would like it to leave them in and keep everything else unchanged. For…

r tokenize tidytext unnest

asked Apr 21 '20 at 19:50

RayVelcoro

1 answer

R tidytext Remove word if part of relevant bigrams, but keep if not

Using unnest_tokens, I want to create a tidy text tibble that combines two different kinds of tokens: single words and bigrams. The reasoning is that sometimes single words are the more reasonable unit to study and sometimes it is rather…

r nlp tidytext

asked Mar 17 '20 at 11:09

user436994

1 answer

Non-zero exit status tidyverse install packages Rstudio

I have been roaming the internet trying to find a solution, but haven't found one yet. My problem is that I can't install tidytext. I also found out I can't re-install tidyverse for some reason. The error is: install.packages("tidytext") WARNING:…

r windows tidyverse install.packages tidytext

asked Feb 18 '20 at 19:55

maria118code

1 answer

How can I download "Afinn" and "NRC" lexicon in R?

I'm trying to call get_sentiments("afinn") and get_sentiments("nrc"), but I get this message: Error: The textdata package is required to download the NRC word-emotion association lexicon. Install the textdata package to access this dataset. How can I…

r text-mining tidytext

asked Jan 17 '20 at 08:57

Philip

1 answer

Split text into ngrams without overlap in R

I have a dataframe where one column contains a lengthy transcript. I want to use unnest_tokens to split the transcript into ngrams of 50 words. The following code will split the transcripts: content <- data.frame(channel=c("NBC"), program=c("A"),…

r n-gram tidytext

asked Dec 11 '19 at 19:26

James Martherus

2 answers

Preserve Hyphenated words in ngrams analysis with tidytext

I am doing text analysis of bigrams. I want to preserve "complex" words made of many "simple" words linked by hyphens. For example, if I have the following vector: Example<- c("bovine retention-of-placenta sulpha-trimethoprim…

r regex text-mining tidytext

asked Oct 08 '19 at 06:23

JPV

1 answer

Why is Quanteda not removing words?

I am having trouble removing profanities from my n-grams. The getProfanityWords function below correctly creates a character vector. The whole script works in every other way, but the profanities remain. I did wonder whether it was to do with the…

r nlp text-mining quanteda tidytext

asked Aug 30 '19 at 13:40

Chris

2 answers

Error in check_input(x) : Input must be a character vector of any length or a list of character vectors, each of which has a length of 1

Using the tidytext package, I want to transform my tibble into a one-token-per-document-per-row format. I converted the text column of my tibble from factor to character, but I still get the same error. text_df <- tibble(line = 1:3069, text = text) My…

r tidytext

asked Aug 12 '19 at 16:52

LG3555

1 answer

How to include select 2-word phrases as tokens in tidytext?

I'm preprocessing some text data for further analysis. I tokenized the text into single words using unnest_tokens() but want to keep certain commonly occurring two-word phrases such as "United States" or "social security." How can I do this using…

r tokenize tidytext

asked Aug 01 '19 at 07:28

Sonya C

1 answer

Unable to use NRC lexicon in tidytext. Error in match.arg(lexicon) : 'arg' should be one of “afinn”, “bing”, “loughran”

I am learning sentiment analysis in R using the tidytext package. However, I am unable to set nrc as the lexicon. Whenever I type get_sentiments("nrc"), the above error is displayed. It says that the lexicon could only be "afinn", "bing" or "loughran". I tried…

r text-mining tidytext

asked Jul 05 '19 at 12:15

AhmadAli

2 answers

Installation directory?

I'm trying to install the tidytext package. It seems to me that R is installing the package into my OneDrive. I've been using R for a while and have not run into this problem before. I've unsynchronized OneDrive and done a variety of things to change my working…

r tidytext

asked Apr 28 '19 at 05:13

user11386282

2 answers

creating corpus from multiple txt files

I have multiple txt files and want to turn them into tidy data. To do that, I first create a corpus (I am not sure whether this is the right way to do it). I wrote the following code to build the corpus. folder<-"C:\\Users\\user\\Desktop\\text…

r tidytext

asked Feb 24 '19 at 08:52

FGH

3 answers

R POS tagging and tokenizing in one go

I have a text as below. Section <- c("If an infusion reaction occurs, interrupt the infusion.") df <- data.frame(Section) When I tokenize using tidytext and the code below, AA <- df %>% mutate(tokens = str_extract_all(df$Section,…

r tokenize pos-tagger tidytext

asked Aug 15 '18 at 15:04

Krishna

1 answer

How to do bi-grams topic modeling using tidy text in r?

So I tried using the tidytext package to do bigrams topic modeling, by following the steps on the tidytext website: https://www.tidytextmining.com/ngrams.html. I was able to get to the "word_counts" part, where R calculates each bi-gram's frequency.…

r text-mining n-gram topic-modeling tidytext

asked Jun 29 '18 at 17:22

user8157539
