Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem like dplyr can be used for effective data handling and analysis.

Other resources

Text Mining with R: A Tidy Approach

Related tags

R's tm, quanteda, dplyr, tidyr, and broom packages

294 questions

2 answers

Tidytext - set expressions as a single token

I am trying to separate my text data into tokens using the unnest_tokens function from the tidytext package. The thing is that some expressions appear multiple times and I would like to keep each of them as a single token instead of multiple tokens. Normal…

r tidytext

asked Dec 04 '21 at 13:59

Daniel

1 answer

Tidytext error '~/Library/Caches/textdata/nrc/NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt' does not exist

I tried to use tidytext to do sentiment analysis with library(tidytext) and get_sentiments("nrc"), but it gives me an error: Error: '~/Library/Caches/textdata/nrc/NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt' does…

r tidytext

asked Nov 15 '21 at 21:54

Kexin Ni

1 answer

grouped filter process really slow

So I have this massive tibble with tokens that I'm trying to do some filtering on and then transform into a document term matrix. My problem is that the grouped filtering process runs really slow. Does anyone have a good suggestion on how I can…

r text dplyr tidytext

asked Nov 05 '21 at 22:11

MariusJ

1 answer

tidytext problem using dplyr: words are not counted

I am having a problem with an old script that uses the tidytext and dplyr libraries. My example was taken from https://community.rstudio.com/t/problem-with-unnest-tokens-function/94107 but I am having the same problem: library(gutenbergr) …

r dplyr tidytext

asked Nov 01 '21 at 21:57

Rodrigo_BC

2 answers

Tidying characters in R to the least specific detail based on similarity

I have a dataframe of drug IDs (NDC_NBR) and their corresponding drug names (BRAND_NM). I need to collapse/aggregate the drug names to the least specific level possible per drug. Here is an example of the data I am working with and the expected…

r string tidytext

asked Oct 21 '21 at 12:10

TheGoat

3 answers

How to search for words with asterisks and wildcards (e.g., exampl*) in R (word appearance in a data frame)

I wrote code to count the appearances of words in a data frame: Items <- c('decid*','head', 'heads') df1<-data.frame(Items) words<- c('head', 'heads', 'decided', 'decides', 'top', 'undecided') df_main<-data.frame(words) item <- vector() count <-…

r nlp stringr tidytext

asked Oct 07 '21 at 10:31

Asghar

1 answer

How do I keep certain special characters when making ngrams using tidytext::unnest_tokens()?

I'm working on text that has character combinations like "3/8" and "5/8" when referring to particular sizes of things and I'm making bigrams to help analyze the text. I'd like to not have the "/" character removed but am not finding a way to do…

r tidyverse tidytext

asked Oct 01 '21 at 15:52

Nickerbocker

2 answers

What would be the best approach for grouping data according to a table of keywords in R

I have the following dictionary for grouping data 1. [aa11, aa21, aa31, aa34], "group A" 2. [x23z, x22z, x32z, x35z, x34z],"group B" 3. [lg32z, lg22z, lg84x, lg94y], "group C" 4. ... The column in the data itself may also have more than…

r tidyverse stringr tidytext

asked Sep 22 '21 at 16:01

Jacek Kotowski

2 answers

How to calculate avg response time & total response time based on group_by cols and timestamps using R?

r datetime dplyr tidytext

asked Sep 15 '21 at 19:55

Dinho

1 answer

How to create a facet_wrap plot that shows top 10 common words found based on group in R?

Reference code and image below: I have a dataframe that is grouped by company name that looks like so: Company | tweet AMZN @115827 Thanks for your patience. AMZN @115826 I'm sorry for the wait. You'll receive an email as soon as…

r ggplot2 facet-wrap tidytext

asked Sep 15 '21 at 00:39

Dinho

0 answers

Recoding sentence tokens using tidy text mining in R

I'm trying to analyse qualitative responses to a survey using tidy text mining in R. I have tokenised my data via sentences. In some cases, I have found that in one sentence, participants have reported multiple behaviours that I want to analyse…

r nlp text-mining tidytext

asked Sep 06 '21 at 14:46

rpsychstats

0 answers

unnest_tokens and ERROR in UseMethod("pull")

I'm having problems using unnest_tokens on a particular dataset. I've tried using non-standard evaluation, i.e. unnest_tokens_, to pull the column from the dataframe. However this feature is now deprecated. I've subset the singular column I want to…

r tidytext

asked Aug 23 '21 at 15:04

Magnetar

0 answers

R tidytext graph comes up vertical and not horizontal

I'm trying to create a sentiment analysis using the tidytext code here but my graph comes out vertical, without the output making sense compared to the original which is horizontal. How can I fix this? #Unnest tokens edAItext = edAI %>%…

r ggplot2 tidytext

asked Jul 10 '21 at 19:03

Delly

1 answer

How do I plot horizontal "histogram"-bars starting at zero

I'm performing topic-modelling applying "Text Mining with R: A tidy approach" by Silge and Robinson. It is not shown how to plot figure 3.6, showing the "greatest difference in β between topic 2 and topic 1". I searched the internet including ways…

r ggplot2 text nlp tidytext

asked Jun 20 '21 at 14:01

Anders Jørgensen

0 answers

How do I apply ggplot during sentiment analysis, as in Silge et al.'s Jane Austen example

I suspect this to be a fairly straightforward question for coders more experienced than myself. I'm doing sentiment analysis, comparing review-sentiments of two companies, and I am using the "introduction to tidytext" by Silge et al (2021) and the…

r ggplot2 tidyverse tidytext

asked May 26 '21 at 08:58

Anders Jørgensen
