Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem can be used for effective data handling and analysis.
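
As a minimal sketch of what "tidy format" means here, `unnest_tokens()` turns a column of raw text into one row per word, after which ordinary dplyr verbs apply. The sample sentences and the names `text_df`, `tidy_words`, and `word_counts` are illustrative assumptions, not taken from the package documentation:

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line corpus, one document line per row
text_df <- tibble(
  line = 1:2,
  text = c("The quick brown fox", "jumps over the lazy dog")
)

# One token (word) per row; unnest_tokens() lowercases by default
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Once the text is tidy, regular dplyr verbs work on it
word_counts <- tidy_words %>%
  count(word, sort = TRUE)
```

With this input, `tidy_words` should hold nine rows, and "the" should lead `word_counts` because it occurs in both lines.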

294 questions
0
votes
2 answers

Tidytext - set expressions as a single token

I am trying to separate my text data into tokens using the unnest_tokens function from the tidytext package. The thing is that some expressions appear multiple times and I would like to keep them a single token instead of multiple tokens. Normal…
Daniel
  • 639
  • 8
  • 24
0
votes
1 answer

Tidytext error '~/Library/Caches/textdata/nrc/NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt' does not exist

I tried to use tidytext to do sentiment analysis library(tidytext) get_sentiments("nrc") but it gives me an error: Error: '~/Library/Caches/textdata/nrc/NRC-Emotion-Lexicon/NRC-Emotion-Lexicon-v0.92/NRC-Emotion-Lexicon-Wordlevel-v0.92.txt' does…
Kexin Ni
  • 3
  • 3
0
votes
1 answer

grouped filter process really slow

So I have this massive tibble with tokens that I'm trying to do some filtering on and then transform into a document term matrix. My problem is that the grouped filtering process runs really slowly. Does anyone have a good suggestion on how I can…
MariusJ
  • 71
  • 6
0
votes
1 answer

tidytext problem using dplyr: not count words

I am having a problem with an old script that uses the tidytext and dplyr libraries. My example was taken from: https://community.rstudio.com/t/problem-with-unnest-tokens-function/94107 But I am having the same problem: library(gutenbergr) …
Rodrigo_BC
  • 161
  • 11
0
votes
2 answers

Tidying characters in R to the least specific detail based on similarity

I have a dataframe of drug IDs (NDC_NBR) and their corresponding drug names (BRAND_NM). I need to collapse/aggregate the drug names to the least specific level possible per drug. Here is an example of the data I am working with and the expected…
TheGoat
  • 2,587
  • 3
  • 25
  • 58
0
votes
3 answers

How to search for words with asterisks and wildcards (e.g., exampl*) in R (word appearance in a data frame)

I wrote code to count the appearances of words in a data frame: Items <- c('decid*','head', 'heads') df1<-data.frame(Items) words<- c('head', 'heads', 'decided', 'decides', 'top', 'undecided') df_main<-data.frame(words) item <- vector() count <-…
Asghar
  • 1
  • 2
0
votes
1 answer

How do I keep certain special characters when making ngrams using tidytext::unnest_tokens()?

I'm working on text that has character combinations like "3/8" and "5/8" when referring to particular sizes of things and I'm making bigrams to help analyze the text. I'd like to not have the "/" character removed but am not finding a way to do…
Nickerbocker
  • 117
  • 8
0
votes
2 answers

What would be the best approach for grouping data according to a table of keywords in R

I have the following dictionary for grouping data 1. [aa11, aa21, aa31, aa34], "group A" 2. [x23z, x22z, x32z, x35z, x34z],"group B" 3. [lg32z, lg22z, lg84x, lg94y], "group C" 4. ... The column in the data itself may also have more than…
Jacek Kotowski
  • 620
  • 16
  • 49
0
votes
2 answers

How to calculate avg response time & total response time based on group_by cols and timestamps using R?

I have a table that looks like so (1 example - total of 2 million rows): tweet_id | id | group | created_at | tweet | response_tweet_id 1 sprintcare …
Dinho
  • 704
  • 4
  • 15
0
votes
1 answer

How to create a facet_wrap plot that shows top 10 common words found based on group in R?

Reference code and image below: I have a dataframe that is grouped by company name that looks like so: Company | tweet AMZN @115827 Thanks for your patience. AMZN @115826 I'm sorry for the wait. You'll receive an email as soon as…
Dinho
  • 704
  • 4
  • 15
0
votes
0 answers

Recoding sentence tokens using tidy text mining in R

I'm trying to analyse qualitative responses to a survey using tidy text mining in R. I have tokenised my data via sentences. In some cases, I have found that in one sentence, participants have reported multiple behaviours that I want to analyse…
0
votes
0 answers

unnest_tokens and ERROR in UseMethod("pull")

I'm having problems using unnest_tokens on a particular dataset. I've tried using non-standard evaluation, i.e. unnest_tokens_, to pull the column from the dataframe. However this feature is now deprecated. I've subset the singular column I want to…
Magnetar
  • 85
  • 8
0
votes
0 answers

R tidytext graph comes up vertical and not horizontal

I'm trying to create a sentiment analysis using the tidytext code here but my graph comes out vertical, without the output making sense compared to the original which is horizontal. How can I fix this? #Unnest tokens edAItext = edAI %>%…
Delly
  • 123
  • 1
  • 10
0
votes
1 answer

How do I plot horizontal "histogram"-bars starting at zero

I'm performing topic-modelling applying "Text Mining with R: A tidy approach" by Silge and Robinson. It is not shown how to plot figure 3.6, showing the "greatest difference in β between topic 2 and topic 1". I searched the internet including ways…
Anders Jørgensen
  • 195
  • 1
  • 1
  • 9
0
votes
0 answers

How do I apply ggplot during sentiment analysis, as in Silge et al.'s Jane Austen example

I suspect this to be a fairly straightforward question for coders more experienced than myself. I'm doing sentiment analysis, comparing review-sentiments of two companies, and I am using the "introduction to tidytext" by Silge et al (2021) and the…
Anders Jørgensen
  • 195
  • 1
  • 1
  • 9