Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem like dplyr can be used for effective data handling and analysis.
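As a rough illustration of the tidy text format described above, the sketch below tokenizes two short documents into one word per row with unnest_tokens() and then counts words with dplyr. The data frame `docs`, its column names, and the sample sentences are made up for this example, and it assumes the tidytext and dplyr packages are installed.

```r
library(dplyr)
library(tidytext)

# Hypothetical input: one row per document
docs <- tibble(
  doc_id = 1:2,
  text   = c("Text mining with tidy tools.",
             "Tidy tools make text mining easier.")
)

# One token (word) per row; by default unnest_tokens() lowercases
# the text and strips punctuation
tidy_words <- docs %>%
  unnest_tokens(output = word, input = text)

# Once the text is tidy, ordinary dplyr verbs apply
word_counts <- tidy_words %>%
  count(word, sort = TRUE)
```

Here `tidy_words` keeps the `doc_id` column, so each word can still be traced back to its document.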

Other resources

Text Mining with R: A Tidy Approach

Related tags

R's tm, quanteda, dplyr, tidyr, and broom packages

294 questions

1 answer

R: unnest_tokens() not working with a particular file

I am trying to run unnest_tokens() on the essay4 column of this dataset: https://github.com/rudeboybert/JSE_OkCupid/blob/master/profiles.csv.zip. I have tried both unnest_tokens() and unnest_tokens_(), as well as running dput(as_tibble()) on…

r nlp text-mining tidytext

asked Mar 27 '20 at 21:48

qwerty

2 answers

Some help getting started with tidytext

I have a project I'm working on in tidytext, which I'm pretty new to. My input data is currently in the form of individual .txt files in a folder. I successfully used get_sentiments() to track the positive/negative sentiments of my data, but I'm…

r text-mining topic-modeling tidytext

asked Mar 25 '20 at 14:13

helplessprogrammer134

1 answer

Error when passing a function to the token argument of unnest_tokens()

Error in unnest_tokens.data.frame(., entity, text, token = tokenize_scispacy_entities, : Expected output of tokenizing function to be a list of length 100

unnest_tokens() works well for a sample of a few observations but fails on the entire…

r tokenize spacy tidytext

asked Mar 23 '20 at 16:56

Sagar K

1 answer

No applicable method for 'tidy' applied to an object of class "factor" in Tidytext

I'm starting to do text mining in R and I have some problems. I have a CSV file with users' comments about a page. Each row is a different comment. It has only one column, which holds the comments. I was trying to use Tidy in R so I import the file…

r text-mining tidytext

asked Mar 21 '20 at 21:30

Pablo

1 answer

Count only alphanumeric characters in a string

Given the string "This has 4 words!", I would like to count only the letters and digits, excluding whitespace and punctuation. As such, the string above should return 13. I'm not sure why, but I cannot get this to work in R.

r text tidytext

asked Mar 04 '20 at 17:53

Adam_G
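One possible base-R approach to the question above, offered as a sketch rather than the accepted answer: delete every character that is not a letter or digit, then count what remains. The helper name `count_alnum` is invented for this example.

```r
# Hypothetical helper: count letters and digits in each string,
# ignoring whitespace and punctuation
count_alnum <- function(x) {
  # Remove everything outside the POSIX [:alnum:] class,
  # then count the characters that are left
  nchar(gsub("[^[:alnum:]]", "", x))
}

count_alnum("This has 4 words!")
# [1] 13
```

Because gsub() and nchar() are vectorized, the same helper works on a whole character column.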

1 answer

Join tokens back to sentence

I am doing text analysis on free-text data with tidytext. Consider these sample sentences: "The quick brown fox jumps over the lazy dog" and "I love books". My token approach using tidytext: unigrams = tweet_text %>% unnest_tokens(output =…

r nlp tidyverse tidytext

asked Feb 23 '20 at 02:53

macworthy
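A sketch of one way to go from tokens back to sentences, assuming each sentence carries an identifier column that survives tokenization. The column names `tweet_id`, `text`, and `word` are assumptions, since the question's own code is truncated. Note that unnest_tokens() lowercases and drops punctuation by default, so the rejoined text is not identical to the input.

```r
library(dplyr)
library(tidytext)

# Hypothetical input built from the two sample sentences
tweet_text <- tibble(
  tweet_id = 1:2,
  text     = c("The quick brown fox jumps over the lazy dog",
               "I love books")
)

# One word per row; tweet_id is carried along with each token
unigrams <- tweet_text %>%
  unnest_tokens(output = word, input = text)

# Paste the tokens of each tweet back together in their original order
rejoined <- unigrams %>%
  group_by(tweet_id) %>%
  summarise(text = paste(word, collapse = " "))
```

Any filtering step (for example, removing stop words) can go between the two pipelines, and only the remaining tokens are pasted back.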

1 answer

Why do I get a dependency error when trying to install the "tidytext" package in RStudio?

I tried to install the tidytext package and received the dependency errors below. Please help.
ERROR: dependency ‘ISOcodes’ is not available for package ‘stopwords’
ERROR: dependency ‘stopwords’ is not available for package ‘tidytext’

install.packages tidytext

asked Feb 15 '20 at 07:18

Curi0us

1 answer

tidytext: Issue with unnest_tokens and token = 'ngrams'

I'm running the following code:
library(rwhatsapp)
library(tidytext)
chat <- rwa_read(x = c( "31/1/15 04:10:59 - Menganito: Was it good?", "31/1/15 14:10:59 - Fulanito: Yes, it was" ))
chat %>% as_tibble() %>% unnest_tokens(output = bigram,…

r token whatsapp tidytext unnest

asked Jan 31 '20 at 17:57

piblo95

1 answer

Loop over list in R, conduct analysis specific to element in list, save results in element dataframe?

I am trying to replicate an analysis using tidytext in R, except using a loop. The specific example comes from Julia Silge and David Robinson's Text Mining with R: A Tidy Approach. The context for it can be found here:…

r loops for-loop tidytext

asked Nov 04 '19 at 18:32

Jonathan D.

1 answer

How to tokenize a PDF file by n-grams in R

I want to tokenize a PDF document by n-grams in R. I tried to follow the instructions at https://www.tidytextmining.com/ngrams.html, but I get stuck at the unnest_tokens()…

r tokenize text-mining tidytext

asked Oct 18 '19 at 21:17

dss333

1 answer

Trying to extract a subset of pages from each pdf in a directory with 70 pdf files

I am using tidyverse, tidytext, and pdftools. I want to parse words in a directory of 70 pdf files. I am using these tools to do this successfully but the code below grabs all the pages instead of the subset I want. I need to skip the first two…

r pdf tidyverse tidytext pdftools

asked Oct 18 '19 at 19:26

Craig Byron

1 answer

Find documents that include one of a list of words in R

I have two dataframes: msnbc contains a column of news transcripts called text and dictionary contains a column of words called search. I want to return a new dataframe that includes all rows of msnbc where the text field contains one or more words…

r text stringr tidytext

asked Sep 26 '19 at 19:27

James Martherus

1 answer

How to add words manually to nrc sentiment lexicon?

I plan on using the nrc sentiment lexicon with Twitter data, but I realize that many words are missing. Can anybody guide me on how to add some words with their specific sentiment in R? (I have downloaded the nrc to my environment and also have…

r tidytext

asked Jul 26 '19 at 16:11

Froy Valdez

0 answers

Filter the top 20% of a tf_idf dtm by group

I have a text with different classes. My goal is to determine and keep only the features with the highest tf_idf value (top 20%) of each class. As an example, I use the book_of_mormon data set. text is the text and book_title is the class. An idea…

r text tidytext

asked Jun 18 '19 at 20:27

Banjo

2 answers

How to Combine Multiple Rows Into One Using TidyText

I am looking at a novel and want to search for the appearance of characters' names throughout the book. Some characters go by different names. For example, the character "Sissy Jupe" goes by "Sissy" and "Jupe". I want to combine two rows of word…

r dplyr tidytext

asked Jun 14 '19 at 22:27

Tom Liam Lynch
