Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets for converting text to and from tidy formats and for switching seamlessly between tidy tools and existing text mining packages. Once text is in a tidy data structure (one token per row), tools from the R tidyverse ecosystem such as dplyr can be used to manipulate and analyze it.
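A minimal sketch of that workflow, assuming the dplyr and tidytext packages are installed; the two-line `text_df` data frame is made up for illustration. `unnest_tokens()` turns text into one token per row, dplyr's `anti_join()` removes the `stop_words` lexicon bundled with tidytext, and `cast_sparse()` converts the tidy counts back into a sparse matrix.

```r
library(dplyr)
library(tidytext)

# Toy input (made up for illustration): two lines of text, one row per line
text_df <- tibble(
  line = 1:2,
  text = c("Tidy data makes text mining easier.",
           "Text mining with tidy tools is fun!")
)

# Tokenize into one-word-per-row form. By default unnest_tokens()
# lowercases, strips punctuation, drops the input column, and keeps
# the remaining columns (here `line`) next to each word.
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Ordinary dplyr verbs now apply: remove stop words using the
# stop_words data set bundled with tidytext, then count words per line.
line_counts <- tidy_words %>%
  anti_join(stop_words, by = "word") %>%
  count(line, word)

# Convert the tidy counts into a sparse line-by-word matrix
# that matrix-based tools can consume.
word_matrix <- cast_sparse(line_counts, line, word, n)

print(tidy_words)
print(word_matrix)
```

The same tidy counts can also be cast to a tm `DocumentTermMatrix` with `cast_dtm()` or to a quanteda dfm with `cast_dfm()`, provided those packages are installed.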

Other resources

Text Mining with R: A Tidy Approach

Related tags

R's tm, quanteda, dplyr, tidyr, and broom packages

294 questions

1 answer

How to tokenize my dataset in R using the tidytext library?

I have been trying to follow Text Mining with R by Julia Silge; however, I cannot tokenize my dataset with the unnest_tokens function. Here are the packages I have loaded: #…

r text text-mining tidytext

asked Jul 14 '20 at 11:34

Nate

3 answers

Apply a user-defined function to one df, using a single column in another df

df1 (1,500 rows) shows questions, percent correctly answered, and count of question attempts: qtitle avg_correct attempts "Asthma and exercise, question 1" 54.32 …

r dictionary apply tidyverse tidytext

asked Jul 12 '20 at 21:00

CLS

1 answer

`str_detect()` and `map()` to iterate through many string detections

My data is in the format below (the code for data input is at the very end, below the question).
#> df
#> id amount description
#> 1 10 electricity
#> 2 100 rent
#> 3 4 fees
I would like to be able to classify the…

r dplyr tidyverse purrr tidytext

asked Jul 12 '20 at 03:32

Jeremy K.

1 answer

Function turning Factiva HTML into a tidy data frame

Using the tm.plugin.factiva package, I want to create a function that can read Factiva HTML files and return them as a data frame. So far I've managed to create a function that can read these files and transform them into a list of data frames, each…

r merge tm corpus tidytext

asked Jul 10 '20 at 15:07

Eric Nilsen

2 answers

Removing stop words from a list of strings in R

Sample data (dput output of my data): x <- structure(list(Comments = structure(2:1, .Label = c("I have a lot of home-work to be completed..", "I want to vist my teacher today only!!"), class = "factor"), Comment_ID = c(704, 802)), class…

r dplyr text-mining tidytext

asked Jun 24 '20 at 13:01

Suhas U

0 answers

Problem with tokenization of text: it does not work

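I have the following text in a file called prueba.txt: "Gobernando líderes: una reflexión sobre el fútbol como sistema de juego complejo" "La innovación y el aprendizaje en las organizaciones son actividades clave en la empresa actual. A menos que…" (Spanish; roughly: "Governing leaders: a reflection on football as a complex game system" "Innovation and learning in organizations are key activities in today's companies. Unless…")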

r tidytext

asked Jun 24 '20 at 06:09

user20336

1 answer

Regular expression behavior in R unnest_tokens() vs. Python pandas str.split()

I want to replicate a result similar to df_long below using Python pandas. This is the R code:
df <- data.frame("id" = 1, "author" = 'trump', "Tweet" = "RT @kin2souls: @KimStrassel Anyone that votes")
unnest_regex <-…

python r regex pandas tidytext

asked Jun 22 '20 at 10:00

Gam780

1 answer

How do I create a new variable in R if it does not already exist?

I am currently using tidytext in R to do some sentiment analysis. I'm using code extremely similar to the code in the vignette. This is the example…

r dplyr spread tidytext

asked Jun 16 '20 at 19:17

Jonathan D.

1 answer

Keeping document number in tidytext

When I run unnest_tokens on a list I enter manually, the output includes the row number each word came from.
library(dplyr)
library(tidytext)
library(tidyr)
library(NLP)
library(tm)
library(SnowballC)
library(widyr)
library(textstem)
#test…

r row-number tidytext unnest

asked May 19 '20 at 13:58

Susan Ray

1 answer

Parsing text for analysis in R

I have a .txt file that includes short articles, and I want to use R to create a data set that parses each article and extracts the date, author, journal, title, line number, and text for each line of text in each article in a data frame. For…

r parsing text dplyr tidytext

asked May 13 '20 at 02:09

user3385922

2 answers

What R package is suited to identifying words that are positively correlated with a binary response variable?

I have a tibble with three columns:
wine: name of the wine
wine_description: words describing the wine (punctuation has been stripped out)
target: 0 or 1 variable (1 = top-rated wine, 0 = not top-rated wine)
What R package might I use if I…

r dplyr text-mining tidytext qdap

asked May 10 '20 at 14:27

Mutuelinvestor

1 answer

Detecting parts of text in foreign languages (RStudio)

My dataset contains a lot of texts. Texts written entirely in foreign languages have been dropped. Now all the texts are written in English, but some contain translations, e.g., a bilingual writer who, besides the English text, has…

text filtering tidytext

asked Apr 30 '20 at 13:43

T. Trogman

1 answer

R tidytext unnest_tokens error when using a .txt file as source

I'm very new to this topic. I am having trouble with the unnest_tokens function in the tidytext package. I have some texts stored in .txt format that I want to analyze. An example would be putting the following sentences in a .txt file and then reading it into…

r text-mining tidytext

asked Apr 14 '20 at 06:30

BunZ

1 answer

How to use unnest_tokens on Twitter text data?

I'm trying to run the following, and it gives me an error message. data <- c("Who said we cant have a lil dance party while were stuck in Quarantine? Happy Friday Cousins!! We got through another week of Quarantine. Lets continue to stay safe, healthy…

r twitter tidyverse unnest tidytext

asked Apr 11 '20 at 17:13

Chamil Rathnayake

1 answer

Unable to install "tidytext" and "jsonlite" packages in RStudio

I have tried the following:
install.packages("tidytext")
library(remotes)
install_github("juliasilge/tidytext")
install.packages(c("mnormt", "psych", "SnowballC", "hunspell", "broom", "tokenizers",…

r tidytext

asked Apr 03 '20 at 15:28

Erfan
