Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem can be used for effective data handling and analysis.
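As a rough illustration of what "tidy format" means here, the sketch below turns a small data frame of documents into one row per word and then counts the words that remain after removing stop words. The `docs` tibble and its column names `doc_id` and `text` are made up for this example; `unnest_tokens()` and the `stop_words` data set come from tidytext, and the remaining verbs come from dplyr.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus, invented for illustration only.
docs <- tibble(
  doc_id = c(1, 2),
  text = c("Tidy data makes text mining easier.",
           "Text mining with tidy tools.")
)

# One row per token. By default unnest_tokens() splits into words,
# lowercases them, and strips punctuation.
tidy_words <- docs %>%
  unnest_tokens(word, text)

# Drop common English stop words using the stop_words data set
# shipped with tidytext, then count the remaining words.
word_counts <- tidy_words %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

word_counts
```

Because the result is an ordinary tibble, other columns such as `doc_id` are carried along with each token, so the output can be grouped, joined, or plotted with the usual tidyverse tools.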

294 questions
0
votes
1 answer

How to tokenize my dataset in R using the tidytext library?

I have been trying to follow Text Mining with R by Julia Silge; however, I cannot tokenize my dataset with the unnest_tokens function. Here are the packages I have loaded: #…
Nate
  • 43
  • 3
0
votes
3 answers

Apply a user-defined function to one df, using a single column in another df

df1 (1,500 rows) shows questions, percent correctly answered, and count of question attempts: qtitle avg_correct attempts "Asthma and exercise, question 1" 54.32 …
CLS
  • 183
  • 1
  • 5
0
votes
1 answer

`str_detect()` and `map()` to iterate through many string detections

My data is in the format below. (Code for data input at the very end, below question). #> df #> id amount description #> 1 10 electricity #> 2 100 rent #> 3 4 fees I would like to be able to classify the…
Jeremy K.
  • 1,710
  • 14
  • 35
0
votes
1 answer

Function turning Factiva-HTML into a tidy-dataframe

Using the tm.plugin.factiva-package I want to create a function that can read Factiva-html files, and return them as a dataframe. So far I've managed to create a function that can read these files, and transform them into a list of dataframes, each…
Eric Nilsen
  • 91
  • 1
  • 9
0
votes
2 answers

Removing Stop words from a list of strings in R

Sample data Dput code of my data x <- structure(list(Comments = structure(2:1, .Label = c("I have a lot of home-work to be completed..", "I want to vist my teacher today only!!"), class = "factor"), Comment_ID = c(704, 802)), class…
Suhas U
  • 43
  • 7
0
votes
0 answers

Problem with tokenization of text, it does not work

I have the following text in a file called prueba.txt: "Gobernando líderes: una reflexión sobre el fútbol como sistema de juego complejo" "La innovación y el aprendizaje en las organizaciones son actividades clave en la empresa actual. A menos que…
0
votes
1 answer

Regular Expression Behavior in R unnest_token() vs. Python pandas str.split()

I want to replicate the result similar to df_long below using python pandas. This is the R code: df <- data.frame("id" = 1, "author" = 'trump', "Tweet" = "RT @kin2souls: @KimStrassel Anyone that votes") unnest_regex <-…
Gam780
  • 17
  • 1
  • 4
0
votes
1 answer

How do I create a new variable in R if it does not already exist?

I am currently using tidytext in R to do some sentiment analysis. I'm using code extremely similar to the one listed at the vignette. This is the example…
0
votes
1 answer

keeping document number in tidytext

When I unnest_tokens for a list I enter manually, the output includes the row number each word came from. library(dplyr) library(tidytext) library(tidyr) library(NLP) library(tm) library(SnowballC) library(widyr) library(textstem) #test…
Susan Ray
  • 37
  • 3
0
votes
1 answer

Parsing text for analysis in R

I have a .txt file that includes short articles, and I want to use R to create a data set that parses each article and extracts the date, author, journal, title, line number, and text for each line of text in each article in a data frame. For…
user3385922
  • 179
  • 1
  • 1
  • 9
0
votes
2 answers

What R package is suited to identifying words that are positively correlated with a binary response variable

I have a tibble that has three columns: wine - Name of the wine wine_description - Words describing wine (punctuation has been stripped out) target - 0 or 1 variable 1 = Top Rated Wine, 0 = Not Top Rated Wine What R package might I use if I…
Mutuelinvestor
  • 3,384
  • 10
  • 44
  • 75
0
votes
1 answer

Detecting parts of text in foreign languages (RStudio)

My dataset contains a lot of texts. Texts that were written entirely in foreign languages are dropped. Now, all the texts are written in English, but some have translations in them, e.g. someone that is bilingual that, besides the English text, has…
0
votes
1 answer

R Tidytext unnest_tokens error when using a txt file as source

Very new to this topic. I am having trouble with the unnest_tokens function in the tidytext package. I have some texts stored in .txt format that I want to analyze. An example would be putting the following sentences in a txt file then read it into…
BunZ
  • 25
  • 3
0
votes
1 answer

How to use unnest_token on twitter text data?

I'm trying to run the following and it gives me an error message. data <- c("Who said we cant have a lil dance party while were stuck in Quarantine? Happy Friday Cousins!! We got through another week of Quarantine. Lets continue to stay safe, healthy…
0
votes
1 answer

Unable to install "tidytext" and "jsonlite" packages in RStudio

I have tried the following: install.packages("tidytext") library(remotes) install_github("juliasilge/tidytext") install.packages(c("mnormt", "psych", "SnowballC", "hunspell", "broom", "tokenizers",…
Erfan
  • 1