Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem like dplyr can be used for effective data handling and analysis.

Other resources

Text Mining with R: A Tidy Approach

Related tags

R's tm, quanteda, dplyr, tidyr, and broom packages

294 questions

1 answer

How to tokenise on hyphens using unnest_tokens in R

I'm trying to tokenise a dataframe containing strings. Some contain hyphens, and I'd like to tokenise on hyphens using unnest_tokens(). I've tried upgrading tidytext from 0.1.9 to 0.2.0. I've tried a number of variations on regex to capture the hyphen…

regex tidytext

asked Jun 13 '19 at 16:51

alexmathios

1 answer

Restore original document id from lda object

I'm trying to compare the "consensus" topic prediction (beta) from terms (in a given document) against the most likely predicted topic from the document itself (gamma) using functions from topicmodels. While it's easy to extract the most likely…

r lda tidytext topicmodels

asked May 16 '19 at 08:02

Chris T.

1 answer

Issue with tidytext() : unable to apply unnest_tokens to dataframe

I've been trying to apply unnest_tokens from tidytext to a dataframe column to generate common bigrams and trigrams. They're short texts from > 200 articles. They're also a column subset from a larger csv. I've tried the following, to no avail: 1.…

r rstudio tidytext

asked Feb 03 '19 at 15:58

flustercludge

1 answer

Combining .txt files with character data into a data frame for tidytext analysis

I have a bunch of .txt files of job descriptions and I want to import them to do text mining analyses. Please find attached some sample text files: https://sample-videos.com/download-sample-text-file.php. Please use the 10kb and 20kb versions because…

r tokenize tidyverse tm tidytext

asked Dec 05 '18 at 23:16

Reuben Sarwal

1 answer

How to clean up CSV data after uploading to Shiny App

Please help! I'm trying to build a Shiny App with the intent to classify data loaded from a CSV file. How do I successfully create a DataFrame from an uploaded CSV file so that I can move forward and clean/analyze it? Please see code:…

r shiny tidytext

asked Nov 30 '18 at 01:19

Kristiaan Oord

0 answers

tm to tidytext conversion

I am trying to learn tidytext. I can follow the examples on the tidytext website so long as I use the packages (e.g., janeaustenr). However, most of my data are text files in a corpus. I can reproduce the tm to tidytext conversion example for sentiment…

tm tidytext

asked Nov 16 '18 at 17:56

dcoffey

1 answer

Details behind "augment" when applied to topic modeling

I have a question about the "augment" function from Silge and Robinson's "Text Mining with R: A Tidy Approach" textbook. Having run an LDA on a corpus, I am applying "augment" to assign topics to each word. I get the results, but am not sure what takes…

r text-mining lda topic-modeling tidytext

asked Nov 16 '18 at 15:27

Dave

1 answer

Reading file with one column with rows as variable names

I'm trying to work on some sentiment analysis but am unfortunately stuck at the very beginning: I can't even import the file. The data is located here: http://snap.stanford.edu/data/web-FineFoods.html It is a 353MB .txt file and looks like…

r read.table tidytext

asked Nov 06 '18 at 02:51

tastycanofmalk

1 answer

How to represent each word occurrence as a separate tcm vector in R?

I am looking for an efficient way to create a term co-occurrence matrix for (each) target word in a corpus, such that each occurrence of the word would constitute its own vector (row) in a tcm, where the columns are the context words (i.e., a…

r sparse-matrix quanteda tidytext text2vec

asked Oct 23 '18 at 17:00

user3554004

0 answers

Sorting in ggplot with facet wrap

I used tidytext and ggplot to compute and plot bigram frequencies (and tf-idfs). I've plotted the most frequent bigrams across four time periods. However, I can't figure out how to correctly sort my counts in all four plots. This is the code I…

r ggplot2 facet-wrap tidytext

asked Aug 26 '18 at 19:55

Andrea

1 answer

Read text and their corresponding page numbers from the .docx in R

How can I read a Microsoft .docx file in R and get the text as one field and the page number as another? With the readtext R library, I can read the text, but I'm wondering if you know how to get the page number as well?…

r tm text-analysis tidytext

asked Jul 30 '18 at 19:09

Geet

2 answers

failed to get data in single row separated by comma that is grouped by another column values

I have a dataframe with many vars, out of which, two variables are shown in the sample dataset test in the following code: test <- data.frame(row_numb = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3), …

r dplyr tidyr tidytext

asked Jul 21 '18 at 22:00

LeMarque

1 answer

R - Finding top words in each NRC sentiment and emotion using syuzhet package

Snapshot of the dataset: I'm getting the following chart: Here is the code: library(tidytext) library(syuzhet) lyrics$lyric <- as.character(lyrics$lyric) tidy_lyrics <- lyrics %>% unnest_tokens(word,lyric) song_wrd_count <- tidy_lyrics %>%…

r text-mining sentiment-analysis tidytext

asked Jul 11 '18 at 12:51

user709413

1 answer

Error in Removing regex, Split Text into Paragraph, and then apply ifelse in R

I am struggling to remove regex, split text into paragraphs, and then apply IFELSE to a dataframe. I look forward to your help. Thank you. I wish to search for words in the first paragraph for each Text in the dataframe. Thereafter, I have search…

r dplyr tidyr tidyverse tidytext

asked Jun 19 '18 at 08:50

Beginner

1 answer

R - Count with tidytext data

I'm working on text mining with some Freud books from the Gutenberg project. When I try to do a sentiment analysis, using the following code: library(dplyr) library(tidytext) library(gutenbergr) freud_books <- gutenberg_download(c(14969, 15489, 34300,…

r count tidytext

asked May 03 '18 at 20:50

Ricardo Silva
