Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem can be used for effective data handling and analysis.
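
As a minimal sketch of what "tidy format" means here, the snippet below converts two short strings into a one-token-per-row table with unnest_tokens() and then counts words with dplyr. The data frame text_df and its column names are made up for illustration; they are not from any question on this page.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line input; `line` and `text` are illustrative column names.
text_df <- tibble(
  line = 1:2,
  text = c("The quick brown fox", "jumps over the lazy dog")
)

# One row per word; unnest_tokens() lowercases tokens by default.
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Once the text is tidy, ordinary dplyr verbs apply.
word_counts <- tidy_words %>%
  count(word, sort = TRUE)

print(word_counts)
```

Because the result is an ordinary tibble, it can be passed on to other tidyverse functions without further conversion.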

294 questions
1 vote, 1 answer

How to tokenise on hyphens using unnest_tokens in R

I'm trying to tokenise a dataframe containing strings. Some contain hyphens, and I'd like to tokenise on hyphens using unnest_tokens(). I've tried upgrading tidytext from 0.1.9 to 0.2.0. I've tried a number of variations on regex to capture the hyphen…
alexmathios
  • 109
  • 1
  • 7
1 vote, 1 answer

Restore original document id from lda object

I'm trying to compare the "consensus" topic prediction (beta) from terms (in a given document) against the most likely predicted topic from the document itself (gamma) using functions from topicmodels. While it's easy to extract the most likely…
Chris T.
  • 1,699
  • 7
  • 23
  • 45
1 vote, 1 answer

Issue with tidytext: unable to apply unnest_tokens to dataframe

I've been trying to apply unnest_tokens from tidytext to a dataframe column to generate common bigrams and trigrams. They're short texts from more than 200 articles. They're also a column subset from a larger CSV. I've tried the following, to no avail: 1.…
flustercludge
  • 77
  • 1
  • 7
1 vote, 1 answer

Combining .txt files with character data into a data frame for tidytext analysis

I have a bunch of .txt files of job descriptions and I want to import them for text mining analyses. Please find attached some sample text files: https://sample-videos.com/download-sample-text-file.php. Please use the 10kb and 20kb versions because…
1 vote, 1 answer

How to clean up CSV data after uploading to Shiny App

Please help! I'm trying to build a Shiny app that classifies data loaded from a CSV file. How do I create a data frame from an uploaded CSV file so that I can move forward and clean/analyze it? Please see code:…
1 vote, 0 answers

tm to tidytext conversion

I am trying to learn tidytext. I can follow the examples on the tidytext website so long as I use the packages (e.g., janeaustenr). However, most of my data are text files in a corpus. I can reproduce the tm to tidytext conversion example for sentiment…
dcoffey
  • 11
  • 3
1 vote, 1 answer

Details behind "augment" when applied to topic modeling

I have a question about the "augment" function from Silge and Robinson's "Text Mining with R: A Tidy Approach" textbook. Having run an LDA on a corpus, I am applying "augment" to assign topics to each word. I get the results, but am not sure what takes…
Dave
  • 329
  • 2
  • 10
1 vote, 1 answer

Reading file with one column with rows as variable names

I'm trying to work on some sentiment analysis but am unfortunately stuck at the very beginning: I can't even import the file. The data is located here: http://snap.stanford.edu/data/web-FineFoods.html It is a 353MB .txt file and looks like…
tastycanofmalk
  • 628
  • 7
  • 23
1 vote, 1 answer

How to represent each word occurrence as a separate tcm vector in R?

I am looking for an efficient way to create a term co-occurrence matrix for (each) target word in a corpus, such that each occurrence of the word would constitute its own vector (row) in a tcm, where the columns are the context words (i.e., a…
user3554004
  • 1,044
  • 9
  • 24
1 vote, 0 answers

Sorting in ggplot with facet wrap

I used tidytext and ggplot to compute and plot bigram frequencies (and tf-idfs). I've plotted the most frequent bigrams across four time periods. However, I can't figure out how to correctly sort my counts in all four plots. This is the code I…
Andrea
  • 25
  • 1
  • 5
1 vote, 1 answer

Read text and their corresponding page numbers from the .docx in R

How can I read a Microsoft .docx file in R and get the text as one field and the page number as another? With the readtext R library, I can read the text, but I'm wondering if you know how to get the page number as well?…
Geet
  • 2,515
  • 2
  • 19
  • 42
1 vote, 2 answers

Failed to get data in a single row, separated by commas, grouped by another column's values

I have a dataframe with many vars, out of which, two variables are shown in the sample dataset test in the following code: test <- data.frame(row_numb = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3), …
LeMarque
  • 733
  • 5
  • 21
1 vote, 1 answer

R - Finding top words in each NRC sentiment and emotion using syuzhet package

Snapshot of the dataset: I'm getting the following chart: Here is the code: library(tidytext) library(syuzhet) lyrics$lyric <- as.character(lyrics$lyric) tidy_lyrics <- lyrics %>% unnest_tokens(word,lyric) song_wrd_count <- tidy_lyrics %>%…
user709413
  • 505
  • 2
  • 7
  • 21
1 vote, 1 answer

Error in Removing regex, Split Text into Paragraph, and then apply ifelse in R

I am struggling to remove regex, split text into paragraphs, and then apply ifelse to a dataframe. I look forward to your help. Thank you. I wish to search for words in the first paragraph of each Text in the dataframe. Thereafter, I have search…
Beginner
  • 262
  • 1
  • 4
  • 12
1 vote, 1 answer

R - Count with tidytext data

I'm working on text mining with some Freud books from Project Gutenberg. When I try to do a sentiment analysis using the following code: library(dplyr) library(tidytext) library(gutenbergr) freud_books <- gutenberg_download(c(14969, 15489, 34300,…