Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets that convert text to and from tidy formats, so you can switch easily between tidy tools and existing text mining packages. Once text is in a tidy data structure, tools from the R tidyverse ecosystem can be used for effective data handling and analysis.
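A minimal sketch of that tidy workflow follows. It assumes the tidytext, dplyr and tibble packages are installed, and the two-line input is made-up sample text. `unnest_tokens()` and the `stop_words` data set come from tidytext; `anti_join()` and `count()` are ordinary dplyr verbs that work once the text is one token per row.

```r
library(dplyr)
library(tibble)
library(tidytext)

# Made-up example input: one row per line of text
text_df <- tibble(
  line = 1:2,
  text = c("Because I could not stop for Death",
           "He kindly stopped for me")
)

# unnest_tokens() splits `text` into words (lowercased by default),
# giving one row per word while keeping the `line` column
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# With the text tidy, plain dplyr verbs apply:
# drop stop words, then count the remaining words
word_counts <- tidy_words %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)

print(tidy_words)
print(word_counts)
```

The same pattern (tokenize, join against a lexicon, summarise) underlies most of the sentiment and n-gram questions listed under this tag.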

294 questions
1 vote, 2 answers

Plotting Bigrams in Bar Chart with ggplot2

My data looks like this: > str(bigrams_joined) Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 71319 obs. of 2 variables: $ line : int 1 1 1 1 1 1 1 1 1 1 ... $ bigrams: chr "in practice" "practice risk" "risk management" "management is" I would…
Davide Lorino
1 vote, 0 answers

R package installation halts with the following error when run through SSIS:

install.packages("dplyr") library(dplyr) install.packages("tidytext") library(tidytext) install.packages("reshape") library(reshape) install.packages("ggplot2") library(ggplot2) install.packages("tidyr") library(tidyr) install.packages("twitteR") lib…
1 vote, 1 answer

tidytext Error: Can't convert a function to a quosure

I am starting to use tidytext to get basic word frequencies for a text file with a collection of emails and lots of garbage in between. The relevant part of the script is: library(tidytext) data <- read_lines("emails.txt") text_tibble <-…
user9365328
1 vote, 1 answer

Opposite of unnest_tokens in R

I have a data frame that I converted to tidy text format in R to get rid of stop words. I would now like to 'untidy' that data frame back to its original format. What is the opposite (inverse) command of unnest_tokens? I checked answer in…
1 vote, 1 answer

R unnest with Sentence start and end positions

New to R. I am using tidytext::unnest_tokens to break down a long text into individual sentences, as below: tidy_drugs <- drugstext.raw %>% unnest_tokens(sentence, Section, token="sentences") So I get a data.frame with all the sentences…
Krishna
1 vote, 2 answers

Sentiment Analysis in R with tidyverse package - object 'sentiment' not found

I am trying to reproduce this example of sentiment analysis: https://www.kaggle.com/rtatman/tutorial-sentiment-analysis-in-r I have a "file.txt" with the text I want to analyze in "../input"…
Michael
1 vote, 1 answer

R text mining n grams(bigrams) no result returned. Anyone has same experience?

I'm using the tidytext package for n-gram text mining. I tried it on 2 columns of text: the n-gram (bigram) function works for one but returns 0 obs for the other. Both columns come from the same source, so the format is the same and only the content differs.…
MJW
1 vote, 1 answer

Regex //divxlc in text analysis in R book code

I am currently studying the Text Mining with R book by Silge and Robinson and, given my newbie status, I can't get my head around exactly how this regex "^chapter [\\divxlc]" works out the chapter numbers when tidying the texts. I have…
ogorodriguez
1 vote, 2 answers

Error when using tidytext to calculate word frequencies in R

I've been trying to calculate word frequencies with the tidytext package. v <- "Everybody dance now! Give me the music Everybody dance now! Give me the music Everybody dance now! Everybody dance now! Yeah! Yeah! Yeah!" v <- as.character(v) v %>%…
Vickie Ip
1 vote, 1 answer

Text Mining with Tidytext: problems pairwise_count and pairwise_cor

I'm experimenting with tidytext (Text Mining with R) and I want to use the functions pairwise_count and pairwise_cor from the widyr package. My corpus is from a pre-processed text…
1 vote, 1 answer

Converting data frame to tibble with word count

I'm attempting to perform sentiment analysis based on http://tidytextmining.com/sentiment.html#the-sentiments-dataset . Before performing sentiment analysis I need to convert my dataset into a tidy format. My dataset is of the form: x <- c( "test1"…
blue-sky
1 vote, 2 answers

Filtering text from numbers and stopwords in R(not for tdm)

I have a text corpus. mytextdata = read.csv(path to texts.csv) Mystopwords=read.csv(path to mystopwords.txt) How can I filter this text? I must: 1) delete all numbers, 2) remove the stop words, 3) remove the brackets. I will not work with dtm,…
psysky
1 vote, 2 answers

'sep' is not an exported object from 'namespace:dplyr'

Obtaining n-grams following this book on tidy text: http://tidytextmining.com/ngrams.html The code: library(tidyr) bigrams_separated <- austen_bigrams %>% separate(bigram, c("word1", "word2"), sep = " ") bigrams_filtered <- bigrams_separated…
Forge
1 vote, 2 answers

Issue with syllabification and regex

I have a PDF file that I am reading as text. My problem is hyphenation (syllable breaks) occurring between numbers. Link to file on GitHub. library(pdftools) library(tidytext) library(readxl) library(dplyr) setwd("~/Automation -…
Prometheus
1 vote, 2 answers

Can I combine pairwise_cor and pairwise_count to get the phi coefficient AND number of occurrences for each pair of words?

I'm new to R, and I'm using widyr to do text mining. I successfully used the methods found here to get a list of co-occurring words within each section of text and their phi coefficient. Code as follows: word_cors <- review_words %>% …
ElizabethW