Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem like dplyr can be used for effective data handling and analysis.
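As a minimal sketch of the conversion described above, `unnest_tokens()` can split a text column into one token per row, after which ordinary dplyr verbs apply. The two sample sentences and the names `text_df`, `tidy_words`, and `word_counts` are illustrative assumptions, not taken from the package documentation.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line corpus; any data frame with a character column works.
text_df <- tibble(
  line = 1:2,
  text = c("The quick brown fox", "jumps over the lazy dog")
)

# One row per word; unnest_tokens() lowercases tokens by default.
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Ordinary dplyr verbs now apply, e.g. counting word frequencies.
word_counts <- tidy_words %>%
  count(word, sort = TRUE)
```

The other columns of the input (here `line`) are kept, so each word can still be traced back to the row it came from.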

Other resources

Text Mining with R: A Tidy Approach

Related tags

R's tm, quanteda, dplyr, tidyr, and broom packages

294 questions

1 answer

Calculate `tf-idf` for a data frame of documents

The following code

    library(dplyr)
    library(janeaustenr)
    library(tidytext)

    book_words <- austen_books() %>%
      unnest_tokens(word, text) %>%
      count(book, word, sort = TRUE)

    book_words <- book_words %>%
      bind_tf_idf(word, book, n)

    book_words

taken…

r text tidytext

asked Mar 25 '20 at 18:28

Mark

1 answer

tidyverse: Combining a column of different length into an existing tibble

I have a tibble which looks like: Review_Text Because it is a nice game Best trump soumd board out there Boring hated because it does not work when I get done but you can make better game if game has unlimeted chemicals bottles …

r tidyverse tidyr tidytext

asked Mar 16 '20 at 06:56

user2293224

1 answer

Add text to an atomic (character) vector in R

Good afternoon. I am not an expert on atomic vectors, but I would like some ideas about them. I have the script for the movie "Coco" and I want to be able to get a row that is numbered in the form 1., 2., ... (130 scenes throughout the…

r text character readr tidytext

asked Mar 04 '20 at 16:55

Carlos Garibotto

2 answers

ngrams analysis in tidytext in R

I am trying to do n-gram analysis in tidytext. I have a corpus of 770 speeches. However, the function unnest_tokens in tidytext takes a data frame as input. When I checked the example (Jane Austen books), each line of the book is stored as a row…

r tidytext

asked Feb 14 '20 at 05:26

jalaj pathak

2 answers

select text from multiple combinations of text within a dataframe R

I want to subset data based on a text code that is used in numerous combinations throughout one column of a df. I first checked all the variations by creating a table: `list <- as.data.frame(table(EQP$col1))`. I want to search within the dataframe…

r text subset tidytext

asked Feb 04 '20 at 20:04

sar

1 answer

R: cleaning pdf text

I have pdf text that I need converted into "tidy" format. But I'm unsure about how to read in the pdf text without compromising the information I need. For example: # install pacman package if you require it if (!require("pacman"))…

r stringr tidytext pdftools

asked Jan 28 '20 at 16:51

dano_

2 answers

Mining financial articles R

I'm mining some financial articles using tidytext. I downloaded the data from Reuters, but when I try to turn each corpus into a data frame I get some errors about the unnest command not taking functions as input... Do you have any…

r tidytext

asked Jan 21 '20 at 15:24

lgds

1 answer

Getting error when trying to install R Package "Tidytext"

Error: package or namespace load failed for ‘tidytext’ in library.dynam(lib, package, package.lib): shared object ‘stringi.so’ not found 6. stop(msg, call. = FALSE, domain = NA) 5. value[3L] 4. tryCatchOne(expr,…

tidytext

asked Jan 10 '20 at 04:06

Brad P

1 answer

unnest_tokens() in R creates word column but can't select the word column in dplyr commands

When I use the unnest_tokens() command it creates a column called word. As you can see, I add colnames() after the pipe after unnesting and it returns word as a column. When I save it as a dataframe the column…

r dplyr nlp tidytext

asked Jan 07 '20 at 15:09

Monica Puerto

0 answers

Text Mining: Pairwise Correlation between Words by Group

A simple question. My data looks like this:

division_name  word
Finance        Good
Commercial     Awesome
Finance        Lovely
Commercial     Support

I am finding pairwise_cor for all possibilities of the words above, but I get result…

r dplyr tidyverse text-mining tidytext

asked Dec 16 '19 at 11:01

Rana Usman

1 answer

Using function to calculate a score, then put into a dataframe or tibble with right variable

I am working on a function that will hopefully perform a sentiment analysis for each emotion in the NRC dictionary on a list (see: https://www.tidytextmining.com/sentiment.html#sentiment-analysis-with-inner-join), and then save the score itself as a…

r lapply tibble tidytext

asked Dec 10 '19 at 17:51

Jonathan D.

2 answers

read delimited .txt file with multiple, interspersed headers in R

I am trying to open and clean a massive oceanographic dataset in R, where station information is interspersed as headers in between the chunks of observations: $ 2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999.0 -9 -9 -9 -9…

r file-io tidyr tidytext

asked Nov 14 '19 at 19:17

Larusson

1 answer

Gather function in R dropping column

I'm comparing the language used by some authors with data downloaded from the Project Gutenberg site but I'm having some trouble with my tibble manipulation. My end goal is to make a plot comparing frequency of word usage by Herman Melville and…

r ggplot2 tidytext

asked Oct 31 '19 at 15:57

carousallie

1 answer

Converting Twitter data into a tidy format

I am trying to convert tweets into a tidy text format with the following format and code: ## Convert twitter into a tidy text format where the unit of analysis is the ##`tweet_id-handle-time_stamp-word` tidy_format = trump_clinton_tweets %>%…

r nlp tidyverse tidytext

asked Oct 22 '19 at 02:19

maldini425

2 answers

Better and easier way to find who spoke the top 10 anger words in conversation text

I have a dataframe that contains the variables 'AgentID', 'Type', 'Date', and 'Text'; a subset is as follows: structure(list(AgentID = c("AA0101", "AA0101", "AA0101", "AA0101", "AA0101"), Type = c("PS", "PS", "PS", "PS",…

r sentiment-analysis grepl tidytext sentimentr

asked Aug 27 '19 at 10:14

loveR
