Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem can be used for effective data handling and analysis.
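
As a rough illustration of the round trip described above, here is a minimal sketch that tokenizes a small data frame into one row per word and then casts the word counts back to a sparse document-term matrix. The two-row `docs` tibble and its column names are invented for this example; `unnest_tokens()`, `count()`, and `cast_sparse()` are existing tidytext/dplyr functions.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document corpus, invented for illustration only
docs <- tibble(
  doc  = c("a", "b"),
  text = c("Tidy text is tidy.", "Text mining with tidy tools.")
)

# Text -> tidy format: one row per token
# (unnest_tokens() lowercases and strips punctuation by default)
tidy_words <- docs %>%
  unnest_tokens(word, text)

# Ordinary dplyr verbs now work on the tokens
word_counts <- tidy_words %>%
  count(doc, word, sort = TRUE)

# Tidy format -> sparse matrix (documents as rows, words as columns)
m <- word_counts %>%
  cast_sparse(doc, word, n)
```

From here, `word_counts` can be passed to other tidyverse tools, while `m` can be handed to packages that expect a matrix input.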

294 questions
0
votes
1 answer

Calculate `tf-idf` for a data frame of documents

The following code library(dplyr) library(janeaustenr) library(tidytext) book_words <- austen_books() %>% unnest_tokens(word, text) %>% count(book, word, sort = TRUE) book_words <- book_words %>% bind_tf_idf(word, book, n) book_words taken…
Mark
  • 1,577
  • 16
  • 43
0
votes
1 answer

tidyverse: Combining a column with different length into existing tibble

I have a tibble which looks like: Review_Text Because it is a nice game Best trump soumd board out there Boring hated because it does not work when I get done but you can make better game if game has unlimeted chemicals bottles …
user2293224
  • 2,128
  • 5
  • 28
  • 52
0
votes
1 answer

add text to atomic (character) vector in r

Good afternoon, I am not an expert in the topic of atomic vectors but I would like some ideas about it. I have the script for the movie "Coco" and I want to be able to get a row that is numbered in the form 1., 2., ... (130 scenes throughout the…
0
votes
2 answers

ngrams analysis in tidytext in R

I am trying to do ngram analysis in tidytext; I have a corpus of 770 speeches. However, the function unnest_tokens in tidytext takes a data frame as input. When I checked the example (Jane Austen books), each line of the book is stored as a row…
jalaj pathak
  • 67
  • 1
  • 8
0
votes
2 answers

select text from multiple combinations of text within a dataframe R

I want to subset data based on a text code that is used in numerous combinations throughout one column of a df. I checked first all the variations by creating a table. list <- as.data.frame(table(EQP$col1)) I want to search within the dataframe…
sar
  • 182
  • 6
  • 26
0
votes
1 answer

R: cleaning pdf text

I have pdf text that I need converted into "tidy" format. But I'm unsure about how to read in the pdf text without compromising the information I need. For example: # install pacman package if you require it if (!require("pacman"))…
dano_
  • 303
  • 1
  • 8
0
votes
2 answers

Mining financial articles R

I'm working on mining some financial articles using tidytext. I downloaded the data from Reuters, but when I try to turn each corpus into a data frame I get some errors about the unnest command not taking functions as input... Do you have any…
lgds
  • 43
  • 1
  • 3
0
votes
1 answer

Getting error when trying to install R Package "Tidytext"

Error: package or namespace load failed for ‘tidytext’ in library.dynam(lib, package, package.lib): shared object ‘stringi.so’ not found 6. stop(msg, call. = FALSE, domain = NA) 5. value[3L] 4. tryCatchOne(expr,…
Brad P
  • 31
  • 1
  • 3
0
votes
1 answer

unnest_tokens() in R creates word column but can't select the word column in dplyr commands

When I use the unnest_tokens() command it creates a column called word. As you can see, I add colnames() after the pipe after unnesting and it returns word as a column. When I save it as a dataframe the column…
0
votes
0 answers

Text Mining : Pairwise Correlation between words by Group

A simple question. My data looks like this: division_name word Finance Good Commercial Awesome Finance Lovely Commercial Support I am finding pairwise_cor for all possibilities of words above but I get result…
Rana Usman
  • 1,031
  • 7
  • 21
0
votes
1 answer

Using function to calculate a score, then put into a dataframe or tibble with right variable

I am working on a function that will hopefully perform a sentiment analysis for each emotion in the NRC dictionary on a list (see: https://www.tidytextmining.com/sentiment.html#sentiment-analysis-with-inner-join), and then save the score itself as a…
0
votes
2 answers

read delimited .txt file with multiple, interspersed headers in R

I am trying to open and clean a massive oceanographic dataset in R, where station information is interspersed as headers in between the chunks of observations: $ 2008 1 774 8 17 5 11 2 78.4952 6.0375 30 7 1.2 -999.0 -9 -9 -9 -9…
Larusson
  • 267
  • 3
  • 21
0
votes
1 answer

Gather function in R dropping column

I'm comparing the language used by some authors with data downloaded from the Project Gutenberg site but I'm having some trouble with my tibble manipulation. My end goal is to make a plot comparing frequency of word usage by Herman Melville and…
carousallie
  • 776
  • 1
  • 7
  • 25
0
votes
1 answer

Converting Twitter data into a tidy format

I am trying to convert tweets into a tidy text format with the following format and code: ## Convert twitter into a tidy text format where the unit of analysis is the ##`tweet_id-handle-time_stamp-word` tidy_format = trump_clinton_tweets %>%…
maldini425
  • 307
  • 3
  • 14
0
votes
2 answers

better and easy way to find who spoke top 10 anger words from conversation text

I have a dataframe that contains the variables 'AgentID', 'Type', 'Date', and 'Text', and a subset is as follows: structure(list(AgentID = c("AA0101", "AA0101", "AA0101", "AA0101", "AA0101"), Type = c("PS", "PS", "PS", "PS",…
loveR
  • 489
  • 4
  • 12