Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem like dplyr can be used for effective data handling and analysis.
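
As a minimal sketch of the tidy format described above, the example below converts a small tibble of text into one token per row with `unnest_tokens()` and then counts words with dplyr. The `docs` tibble and its `id` and `text` columns are made up for illustration; the other columns of the input (here `id`) are carried along with each token.

```r
library(dplyr)
library(tidytext)

# Hypothetical toy data: two short documents with an identifier column.
docs <- tibble(
  id   = 1:2,
  text = c("Tidy text is tidy data.", "One token per row.")
)

# Split the text column into one lowercase word per row.
# The input column is dropped by default; `id` is kept for every token.
tidy_docs <- docs %>%
  unnest_tokens(word, text)

# Ordinary dplyr verbs now work on the tokens.
word_counts <- tidy_docs %>%
  count(word, sort = TRUE)
```

From here the tokens can be joined against a lexicon or passed to `bind_tf_idf()`, depending on the analysis.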

Other resources

Text Mining with R: A Tidy Approach

Related tags

R's tm, quanteda, dplyr, tidyr, and broom packages

294 questions

2 answers

Find characters before and after dollar amount in vector of text data in R

I have a vector of text data (news data). I am trying to scan the text for any money amount and the text surrounding this amount. I managed this with the first element of my vector but struggle with using a loop and list to repeat the process for…

r regex string tidytext

asked Jan 21 '22 at 13:39

Marco

1 answer

unnest_tokens and keep original columns (tidytext)

The unnest_tokens function of the package tidytext is supposed to keep the other columns of the dataframe (tibble) you pass to it. In the example provided by the authors of the package ("tidy_books" on Austen's data) it works fine, but I get some…

r tidytext

asked Nov 22 '21 at 11:52

Dario Lacan

1 answer

Error in R term frequency analysis (TF-IDF)

I tried to run the following code with the following data: library(dplyr) library(janeaustenr) library(tidytext) book_words <- austen_books() %>% unnest_tokens(word, text) %>% count(book, word, sort = TRUE) For this, I get this error…

r text tf-idf tidytext

asked Nov 14 '21 at 22:27

Renée

1 answer

Correlation and graph layout in widyr and ggraph when tidy text mining

I'm using a tutorial (https://www.tidytextmining.com/nasa.html?q=correlation%20ne#networks-of-keywords) to learn about tidy text mining. I am hoping someone might be able to help with two questions: in this tutorial, the correlation used to make…

nlp tidytext ggraph

asked Sep 28 '21 at 13:47

Gabriella

2 answers

How to extract key phrases following specific characters using regex in R?

I have a dataframe that looks like so: ID | Tweet_ID | Tweet 1 12345 @sprintcare I did. 2 SPRINT @12345 Please send us a Private Message. 3 45678 @apple My information is incorrect. 4 APPLE @45678 What information is…

r regex dplyr tidytext

asked Sep 15 '21 at 03:04

Dinho

3 answers

R: Text Mining, create list of words per document

I am reading in the text from a number of PDFs in a directory. Then, I split these texts into single words (tokens) using the tidytext::unnest_tokens()-function. Can someone please tell me, how I can add an additional column to the test-tibble with…

r tidyverse text-mining tidytext

asked Aug 05 '21 at 23:37

D. Studer

1 answer

bind_tf_idf() error: in tapply(n, documents, sum) : arguments must have same length

I am trying to do bind_tf_idf() for the following df. My df has two documents/classes: Y or N. > test_2 # A tibble: 3,295 x 2 Class word 1 Y nature 2 Y great 3 Y are 4 Y present 5 N in 6…

r tf-idf tapply tidytext

asked Jul 17 '21 at 05:40

aurelius_37809

1 answer

R tidytext sentiment analysis- how to use the drop parameter

I recently asked a question about entries that are omitted after a sentiment analysis. The tweets that I analyse don't always contain words that are in the lexicon. I would like to know which ones can't be translated. So I would like to keep these…

r sentiment-analysis tidytext

asked Jul 11 '21 at 17:07

Iarwain

1 answer

How to efficiently handle big data in R for text mining

With the help of the tidytext package, I'm trying to count all bigrams and trigrams for a personal example. However, this personal dataset has +1 million lines (paragraphs really) and lots of words in each one. This is a memory-intensive process…

r parallel-processing bigdata text-mining tidytext

asked Apr 05 '21 at 01:41

caproki

1 answer

Add detected topics to input data

library(dplyr) library(ggplot2) library(stm) library(janeaustenr) library(tidytext) library(quanteda) testDfm <- gadarian$open.ended.response %>% tokens(remove_punct = TRUE, remove_numbers = TRUE, remove_symbols = TRUE) %>% dfm() out…

r quanteda tidytext

asked Dec 01 '20 at 15:51

rek

1 answer

usage of bind tf_df in R

library(janeaustenr) library(tidytext) library(tidyverse) library(tm) library(corpus) text <- removeNumbers(sensesensibility) text <- data.frame(text) tidy_text <- text %>%…

r tidytext

asked Jun 24 '20 at 22:00

Vikram

2 answers

Mapping the topic of the review in R

I have two data sets, Review Data & Topic Data Dput code of my Review Data structure(list(Review = structure(2:1, .Label = c("Canteen Food could be improved", "Sports and physical exercise need to be given importance"), class = "factor")), class =…

r dplyr text-mining tm tidytext

asked Jun 22 '20 at 14:50

Suhas U

0 answers

How to reorder facet_grid() columns based in R using ggplot2?

Don't think this is a duplicate of others, but happy to delete if it is. Dataset contains 3 columns: 'Recipient' (x-axis), 'Amount' (y-axis), and 'Department'(grid-column/fill). How can I re-order facet grids more intuitively in descending order by…

r ggplot2 data-visualization tidytext

asked Jun 19 '20 at 03:30

owlstone

1 answer

Tokenizing word using tidytext - preserving punctuation

I've been trying to preserve punctuation like "-" "(" "/" "'" when tokenizing words. data = tibble(title = "Computer-aided detection (1 / 2)") data %>% unnest_tokens(input = title, output = słowo, token =…

r tidytext unnest

asked Apr 17 '20 at 09:15

Pawliczek

1 answer

R unnest_tokens elements from list

I have this: library(tidytext) list_chars <- list("you and I", "he or she", "we and they") list_chars_as_tibble <- lapply(list_chars, tibble) list_chars_by_word <- lapply(list_chars_as_tibble, unnest_tokens) got this: Error in check_input(x) : …

r token tidytext unnest

asked Apr 12 '20 at 11:50

nasifffors
