Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem like dplyr can be used for effective data handling and analysis.
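As a rough illustration of the tidy workflow described above, here is a minimal sketch that tokenizes text into one word per row, drops stop words, and counts what remains with dplyr. The input data and the column names `doc` and `text` are made up for this example; `unnest_tokens()` and the `stop_words` data set come from tidytext.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-document input; `doc` and `text` are assumed column names.
docs <- tibble(
  doc  = c(1, 2),
  text = c("The quick brown fox jumps over the lazy dog.",
           "The dog barks at the quick fox.")
)

word_counts <- docs %>%
  unnest_tokens(word, text) %>%            # one lowercased word per row
  anti_join(stop_words, by = "word") %>%   # drop common English stop words
  count(word, sort = TRUE)                 # word frequencies across all documents

word_counts
```

Because the result is an ordinary tibble, it can be passed straight on to other tidyverse tools such as ggplot2 or tidyr.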

Other resources

Text Mining with R: A Tidy Approach

Related tags

R's tm, quanteda, dplyr, tidyr, and broom packages

294 questions

1 answer

How to correctly remove stop words using tidytext package in R?

I am using the stop_words dataset in the tidytext package in R to remove stop words. I am using the following code: library(tidyverse) library(tidytext) library(dplyr) data(stop_words) example_words <- c("the", "quick", "brown", "fox", "jumps", "over", "the",…

r nlp stop-words tidytext

asked Apr 06 '23 at 22:20

student_R123

1 answer

Extract different hashtags "#" from a text stored in a Dataframe with the R language

I have a data frame with some tweets and I want to extract the hashtags from the tweets using the unnest_tokens() function of the tidytext package, creating a tokenized data frame with one row per hashtag. My data has only 3 columns: Fecha: that is a…

r twitter tweets tidytext

asked Mar 31 '23 at 01:11

Juan Jose Echeverry De Mendoza

0 answers

I'm trying to count the number of words in a text but the count function is throwing an error message. I will be grateful for any help. Thanks

library(tidytext) library(dplyr) anfarm %>% unnest_tokens(output = "word", input = "text_column", token = "words") %>% count(word, sort = TRUE) #> Error in UseMethod("count") : #> no applicable method…

r counting tidytext

asked Feb 01 '23 at 10:31

Nana

3 answers

Removing specific text R

I have a character vector in a data frame in R which contains inbound email text. Most of the rows contain 'Dear x,' where x is any intended recipient and x can vary. There could also be typos such as the incorrect use of lowercase. Either way, the…

r stringr tidytext

asked Dec 26 '22 at 13:05

I_like_insights

1 answer

ggplot sort descending points within group

I want to arrange the plot below so that 'group' is arranged in descending order by 'Distance' within Community (Out, In). I've tried using dplyr::arrange and tidytext::reorder_within(group, -value, MPA_type), but neither of these works - ggplot…

r ggplot2 facet tidytext

asked Nov 30 '22 at 18:11

Joshua Smith

2 answers

Passing a vector of characters into another string in R

I would like to know how to pass a vector of text into a string within R. I have a list of emails stored as a character vector: all.emails…

r string tidyverse character tidytext

asked Oct 20 '22 at 17:11

I_like_insights

0 answers

RStudio: tokenizing multiple documents is messy

I am trying to tokenize different documents in RStudio, but because the documents are really big it gets messy when tokenizing them with one word per row. Is there a solution to keep the tokenized words in one row? I first made a corpus and then…

r text-mining corpus tidytext

asked Oct 05 '22 at 10:30

Babette Besselink

2 answers

Is there a way in R to find a combination of words (or sentences) within a certain range in a string

I'm trying to find all strings with a combination of words/sentences with other words separating them but with a fixed limit. Example: I want the combination of "bought" and "watch" but with, at most, 2 words separating them. I bought a…

r text-mining stringr tidytext

asked Jun 01 '22 at 14:38

Ugo Labbé

2 answers

Extract a 100-Character Window around Keywords in Text Data with R (Quanteda or Tidytext Packages)

This is my first time asking a question on here so I hope I don't miss any crucial parts. I want to perform sentiment analysis on windows of speeches around certain keywords. My dataset is a large csv file containing a number of speeches, but I'm…

r nlp quanteda tidytext

asked Apr 27 '22 at 19:58

kornpat

1 answer

How do I load large (25k+ words) .txt documents and then structure them as one token per row?

How could I load a big folder of files (more than 100 .txt files) for text mining (analysing the most frequent words, their evolution, word clustering and topics, POS, and so on) with the tidytext package? I am currently using Silge & Robinson's "text…

r text-mining tidytext

asked Apr 05 '22 at 07:29

IvanLdF

1 answer

how to unlist a `tknlist`?

step_tokenize returns a vector of type tknlist. How can I get a rectangular form of it? I mean something like unnesting the tokens and adding them as columns of the tibble. library(textrecipes) library(modeldata) data(tate_text) tate_rec <- recipe(~., data…

tidyverse tidytext r-recipes

asked Mar 30 '22 at 01:37

Nip

0 answers

Get zero tf_idf from dfm with quanteda r

I want to create a document-feature matrix with tf_idf as weights. If I calculate the tf_idf as in https://quanteda.io/reference/dfm_tfidf.html, I get only zeros. The same happens if I try to get tf_idf with tidytext from the same token dataset. Looks to…

r nlp tf-idf quanteda tidytext

asked Mar 29 '22 at 15:04

padul

0 answers

Restore original data from document term matrix in R

I want to know if there is a way to get back to my original data frame (df) after I have converted it to a document-term matrix. Here is an example of what I want to do. df <- data.frame(group=c("A","A","B","B","C"), comment = c("hello…

r matrix tm tidytext

asked Mar 01 '22 at 20:09

Sergio Parra

1 answer

Errors in counting + combining bing sentiment score variables in Tidytext?

I'm doing sentiment analysis on a large corpus of text. I'm using the bing lexicon in tidytext to get simple binary pos/neg classifications, but want to calculate the ratios of positive to total (positive & negative) words within a document. I'm…

r dplyr sentiment-analysis tidytext

asked Feb 01 '22 at 23:42

PoliSci_Fiend

2 answers

Tidytext R - find and replace

I have the results from a survey, in which a bunch of answers have errors, such as misspellings, UppercAseS/lower cases, ... Therefore, I need something like a find and replace kind of solution (I've found some possible functions but none of them…

r tidytext

asked Dec 17 '21 at 17:52

Tiago
