Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem like dplyr can be used for effective data handling and analysis.
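
As a minimal sketch of the conversion described above, `unnest_tokens()` turns a data frame of raw text into a one-token-per-row table that dplyr verbs can then operate on. The sample sentences and the names `text_df`, `tidy_words`, and `word_counts` are made up for illustration; they do not come from the package documentation.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line corpus, invented for this example
text_df <- tibble(
  line = 1:2,
  text = c("The quick brown fox", "jumps over the lazy dog")
)

# One row per word; unnest_tokens() lowercases tokens by default
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Drop common stop words using the stop_words data set shipped with tidytext,
# then count what is left with ordinary dplyr verbs
word_counts <- tidy_words %>%
  anti_join(stop_words, by = "word") %>%
  count(word, sort = TRUE)
```

Going the other way, functions such as `cast_dtm()` convert a tidy table of counts into a document-term matrix for packages like tm or topicmodels.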

Other resources

Text Mining with R: A Tidy Approach

Related tags

R's tm, quanteda, dplyr, tidyr, and broom packages

294 questions

2 answers

Graph with ordered bars and using facets

I am trying to make a graph with bars ordered according to frequency, and also using a variable to separate two variables using facets. Words have to be ordered by the value given in the 'n' variable. So, my graph should look like this one which appears in…

r ggplot2 tidytext

asked May 16 '18 at 16:51

Tito Sanz

1 answer

Passing `top_n` and `arrange` to ggplot (dplyr)

There is a lovely chunk of code in TidyText Mining Section 3.3 that I am trying to replicate in my own dataset. However, in my data, I cannot get ggplot to 'remember' that I want the data in descending order, and that I want a certain top_n. I can…

r ggplot2 tidytext

asked May 16 '18 at 16:25

JMacKay

4 answers

Using tidytext and broom but not finding tidier for LDA_VEM

The tidytext book has examples with a tidier for topicmodels: library(tidyverse); library(tidytext); library(topicmodels); library(broom); year_word_counts <- tibble(year = c("2007", "2008", "2009"), word = c("dog", "cat",…

r broom tidytext

asked Feb 13 '18 at 11:37

Isaiah

1 answer

unnest_tokens fails to handle vectors in R with tidytext package

I want to use the tidytext package to create a column with 'ngrams', using the following code: library(tidytext) unnest_tokens(tbl = president_tweets, output = bigrams, input = text, token = "ngrams", …

r text-analysis tidytext

asked Dec 20 '17 at 16:14

Tdebeus

1 answer

Using unnest_tokens() to split a column by a specific character?

I'm working with a column of vectors of urls formatted as a string with each url separated by a comma: column_with_urls ["url.a, url.b, url.c"] ["url.d, url.e, url.f"] I would like to use the tidytext::unnest_tokens() R function to separate these…

r tidytext

asked Dec 05 '17 at 18:22

Josh

1 answer

Remove stop words from data frame

My data is already in a data frame, with one token per line. I'd like to filter out the rows that contain stop words. The dataframe looks like: docID <- c(1,2,2) token <- c('the', 'cat', 'sat') count <- c(10,20,30) df <- data.frame(docID, token,…

r tidyr tidyverse tidytext

asked Nov 16 '17 at 17:54

Adam_G

1 answer

Web scraping pdf files from HTML

How can I scrape the PDF documents from HTML? I am using R and can only extract the text from HTML. An example of the website that I am going to scrape is as…

r text web-scraping tidytext

asked Oct 02 '17 at 10:40

SChatcha

1 answer

Adding word count size as a layer to the node size on a cooccurrence network chart using tidytext

I'm interested in using a co-occurrence network chart similar to the one shown in section 8.2.2 of David Robinson and Julia Silge's Tidy Text Mining book, such as this chart, except that I would like to have the sizes of the nodes change depending on…

r tidytext ggraph

asked Sep 20 '17 at 22:36

Phil

1 answer

tf-idf document term matrix and LDA: Error messages in R

Can we input a tf-idf document-term matrix into Latent Dirichlet Allocation (LDA)? If yes, how? It does not work in my case, and the LDA function requires the 'term-frequency' document-term matrix. Thank you (I have made the question as concise as possible.…

r matrix text-mining lda tidytext

asked Aug 08 '17 at 09:55

Schatchawan

2 answers

Topic Modelling: LDA, word frequency in each topic and Wordcloud

Question: How can I compute and code the frequency of words in each topic? My goal is to create a word cloud from each topic. P.S. I have no problem with wordcloud. From the code: burnin <- 4000 #We do not collect this. iter <- 4000 thin…

r text latent-semantic-indexing tidytext latent-semantic-analysis

asked Aug 08 '17 at 08:25

SChatcha

2 answers

How can I unnest phrases between brackets

I have text that I am trying to organize for some text mining and am using the TidyText library. I have tried setting the token to a regex and setting a custom pattern, but it ends up returning just the bracket (or nothing) and not the content of…

r regex unnest tidytext

asked Mar 24 '23 at 18:09

maijuli

1 answer

Is there a convenient way to deal with "stop phrases" when text mining in R?

I am currently working on a large number of judicial documents. They contain a number of fixed phrases (e.g. Council directive) which due to their frequent occurrence have no meaning for my analysis. Therefore, I would like to remove them. Using a…

r text-mining stop-words tidytext

asked Mar 13 '23 at 18:02

banannanas

1 answer

Wordcloud2 - separate words for counting

I am trying to extract the words so that I can create a wordcloud but am having some difficulties. This is the code: library(readxl) data <- read_excel("C:\\Users\\me\\OneDrive\\Desktop\\ToPandas.xlsx") data2…

r tidyverse word-cloud unnest tidytext

asked Sep 24 '22 at 16:41

crl6904

1 answer

reorder_within reordering facets in nestedfacet ggplot

Help with reordering facets. I am using reorder_within and scale_x_reordered from Julia Silge's blog (https://juliasilge.com/blog/reorder-within/). I am using nested facets here and reordering facets within a parent facet. In this use case the…

r dplyr label facet tidytext

asked Jun 18 '22 at 04:33

Keelin

1 answer

scale_x_reordered does not work in facet_grid

I am a newbie in R and would like to seek your advice regarding visualization using reorder_within, and scale_x_reordered (library: tidytext). I want to show the data (ordered by max to min) by states for each year. This is sample data for…

r facet-wrap facet-grid tidytext

asked Mar 07 '22 at 00:49

Kob
