Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem like dplyr can be used for effective data handling and analysis.
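
As a rough illustration of the conversion described above, here is a minimal sketch that turns raw text into a one-token-per-row tibble with `unnest_tokens()` and then counts words with dplyr. The two-line sample text and the names `text_df` and `tidy_words` are made up for the example; they are not part of the package.

```r
library(dplyr)
library(tidytext)

# Hypothetical sample data: one row per line of text
text_df <- tibble(
  line = 1:2,
  text = c("Because I could not stop for Death",
           "He kindly stopped for me")
)

# Convert to tidy format: one row per word (lowercased by default)
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Once the text is tidy, ordinary dplyr verbs apply
word_counts <- tidy_words %>%
  count(word, sort = TRUE)
```

Other tokenizations (for example bigrams or sentences) are selected through the `token` argument of `unnest_tokens()`.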

Repositories

Vignettes

Other resources

Text Mining with R: A Tidy Approach

Related tags

R's tm, quanteda, dplyr, tidyr, and broom packages

294 questions

votes

1 answer

tidytext unnest_tokens: default token argument is the only one that works

New to tidytext and running into an error. When I try to pass anything other than "words" into the token argument for the unnest_tokens function I get: Error in eval(substitute(expr), envir, enclos) : object 'txt' not found. Can't even run the…

r tidytext

asked May 08 '18 at 22:35

Thomas

votes

2 answers

Sentiment analysis (AFINN) in R

I am trying to analyze the sentiment of a dataset of Tweets using the AFINN dictionary (get_sentiments("afinn")). A sample of the dataset is provided below: A tibble: 10 x 2 Date TweetText …

r tidyverse sentiment-analysis tidytext lexicon

asked May 06 '18 at 14:28

Tim

votes

1 answer

R: Cast data frame of document term counts into a document term matrix (dtm)

I already have a data frame at the document term count level, noting that documents and terms are simply indexed by integers, and scores are weighted continuous numbers, if that is relevant, e.g.: doc term count 1 2 2 1 5 3.1 2 2 …

r matrix tm tidytext

asked Apr 25 '18 at 23:05

km5041

votes

0 answers

%in% is returning FALSE when I know it's TRUE

Relevant files: biggie positive I'm working on some natural language processing and am trying to check if a word in one list is in another using the %in% check. Problem is, it returns everything as FALSE when I know there should be at least a few…

r nlp tidytext

asked Apr 24 '18 at 15:24

lostpineapple45

votes

0 answers

Dealing with phrasal verbs in text mining

Phrasal verbs are really important in day-to-day English usage. Is there any library in R that allows us to deal with them? I have tried 2 ways, but neither seems able to handle them. For example: library(sentimentr) library(tidytext) library(tidyverse) x…

r text-mining tidytext

asked Apr 23 '18 at 02:29

ducvu169

votes

2 answers

Counting Number of Rows in R data.frame and Storing as Additional Variable

I have a data frame that returns two column variables - word1 and word2 like this: head(bigrams_filtered2, 20) # A tibble: 20 x 2 word1 word2 1 practice risk 2 risk management 3…

r dplyr text-mining tidytext

asked Apr 20 '18 at 02:22

Davide Lorino

votes

0 answers

Text mining frequency with ggplot

I am working with a dataset called HappyDB for a class presentation and analyzing demographic differences in word frequency. I'm using tidytext for most of the analyses, and using their online guide to create most of my visuals. However, I'm running…

r tidytext

asked Apr 16 '18 at 16:33

SRobProsc

votes

1 answer

Extracting Elements from text files in R

I am trying to get into text analysis in R. I have a text file with the following structure. HD A YEAR Oxxxx WC 244 words PD 28 February 2018 SN XYZ SC hydt LA English CY Copyright 2018 LP Rio de Janeiro, Feb 28 TD With recreational…

r tidyr tidyverse stringr tidytext

asked Apr 04 '18 at 10:23

Beginner

votes

2 answers

Wordcloud titles not showing/rendering in R

So I performed a sentiment analysis using tidy principles. I would like to plot the results in a comparison cloud (positive vs. negative sentiments). This is my code: library(reshape2) library(tidytext) dtm_tidy %>% filter() dtm_tidy…

r text-mining sentiment-analysis word-cloud tidytext

asked Mar 19 '18 at 11:29

Lucinho91

votes

1 answer

How to load texts for text mining with R Tidytext?

How do I load a folder of .txt files for text mining with tidytext? I came across Silge & Robinson, "Text Mining with R: A Tidy Approach" (https://www.tidytextmining.com/), and it seems very promising for my purposes. But I'm very new to R (trying to…

r loading text-mining tidytext

asked Mar 02 '18 at 19:37

Akfak

votes

0 answers

Tidytext: converting the frequency of words to the percentage

I'd like to convert the frequency of words to the percentage of words. This is my code: text %>% inner_join(get_sentiments("bing")) %>% group_by(index = file, file, sentiment) %>% summarize(n = n()) %>% ggplot(aes(x = index, y = n, fill = file)) +…

r ggplot2 sentiment-analysis tidytext

asked Mar 01 '18 at 08:46

Andreja

votes

1 answer

Opposite of unnest_tokens after creating dummy variable

library(NLP) library(tm) library(tidytext) library(tidyverse) library(topicmodels) library(dplyr) library(stringr) library(purrr) library(tidyr) #sample dataset tags <- c("product, productdesign, electronicdevice") web <- c("hardware, sunglasses,…

r tidytext

asked Feb 20 '18 at 18:27

Kreitz Gigs

votes

0 answers

R Regular expression to search citations of law using tidytext and tm

I use tidytext, tm and quanteda for text mining. I am trying to: (1) filter a tibble of plain, processed text according to the presence of a citation of law; (2) count the number of occurrences of the same citation per text document. Unfortunately, I am weak at using specific…

r regex tm quanteda tidytext

asked Jan 13 '18 at 20:10

captcoma


votes

2 answers

R unnest_tokens and calculate positions (start and end location) of each token

How to get the position of all the tokens after using unnest_tokens? Here is a simple example - df<-data.frame(id=1, doc=c("Patient: [** Name **], [** Name **] Acct.#: [** Medical_Record_Number **] MR #: [**…

r string nlp emr tidytext

asked Jan 05 '18 at 18:35

x1carbon

votes

1 answer

With text analysis inner_join removes more than a thousand words in R

I'm analysing a column of words in my most_used_words data frame, which contains 2180 words. most_used_words word times_used 1 people 70 2 news 69 3 fake 68 4 country 54 5 …

r tidyverse text-analysis tidytext lexicon

asked Dec 09 '17 at 15:06

Tdebeus

