Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem can be used for effective data handling and analysis.

294 questions
0
votes
1 answer

tidytext unnest_token default token argument the only one that works

New to tidytext and running into an error. When I try to pass anything other than "words" into the token argument for the unnest_tokens function I get: Error in eval(substitute(expr), envir, enclos) : object 'txt' not found. Can't even run the…
Thomas
  • 1
0
votes
2 answers

Sentiment analysis (AFINN) in R

I am trying to analyze the sentiment of a dataset of Tweets using the AFINN dictionary (get_sentiments("afinn")). A sample of the dataset is provided below: A tibble: 10 x 2 Date TweetText …
Tim
  • 1
  • 1
  • 3
0
votes
1 answer

R: Cast data frame of document term counts into a document term matrix (dtm)

I already have a data frame at the document term count level, noting that documents and terms are simply indexed by integers, and scores are weighted continuous numbers, if that is relevant, e.g.: doc term count 1 2 2 1 5 3.1 2 2 …
km5041
  • 351
  • 1
  • 4
  • 13
0
votes
0 answers

%in% is returning FALSE when I know it's TRUE

Relevant files: biggie positive I'm working on some natural language processing and am trying to check if a word in one list is in another using the %in% check. Problem is, it returns everything as FALSE when I know there should be at least a few…
0
votes
0 answers

Deal with phrasal verb in text mining

Phrasal verbs are really important in day-to-day English usage. Is there any library in R that allows us to deal with them? I have tried 2 ways but neither seems able to handle them. For example: library(sentimentr) library(tidytext) library(tidyverse) x…
ducvu169
  • 103
  • 1
  • 12
0
votes
2 answers

Counting Number of Rows in R data.frame and Storing as Additional Variable

I have a data frame that returns two column variables - word1 and word2 like this: head(bigrams_filtered2, 20) # A tibble: 20 x 2 word1 word2 1 practice risk 2 risk management 3…
Davide Lorino
  • 875
  • 1
  • 9
  • 27
0
votes
0 answers

Text mining frequency with ggplot

I am working with a dataset called HappyDB for a class presentation and analyzing demographic differences in word frequency. I'm using tidytext for most of the analyses, and using their online guide to create most of my visuals. However, I'm running…
SRobProsc
  • 23
  • 1
  • 6
0
votes
1 answer

Extracting Elements from text files in R

I am trying to get into text analysis in R. I have a text file with the following structure. HD A YEAR Oxxxx WC 244 words PD 28 February 2018 SN XYZ SC hydt LA English CY Copyright 2018 LP Rio de Janeiro, Feb 28 TD With recreational…
Beginner
  • 262
  • 1
  • 4
  • 12
0
votes
2 answers

Wordcloud titles not showing/rendering in R

So I performed a sentiment analysis using tidy principles. I would like to plot the results in a comparison cloud (positive VS negative sentiments). This is my code: library(reshape2) library(tidytext) dtm_tidy %>% filter() dtm_tidy…
Lucinho91
  • 175
  • 2
  • 4
  • 16
0
votes
1 answer

How to load texts for text mining with R Tidytext?

How do I load a folder of .txt files for text mining with tidytext? I came across Silge & Robinson "Text mining with R: A tidy approach" (https://www.tidytextmining.com/) and it seems very promising for my purposes. But I'm very new to R (trying to…
Akfak
  • 1
  • 2
0
votes
0 answers

Tidytext: converting the frequency of words to the percentage

I'd like to convert the frequency of words to the percentage of words. This is my code: text %>% inner_join(get_sentiments("bing")) %>% group_by(index = file, file, sentiment) %>% summarize(n = n()) %>% ggplot(aes(x = index, y = n, fill = file)) +…
Andreja
  • 1
  • 1
0
votes
1 answer

Opposite of unnest_tokens after creating dummy variable

library(NLP) library(tm) library(tidytext) library(tidyverse) library(topicmodels) library(dplyr) library(stringr) library(purrr) library(tidyr) #sample dataset tags <- c("product, productdesign, electronicdevice") web <- c("hardware, sunglasses,…
Kreitz Gigs
  • 369
  • 1
  • 9
0
votes
0 answers

R Regular expression to search citations of law using tidytext and tm

I use tidytext, tm and quanteda for text mining. I am trying to: filter a tibble with plain, processed text according to the presence of a citation of law, and count the number of occurrences of the same citation per text document. Unfortunately, I am weak at using specific…
captcoma
  • 1,768
  • 13
  • 29
0
votes
2 answers

R unnest_tokens and calculate positions (start and end location) of each token

How to get the position of all the tokens after using unnest_tokens? Here is a simple example - df<-data.frame(id=1, doc=c("Patient: [** Name **], [** Name **] Acct.#: [** Medical_Record_Number **] MR #: [**…
x1carbon
  • 287
  • 1
  • 15
0
votes
1 answer

With text analysis inner_join removes more than a thousand words in R

I'm analysing a column of words in my most_used_words dataframe, which has 2180 words. most_used_words word times_used 1 people 70 2 news 69 3 fake 68 4 country 54 5 …
Tdebeus
  • 1,519
  • 5
  • 21
  • 43