Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem can be used for effective data handling and analysis.
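As a rough illustration of the tidy format described above, here is a minimal sketch that assumes the dplyr and tidytext packages are installed. The `docs` data frame and its contents are made up for this example; `unnest_tokens()` and the bundled `stop_words` data set come from tidytext.

```r
library(dplyr)
library(tidytext)

# Hypothetical input: one row per document
docs <- tibble(
  id   = 1:2,
  text = c("Tidy text is easy to work with.",
           "Text mining with tidy tools.")
)

# Convert to one token per row (lowercased, punctuation stripped),
# then drop common stop words using the data set bundled with tidytext
tidy_words <- docs %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Once the text is tidy, ordinary dplyr verbs apply
word_counts <- tidy_words %>%
  count(word, sort = TRUE)
```

From here the result is an ordinary data frame, so it can be filtered, joined, or plotted like any other tidy data.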

294 questions
0
votes
1 answer

Delete rows with blank values after performing unnest_tokens and remove stopwords?

Here is my df: df <- structure(list(id = 1:50, strain_id = c(6L, 6L, 7L, 12L, 19L, 35L, 81L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 100L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 123L, 202L, 202L, 202L, 202L,…
SteveS
  • 3,789
  • 5
  • 30
  • 64
0
votes
1 answer

Remove rows with character(0) from a data.frame before proceeding to dtm

I'm analyzing a data frame of product reviews that contains some empty entries or text written in a foreign language. The data also contain some customer attributes which can be used as "features" in later analysis. To begin with, I will first convert…
Chris T.
  • 1,699
  • 7
  • 23
  • 45
0
votes
4 answers

sentiments dataset in R throwing error with AFINN lexicon

Trying to access the sentiments data set for the "AFINN" lexicon using the function get_sentiments("afinn") R code : library(textdata) get_sentiments("afinn") Throwing below error message Do you want to download: Name: AFINN-111 Error in…
sam
  • 85
  • 3
  • 10
0
votes
1 answer

How to fix "no package called textdata" error?

I am trying to run sentiment analysis in R. I have installed tidytext and it is in the correct library with all other packages. However, when I run get_sentiments("afinn") I get the following error: Error in loadNamespace(name) : there is no…
user10643490
0
votes
1 answer

Manually inserting topic-specific stopwords

I'm using tidytext's built-in anti_join(get_stopwords()) command to clean documents from a data set of customer reviews of tech products, but I found that the output corpus consists primarily of tech specifications (e.g., Windows 10, 720p Camera, 380.6 x…
Chris T.
  • 1,699
  • 7
  • 23
  • 45
0
votes
1 answer

Error when importing csv data into R for text mining

I keep getting this error when trying to import a csv document into R and trying to develop a corpus for topic modeling. I have used this approach successfully on 4 other projects but cannot get past this error. My data source has a doc_id column…
0
votes
1 answer

Combine tidy text with synonyms to create dataframe

I have a sample data frame as below: quoteiD <- c("q1","q2","q3","q4", "q5") quote <- c("Unthinking respect for authority is the greatest enemy of truth.", "In the middle of difficulty lies opportunity.", "Intelligence is the ability to…
R noob
  • 495
  • 3
  • 20
0
votes
1 answer

Reading text files into tidytext and adding metadata

I have several thousand .txt files in a directory and would like to read them all into tidytext where I would then add columns of metadata. The filenames themselves contain all of the metadata and I have been successful in using substr to parse the…
AlanS
  • 13
  • 3
0
votes
0 answers

How to encode text correctly when importing word documents into R?

I am trying to import the content of multiple Word documents into the same object in R. I am following Julia Silge and David Robinson's guide (see here: https://www.tidytextmining.com/usenet.html). I am unable to figure out how to encode the "text" column…
Anavir
  • 33
  • 7
0
votes
1 answer

determine the temporality of a sentence with POS tagging

I want to find out whether an action has been carried out or will be carried out from a series of sentences. For example: "I will prescribe this medication" versus "I prescribed this medication" or "He had already taken the stuff" versus "he may…
Sebastian Zeki
  • 6,690
  • 11
  • 60
  • 125
0
votes
0 answers

R Widyr Package (Correlation values NaN)

I am working on analyzing the pairwise correlation of words appearing in user reviews and plotting them as a correlation network graph. My sample data is as follows: review_corwords Label Rating word 1 …
IronMaiden
  • 552
  • 4
  • 20
0
votes
1 answer

Extract text based on character position returned from gregexpr

I'm working in R, trying to prepare text documents for analysis. Each document is stored in a column (aptly named "document") of a dataframe called "metaDataFrame." The documents are strings containing articles and their BibTex citation info. Data…
0
votes
3 answers

Apply Math calculation to all rows of DF by Column Values

I want to apply a math calculation, namely (Occ_1+1)/(Totl_1+Unique_words), (Occ_2+1)/(Totl_2+Unique_words) and (Occ_3+1)/(Totl_3+Unique_words), and create new columns Probability_1, Probability_2, Probability_3. Right now I am doing every…
james joyce
  • 483
  • 7
  • 24
0
votes
2 answers

Count the Occurrence of word, Total words and total Unique words in R

I have a huge df which has a doc_id and word, and every word can belong to multiple classes (Class_1, Class_2, Class_3), so if a word is in a class I put 1 there, otherwise 0. SAMPLE DF doc_id word Class_1 Class_2 Class_3 104 saturn…
james joyce
  • 483
  • 7
  • 24
0
votes
2 answers

Error: No tidy method for objects of class LDA_VEM

I am literally following the steps as presented in chapter 6 of the "Text Mining in R: a Tidy Approach" book. See: https://www.tidytextmining.com/topicmodeling.html #import libraries library(topicmodels) library(tidytext) #access…
Vasino
  • 3
  • 1
  • 2