Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets to allow conversion of text to and from tidy formats, and to switch seamlessly between tidy tools and existing text mining packages. When text is in a tidy data structure, tools from the R tidyverse ecosystem like dplyr can be used for effective data handling and analysis.
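
As a minimal sketch of the tidy workflow described above (the sample sentences and the names text_df, tidy_words, and word_counts are made up for illustration, and the tidytext and dplyr packages are assumed to be installed), text can be split into one token per row and then summarized with ordinary dplyr verbs:

```r
library(dplyr)
library(tidytext)

# Hypothetical input: one row per line of text
text_df <- tibble(
  line = 1:2,
  text = c("Tidy data has one observation per row.",
           "Text can be tidy too.")
)

# One word per row; unnest_tokens() lowercases and strips punctuation by default
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Standard dplyr verbs now work on the tokens
word_counts <- tidy_words %>%
  count(word, sort = TRUE)

print(word_counts)
```

From here the same data frame can be filtered, joined (for example against a stop word list), or plotted like any other tidy table.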

Other resources

Text Mining with R: A Tidy Approach

Related tags

R's tm, quanteda, dplyr, tidyr, and broom packages

294 questions

1 answer

Sort an element in a document in tidytext

As you can see the legend on the right hand side, I need to reorder it as 1,2,3,...64, not 1,10,11...,8. My term-document matrix is as follows. Please give me some ideas how to rearrange the code. A tibble: 4,530 x 5 document term…

r ggplot2 rstudio tidytext

asked Aug 04 '17 at 11:12

SChatcha

1 answer

Tidy data frame: German characters being removed

I am using the following code to convert a data frame to a tidy data frame: replace_reg <- "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&|<|>|RT|https" unnest_reg <- "([^A-Za-z_\\d#@']|'(?![A-Za-z_\\d#@]))" tidy_tweets <- tweets %>%…

r regex tidyverse tidytext

asked Jul 25 '17 at 13:24

mundos

1 answer

unnest_tokens and its error("")

I am working with tidytext. When I call unnest_tokens, R returns the error "Please supply column name". How can I solve this…

r rstudio unnest tidytext

asked Jul 20 '17 at 16:37

SChatcha

0 answers

Can you install packages in R without imports or dependencies?

I work on a computer that doesn't have internet access. I download all of my R packages and install them from .zip files. One issue, however, is that when I install a package, it will require other packages because I load them into the library. …

r qdap tidytext

asked Jun 14 '17 at 21:05

Alex

2 answers

Finding Abbreviations in Data with R

My data is text that contains abbreviations. Are there any functions or code that search for abbreviations in text? For example, detecting capital-letter abbreviations of 3 to 5 letters and counting how often they occur. Much appreciated!

r regex tidyr stringr tidytext

asked Jun 13 '17 at 18:20

Alex

1 answer

How to extract month from column

I'd like to recreate a plot from the Text Mining with R web textbook, but with my own data. It essentially searches for the top terms per year and graphs them (Figure 5.4: http://tidytextmining.com/dtm.html). My data is a bit cleaner than the one they…

r ggplot2 tidyr tidytext

asked Jun 13 '17 at 15:50

Alex

3 answers

Error in installing package tidytext - R

I tried to install package tidytext but got the following error: install.packages("tidytext") Installing package into ‘\\dcn4pfsh404/home_8/TUT/Documents/R/win-library/3.3’ (as ‘lib’ is unspecified) trying URL…

r tidytext

asked Jun 02 '17 at 14:31

baver

2 answers

Reading documents with r-tm to use with r-mallet

I have this code to fit a topic model with the R wrapper for MALLET: docs <- mallet.import(DF$document, DF$text, stop_words) mallet_model <- MalletLDA(num.topics = 4) mallet_model$loadDocuments(docs) mallet_model$train(100) I have used the tm…

tm mallet tidytext

asked Apr 22 '17 at 20:33

Simon Lindgren

2 answers

Simple section labeling with tidytext for plain text input

I'm using tidytext (and the tidyverse) to analyze some text data (as in Tidy Text Mining with R). My input text file, myfile.txt, looks like this: # Section 1 Name Lorem ipsum dolor sit amet ... (et cetera) # Section 2 Name

r tidyverse tidytext

asked Feb 23 '17 at 21:11

weinerjm

1 answer

Using the Nested List Column Approach and purrr Together with tidytext::unnest_tokens

I have a data frame that contains survey responses, with each row representing a different person. One column - "Text" - is an open-ended text question. I would like to use tidytext::unnest_tokens so that I can do text analysis by row, including…

r dplyr tidyr purrr tidytext

asked Feb 13 '17 at 04:17

Mike

0 answers

Text mining .docx interview transcriptions in R

I have a number of interview transcriptions that I am hoping to run text mining analyses on. Basically trying to automate the qualitative coding procedure. I've been reading up on tidytext text mining, but it only seems to use already imported…

r text-mining .doc transcription tidytext

asked Feb 07 '17 at 09:41

Gerard

2 answers

tidytext example filter error with pipes

When trying to reproduce the example found in http://tidytextmining.com/twitter.html there's a problem. Basically I want to adapt this part of the code library(tidytext) library(stringr) reg <-…

r dplyr stringr tidytext

asked Nov 16 '16 at 15:08

Oki

4 answers

How to remove a word from a dataset in R? NLP

I'm very new to programming. I am analyzing a text in R and using this to remove stop words: kant_palavras <- kant_palavras %>% anti_join(get_stopwords(language = 'pt')) But afterwards, when counting words, the…

r nlp tidytext anti-join

asked Aug 31 '23 at 21:58

philosophy

1 answer

Text analysis in R with multi-word and TF-IDF

I am quite new to R and I am trying to run a text analysis with TF-IDF on a set of reports, considering a specific set of words in a dictionary I built. The code below produces results for that; however, it has failed to consider…

r nlp tf-idf tidytext

asked Aug 14 '23 at 16:07

Pablo

1 answer

Error within tidy text. unnest_tokens was deprecated

unnest_tokens() was deprecated in tidytext 0.4.0 and is now defunct Attempting to create a 2-node graph through twitter data within R and am receiving the following error message. twomode_network <- twitter_data %>% Create("twomode",…

r tidytext

asked Jun 01 '23 at 08:01

TheDud
