Questions tagged [tidytext]

The tidytext package provides tools for text mining using tidy data principles in R.

The R tidytext package, developed by Julia Silge and David Robinson, provides functions and supporting data sets for converting text to and from tidy formats and for switching seamlessly between tidy tools and existing text mining packages. Once text is in a tidy data structure, tools from the R tidyverse ecosystem can be used for effective data handling and analysis.
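A minimal sketch of the round trip described above, assuming the dplyr, tidytext and Matrix packages are installed; the two sample lines are illustrative only. unnest_tokens() turns a text column into one token per row, and cast_sparse() turns the counted tokens back into a sparse document-term matrix that non-tidy tools can consume.

```r
library(dplyr)
library(tidytext)

# Illustrative input: one row per line of text
text_df <- tibble(line = 1:2,
                  text = c("Because I could not stop for Death",
                           "He kindly stopped for me"))

# Tidy format: one lowercased word per row, keeping the line id
tidy_words <- text_df %>%
  unnest_tokens(word, text)

# Count words per line ...
word_counts <- tidy_words %>%
  count(line, word)

# ... and cast back to a sparse document-term matrix (rows = lines, columns = words)
dtm <- word_counts %>%
  cast_sparse(line, word, n)
```

Because the intermediate objects are ordinary data frames, any dplyr or ggplot2 step can be slotted in between tokenizing and casting.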

294 questions
1
vote
1 answer

Sort an element in a document in tidytext

As you can see in the legend on the right-hand side, I need to reorder it as 1, 2, 3, ..., 64, not 1, 10, 11, ..., 8. My term-document matrix is as follows. Please give me some ideas on how to rearrange the code. A tibble: 4,530 x 5 document term…
SChatcha
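A hedged sketch for the legend-ordering question, assuming the document column is stored as character (as it typically is after tidying a tm DocumentTermMatrix): character values sort alphabetically, so "10" comes before "2". Turning the column into a factor whose levels follow numeric order makes ggplot2 use that order in the legend. The document ids below are hypothetical.

```r
library(dplyr)

# Hypothetical document ids stored as text
docs <- tibble(document = c("1", "10", "2", "8", "64"))

# Alphabetical (character) sorting puts "10" before "2"
sort(docs$document)

# Re-level the factor so the levels follow numeric order
docs <- docs %>%
  mutate(document = factor(document,
                           levels = as.character(sort(unique(as.integer(document))))))

levels(docs$document)
```

Keeping the column as a factor (rather than converting it to integer) preserves a discrete legend while fixing the order.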
1
vote
1 answer

Tidy data frame: German characters being removed

I am using the following code to convert a data frame to a tidy data frame: replace_reg <- "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&|<|>|RT|https" unnest_reg <- "([^A-Za-z_\\d#@']|'(?![A-Za-z_\\d#@]))" tidy_tweets <- tweets %>%…
mundos
1
vote
1 answer

unnest_tokens and its error("")

I am working with tidytext. When I run unnest_tokens, R returns the error "Please supply column name". How can I solve this…
SChatcha
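The excerpt does not show the failing call, so the actual cause is unknown. As a hedged sketch, here is a call in which both the output column name and the input text column are supplied explicitly, and the input is a column of a data frame rather than a bare character vector. The sample text is illustrative.

```r
library(dplyr)
library(tidytext)

lines <- c("The first line of text", "and the second")

# unnest_tokens() operates on a data frame, so wrap the vector first
text_df <- tibble(line = seq_along(lines), text = lines)

# output = name of the new token column, input = existing text column
words <- text_df %>%
  unnest_tokens(output = word, input = text)
```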
1
vote
0 answers

Can you install packages in R without imports or dependencies?

I work on a computer that doesn't have internet access. I download all of my R packages and install them from .zip files. One issue, however, is that when I install a package, it will require other packages because I load them into the library. …
Alex
1
vote
2 answers

Finding Abbreviations in Data with R

In my data (which is text), there are abbreviations. Are there any functions or code that search for abbreviations in text? For example, detecting abbreviations of 3 to 5 capital letters and letting me count how often they occur. Much appreciated!
Alex
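A simple regex-based sketch, assuming an abbreviation is any standalone run of 3 to 5 capital letters; the sample sentences are illustrative. Note that this heuristic will also match ordinary words written in all caps.

```r
library(dplyr)
library(stringr)

# Illustrative text
texts <- c("The WHO and NASA met the FBI.", "NASA launched again.")

# Extract runs of 3 to 5 capitals bounded by word boundaries, then count them
abbrev_counts <- tibble(abbrev = unlist(str_extract_all(texts, "\\b[A-Z]{3,5}\\b"))) %>%
  count(abbrev, sort = TRUE)
```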
1
vote
1 answer

How to extract month from column

I'd like to create a plot from the Text Mining with R web textbook, but with my data. It essentially searches for the top terms per year and graphs them (Figure 5.4: http://tidytextmining.com/dtm.html). My data is a bit cleaner than the one they…
Alex
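A small sketch, assuming the data already has a Date column (the column name date and the sample values are hypothetical): base R format() extracts the year and month, which can then be used for grouping in the same way as the per-year counts in the book's figure.

```r
library(dplyr)

posts <- tibble(date = as.Date(c("2017-01-15", "2017-03-02", "2018-03-20")))

# "%Y" gives the four-digit year, "%m" the two-digit month number
posts <- posts %>%
  mutate(year  = as.integer(format(date, "%Y")),
         month = as.integer(format(date, "%m")))
```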
1
vote
3 answers

Error in installing packages tidytext - R

I tried to install package tidytext but got the following error: install.packages("tidytext") Installing package into ‘\\dcn4pfsh404/home_8/TUT/Documents/R/win-library/3.3’ (as ‘lib’ is unspecified) trying URL…
baver
1
vote
2 answers

Reading documents with r-tm to use with r-mallet

I have this code to fit a topic model with the R wrapper for MALLET: docs <- mallet.import(DF$document, DF$text, stop_words) mallet_model <- MalletLDA(num.topics = 4) mallet_model$loadDocuments(docs) mallet_model$train(100) I have used the tm…
Simon Lindgren
1
vote
2 answers

Simple section labeling with tidytext for plain text input

I'm using tidytext (and the tidyverse) to analyze some text data (as in Tidy Text Mining with R). My input text file, myfile.txt, looks like this:

# Section 1 Name
Lorem ipsum dolor sit amet
... (et cetera)
# Section 2 Name
weinerjm
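A hedged sketch of one way to label sections, similar in spirit to the cumsum()/str_detect() chapter-numbering pattern used in Text Mining with R; it assumes every section starts with a line beginning "# Section". The inline lines stand in for the contents of myfile.txt.

```r
library(dplyr)
library(stringr)

# In practice something like: tibble(text = readLines("myfile.txt"))
raw <- tibble(text = c("# Section 1 Name",
                       "Lorem ipsum dolor sit amet",
                       "# Section 2 Name",
                       "Sed ut perspiciatis"))

labeled <- raw %>%
  # each header line increments the running count, labelling the lines below it
  mutate(section = cumsum(str_detect(text, "^# Section"))) %>%
  # drop the header lines themselves
  filter(!str_detect(text, "^# Section"))
```

The labeled lines can then go straight into unnest_tokens(), and the section column is carried along with every token.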
1
vote
1 answer

Using the Nested List Column Approach and Purrr Together with Tidytext::Unnest_Tokens

I have a data frame that contains survey responses, with each row representing a different person. One column, "Text", is an open-ended text question. I would like to use tidytext::unnest_tokens so that I can do text analysis for each row, including…
Mike
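One alternative, offered as a sketch rather than the nested-list approach the question asks about: unnest_tokens() keeps the other columns of the data frame, so carrying a respondent id through lets you analyze each row without nesting or purrr. The column name Text comes from the question; id and the responses are hypothetical.

```r
library(dplyr)
library(tidytext)

survey <- tibble(id = 1:2,
                 Text = c("Great service", "Slow service, rude staff"))

# One row per word, with the respondent id retained
tokens <- survey %>%
  unnest_tokens(word, Text)

# Per-respondent word counts
words_per_person <- tokens %>%
  count(id)
```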
1
vote
0 answers

Text mining .docx interview transcriptions in R

I have a number of interview transcriptions that I am hoping to run text mining analyses on. Basically trying to automate the qualitative coding procedure. I've been reading up on tidytext text mining, but it only seems to use already imported…
Gerard
1
vote
2 answers

tidytext example filter error with pipes

When trying to reproduce the example found in http://tidytextmining.com/twitter.html there's a problem. Basically I want to adapt this part of the code library(tidytext) library(stringr) reg <-…
Oki
0
votes
4 answers

How to remove a word from a dataset in R? NLP

I'm very new to this world of programming. OK, so I am analyzing a text in R. I am using this to get rid of stop words: kant_palavras <- kant_palavras %>% anti_join(get_stopwords(language = 'pt')) BUT afterwards, in the word counts, the…
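A minimal sketch of dropping extra words after the stop-word step, assuming kant_palavras has a word column as in the question; the tokens and the palavras_extra list are hypothetical. Another anti_join() against a custom tibble removes any words the standard list misses.

```r
library(dplyr)

# Hypothetical tokens left over after the stop-word anti_join
kant_palavras <- tibble(word = c("razao", "pura", "assim", "razao", "ainda"))

# Extra words to drop, beyond the standard stop-word list
palavras_extra <- tibble(word = c("assim", "ainda"))

kant_palavras <- kant_palavras %>%
  anti_join(palavras_extra, by = "word")
```

For a single word, filter(word != "assim") does the same job; the anti_join() form scales better to a longer list.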
0
votes
1 answer

Text analysis in R with multi-word and TF-IDF

I am quite new to R, and I am trying to run a text analysis and TF-IDF on a set of reports, considering a specific set of words in a dictionary I built. The code below has provided the results for that; however, it has failed to consider…
Pablo
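A hedged sketch of TF-IDF for two-word dictionary terms: tokenize into bigrams so multi-word phrases become single terms, compute TF-IDF with bind_tf_idf(), and only then keep the dictionary terms. The reports and dictionary are hypothetical, and a dictionary that mixes one-, two- and three-word entries would need one pass per n-gram length.

```r
library(dplyr)
library(tidytext)

reports <- tibble(doc = c("a", "b"),
                  text = c("climate change risk and climate change policy",
                           "credit risk and market risk"))
dictionary <- c("climate change", "credit risk", "market risk")

bigram_tfidf <- reports %>%
  # every pair of adjacent words becomes one term
  unnest_tokens(term, text, token = "ngrams", n = 2) %>%
  count(doc, term) %>%
  bind_tf_idf(term, doc, n) %>%
  # filter last, so term frequency is relative to all bigrams in each report
  filter(term %in% dictionary)
```

Filtering before bind_tf_idf() would instead make term frequency relative to dictionary terms only; which is appropriate depends on the analysis.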
0
votes
1 answer

Error within tidytext: unnest_tokens was deprecated

unnest_tokens() was deprecated in tidytext 0.4.0 and is now defunct Attempting to create a 2-node graph through twitter data within R and am receiving the following error message. twomode_network <- twitter_data %>% Create("twomode",…
TheDud