How to unnest_tokens and keep additional columns

Question

Applying the unnest_tokens function to tweets creates one column with each word in its own row. The original data frame also has additional columns (day, hour, min) for each tweet. Is there a way for each word's row to also carry the day, hour, and min of the tweet it came from? I've tried the following:

tweet_words$text <- tweet_words %>%
  select(text) %>%
  unnest_tokens(word, text)

The original data frame has a text column, tweet_words$text, where every row is one tweet. I tried overwriting the text column with the column of single words, but that fails because the result no longer has the same number of rows as the day, hour, and min columns. I get the following error:

Error in $<-.data.frame(*tmp*, text, value = list(word = c("same", : replacement has 4571 rows, data has 300

Any ideas how to facilitate the desired outcome?

Please add data using `dput` and show the expected output for the same. Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). — Ronak Shah, Jul 11 '20 at 01:41
Just use `tweet_words <- tweet_words %>% unnest_tokens(word, text)` and if you want to have the full text available as well, use `drop = FALSE` in `unnest_tokens`. — phiver, Jul 11 '20 at 09:48
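A minimal sketch of what the comment above suggests. The data frame here is a made-up stand-in for the asker's data (the real tweets were not posted), and the names tidy_words and tidy_words_full are only illustrative:

```r
library(dplyr)
library(tidytext)

# Hypothetical sample data: one tweet per row, plus day/hour/min columns
tweet_words <- data.frame(
  text = c("Same time tomorrow", "Happy coding everyone"),
  day  = c(10L, 11L),
  hour = c(23L, 9L),
  min  = c(12L, 48L),
  stringsAsFactors = FALSE
)

# unnest_tokens() takes the whole data frame, so day, hour and min
# are repeated for every word of the tweet they belong to
tidy_words <- tweet_words %>%
  unnest_tokens(word, text)

# drop = FALSE additionally keeps the original text column
tidy_words_full <- tweet_words %>%
  unnest_tokens(word, text, drop = FALSE)
```

Because the result is assigned to a whole data frame rather than to a single column of the old one, the row-count mismatch from the question should not arise.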

score 0 · Answer 1 · answered Jul 10 '20 at 23:12

See mutate in dplyr

https://dplyr.tidyverse.org/reference/mutate.html

Better yet, see the intro to dplyr: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html

Just a guess, but try something like this:

# unnest_tokens() takes the whole data frame as its first argument,
# so it is piped in directly rather than wrapped in mutate()
tweet_words <- tweet_words %>%
  unnest_tokens(word, text)

Going forward, it really helps to put a sample of the data into your question. Use dput to get code that recreates the data, e.g.

tmp<- tweet_words[1:3,]
dput(tmp) # Copy and paste the output of this into your question.

I'm just guessing as to what might work without a sample of the data. Nevertheless, the dplyr vignette should get you going.

Happy Coding!
