
Applying the unnest_tokens function to tweets creates one column with each word in its own row. The original data frame has additional columns (day, hour, min) for each tweet. Is there a way for each word's row to also carry the day, hour, and min of the tweet it came from? I've tried the following:

tweet_words$text <- tweet_words %>%
  select(text) %>%
  unnest_tokens(word, text)

The original data frame has a text column, tweet_words$text, where every row is one tweet. I tried overwriting the text column with the column of single words, but the day, hour, and min columns then have a different number of rows, so I get the following error:

Error in $<-.data.frame(*tmp*, text, value = list(word = c("same", : replacement has 4571 rows, data has 300

Any ideas how to facilitate the desired outcome?

  • Please add data using `dput` and show the expected output for the same. Please read the info about [how to ask a good question](http://stackoverflow.com/help/how-to-ask) and how to give a [reproducible example](http://stackoverflow.com/questions/5963269). – Ronak Shah Jul 11 '20 at 01:41
  • Just use `tweet_words <- tweet_words %>% unnest_tokens(word, text)` and if you want to have the full text available as well, use `drop = FALSE` in `unnest_tokens`. – phiver Jul 11 '20 at 09:48

1 Answer


See mutate in dplyr

https://dplyr.tidyverse.org/reference/mutate.html

Better yet, see the intro to dplyr: https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html

Just a guess, but try something like this:

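# unnest_tokens() operates on the whole data frame, so it cannot be wrapped in mutate().
# Called directly, it keeps the other columns (day, hour, min) and repeats them for each word.
tweet_words <- tweet_words %>%
  unnest_tokens(word, text)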
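
Since the question has no sample data, here is a minimal sketch on made-up data that shows the behaviour. The `tweets` data frame and its values are hypothetical stand-ins for the asker's data; the `drop = FALSE` variant follows phiver's comment above.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for the asker's data: one tweet per row
tweets <- tibble(
  text = c("Same time tomorrow", "Happy coding everyone"),
  day  = c(10L, 11L),
  hour = c(9L, 17L),
  min  = c(30L, 5L)
)

# Tokenize the text column; day, hour and min are carried along,
# so each word's row holds the values of the tweet it came from
tweet_words <- tweets %>%
  unnest_tokens(word, text)

# Same, but keep the full tweet text next to each word
tweet_words_full <- tweets %>%
  unnest_tokens(word, text, drop = FALSE)
```

Assigning the result to the whole data frame, rather than to `tweet_words$text`, avoids the row-count mismatch behind the error in the question.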

Going forward, it really helps to put a sample of the data into your question. Use dput to get code that recreates the data, e.g.

tmp <- tweet_words[1:3, ]
dput(tmp) # Copy and paste the output of this into your question.

I'm just guessing as to what might work without a sample of the data. Nevertheless, the dplyr vignette should get you going.

Happy Coding!

nate