I am doing some text analysis on free-text data with tidytext. Consider these sample sentences:
"The quick brown fox jumps over the lazy dog"
"I love books"
My tokenization approach using tidytext:
library(tidytext)
library(dplyr)

unigrams <- tweet_text %>%
  unnest_tokens(output = word, input = txt) %>%
  anti_join(stop_words, by = "word")
This produces a one-token-per-row data frame (shown here before the stop word removal, for readability):
The
quick
brown
fox
jumps
over
the
lazy
dog
I now need to join every unigram back to its original sentence:
"The quick brown fox jumps over the lazy dog" | The
"The quick brown fox jumps over the lazy dog" | quick
"The quick brown fox jumps over the lazy dog" | brown
"The quick brown fox jumps over the lazy dog" | fox
"The quick brown fox jumps over the lazy dog" | jumps
"The quick brown fox jumps over the lazy dog" | over
"The quick brown fox jumps over the lazy dog" | the
"The quick brown fox jumps over the lazy dog" | lazy
"The quick brown fox jumps over the lazy dog" | dog
"I love books" | I
"I love books" | love
"I love books | books
I'm a bit stuck. The solution needs to scale to thousands of sentences. I thought something like this might be built into tidytext, but I haven't found anything yet.
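For reference, a rough workaround I've sketched is to copy the sentence into a second column before tokenizing, so the full text survives unnest_tokens() (which drops its input column by default). The sentence column name here is just my placeholder, and I'm not sure this is the idiomatic approach:

library(dplyr)
library(tidytext)

unigrams <- tweet_text %>%
  mutate(sentence = txt) %>%                      # keep a copy of the full sentence
  unnest_tokens(output = word, input = txt) %>%   # txt is dropped, sentence is repeated per token
  anti_join(stop_words, by = "word") %>%
  select(sentence, word)

I also noticed that unnest_tokens() has a drop argument, so passing drop = FALSE might avoid the mutate() step, but I'd still like to know whether there's a cleaner, native tidytext way to do this.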