When I try to reproduce the example at http://tidytextmining.com/twitter.html, I run into a problem.

Basically, I want to adapt this part of the code

library(tidytext)
library(stringr)

reg <- "([^A-Za-z_\\d#@']|'(?![A-Za-z_\\d#@]))"

tidy_tweets <- tweets %>% 
    mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&amp;|&lt;|&gt;|RT", "")) %>%
    unnest_tokens(word, text, token = "regex", pattern = reg) %>%
    filter(!word %in% stop_words$word,
        str_detect(word, "[a-z]"))

so that I end up with a dataframe of tweets that still includes the stop words.

So I tried this:

tidy_tweets <- tweets %>% 
    mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&amp;|&lt;|&gt;|RT", "")) %>%
    unnest_tokens(word, text, token = "regex", pattern = reg) 

tidy_tweets_sw <- filter(!word %in% stop_words$word, str_detect(tidy_tweets, "[a-z]"))

But that did not work, as I got the following error message:

Error in match(x, table, nomatch = 0L) :  
'match' requires vector arguments

I have tried to pass a vector version of both inputs to match, but to no avail. Does anyone have a better idea?

alistaire
Oki

2 Answers


You need to pass the data as the first argument of your filter statement.

tidy_tweets <- tweets %>% 
  mutate(text = str_replace_all(text, "https://t.co/[A-Za-z\\d]+|http://[A-Za-z\\d]+|&amp;|&lt;|&gt;|RT", "")) %>%
  unnest_tokens(word, text, token = "regex", pattern = reg) 

tidy_tweets_sw <- filter(tidy_tweets, !(word %in% stop_words$word), str_detect(word, "[a-z]"))
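To see why the data has to come first: dplyr verbs take the data frame as their first argument and evaluate the remaining arguments against its columns. A minimal sketch with made-up data (the words below are invented for illustration):

```r
library(dplyr)
library(stringr)

# Hypothetical tokenized tweets, one word per row
tidy_tweets <- tibble::tibble(word = c("the", "cat", "123", "sat"))
stop_words  <- tibble::tibble(word = c("the", "a", "an"))

# filter() gets the data frame first; the later arguments
# refer to its columns (here, `word`)
tidy_tweets_sw <- filter(tidy_tweets,
                         !(word %in% stop_words$word),  # drop stop words
                         str_detect(word, "[a-z]"))     # keep tokens containing letters

tidy_tweets_sw$word
# "cat" "sat"
```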
Jake Kaupp

I'm not sure, but I think your problem is here:

tidy_tweets_sw <- filter(!word %in% stop_words$word, str_detect(tidy_tweets, "[a-z]"))

filter has no idea what data you want it to filter; this should work:

tidy_tweets_sw <- tidy_tweets %>% filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))
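The pipe is just sugar here: `%>%` inserts `tidy_tweets` as the first argument of `filter()`, which is exactly what the bare `filter(...)` call was missing. A small demonstration with invented data:

```r
library(dplyr)
library(stringr)

tidy_tweets <- tibble::tibble(word = c("and", "tidy", "42"))
stop_words  <- tibble::tibble(word = c("and", "or"))

# Piped form: tidy_tweets flows in as filter()'s first argument
piped <- tidy_tweets %>%
  filter(!word %in% stop_words$word, str_detect(word, "[a-z]"))

# Explicit form: identical result
explicit <- filter(tidy_tweets,
                   !word %in% stop_words$word, str_detect(word, "[a-z]"))

identical(piped, explicit)
# TRUE
```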
Tensibai