I'm trying to subset a character column a using dplyr::filter(), stringr::str_detect() and the magrittr pipe, with a regular expression that matches two or more digits.

With filter(), this only seems to work for a numeric column; for the character column it only works when I access the column directly with the $ operator:

library(tidyverse)

# Create example data: 
test_num <- tibble(
  a = c(1:3, 22:24))
test_num
#> # A tibble: 6 x 1
#>       a
#>   <int>
#> 1     1
#> 2     2
#> 3     3
#> 4    22
#> 5    23
#> 6    24

test_char <- tibble(
  a = as.character(c(1:3, 22:24)))
test_char 
#> # A tibble: 6 x 1
#>   a    
#>   <chr>
#> 1 1    
#> 2 2    
#> 3 3    
#> 4 22   
#> 5 23   
#> 6 24

# Subsetting numerical columns works:
test_num %>% 
  dplyr::filter(a, stringr::str_detect(a, "\\d{2,}"))
#> # A tibble: 3 x 1
#>       a
#>   <int>
#> 1    22
#> 2    23
#> 3    24

# Subsetting a character column does not work:
test_char %>% 
  dplyr::filter(a, stringr::str_detect(a, "\\d{2,}"))
#> Error in filter_impl(.data, quo): Evaluation error: operations are possible only for numeric, logical or complex types.

# Whereas subsetting by accessing the column
# using the `$` operator works:
test_char$a %>% 
  stringr::str_detect("\\d{2,}")
#> [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE

test_num$a %>% 
  stringr::str_detect("\\d{2,}")
#> [1] FALSE FALSE FALSE  TRUE  TRUE  TRUE

Any ideas on what the problem might be and how to solve this using a filter() approach? Thank you so much for your help in advance!

Balthasar

1 Answer

Just take out the first a in your filter call.

Instead of:

test_char %>%
  filter(a, str_detect(a, "2"))

Use:

test_char %>%
  filter(str_detect(a, "2"))

That should work.

The only condition you pass to filter() should be str_detect(col, "string"); the data frame itself already arrives through the pipe.
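
As a rough sketch of why the extra a matters: dplyr documents that multiple conditions given to filter() are combined with &, so the bare a is treated as a condition of its own. Base R's & accepts numbers (non-zero counts as TRUE) but rejects strings, which would explain why the numeric column got through and the character column raised the error. This assumes the dplyr version from the question; newer releases may reject any non-logical condition outright. The names num_and, chr_and and two_digits are only illustrative.

```r
library(dplyr)
library(stringr)

test_char <- tibble(a = as.character(c(1:3, 22:24)))

# Base R: `&` works on numbers, treating non-zero values as TRUE ...
num_and <- c(1L, 22L) & TRUE

# ... but raises an error on character input; capture the message instead of stopping.
chr_and <- tryCatch(
  c("1", "22") & TRUE,
  error = function(e) conditionMessage(e)
)

# Fixed call: str_detect() is the only condition after the piped data.
two_digits <- test_char %>%
  filter(str_detect(a, "\\d{2,}"))
```

Printing chr_and shows the message R raised for the character case, and two_digits holds the filtered rows.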

Hope that helps!

ah bon
ForceLeft415
  • Thanks for the answer, but I'm still mystified as to why the `filter()` worked on the numerical column even though technically that should not work? – Balthasar Jan 09 '19 at 23:28
  • I believe that for numeric vectors (such as test_num$a), `str_detect()` coerces the input to character before matching, even though it is not technically a "string". – ForceLeft415 Jan 11 '19 at 21:59
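
The coercion described in the last comment can be checked with a small sketch. It assumes str_detect() accepts a numeric vector, as the output in the question suggests for the stringr version used there; nums, detected_num and detected_chr are illustrative names.

```r
library(stringr)

nums <- c(1:3, 22:24)

# Pass the integers directly, relying on implicit coercion to character.
detected_num <- str_detect(nums, "\\d{2,}")

# Convert explicitly first, then match.
detected_chr <- str_detect(as.character(nums), "\\d{2,}")

# If the input is coerced as described, both results agree.
same_result <- identical(detected_num, detected_chr)
```

If same_result is TRUE, the numeric input was matched as if it had been converted with as.character() first.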