dplyr filter columns with multiple regex

Question

I have two data frames in R (meta = some redundant info):

df1:

                id  value1  value2  value3  value4
id1_meta_meta-meta  4.93    13.93   16.8    35.39
id2_meta_meta-meta  28.63   45.43   30.52   61.71
id3_meta_meta-meta  3.35    1.26    7.98    4.43
id4_meta_meta-meta  16.78   50.47   32.48   55.52
id5_meta_meta-meta  474.23  807.71  664.45  442.55
id6_meta_meta-meta  26.26   32.83   24.64   41.58
id7_meta_meta-meta  230.1   202.93  166.71  295.48
id8_meta_meta-meta  651.21  1282.71 1012.28 2650.21

df2:

V1
id1
id2
id3
id4
id5

Question

Trying to filter rows in df1 based on ids in df2

Code

library(dplyr)
library(stringr)
df.common = df1 %>%
  filter(str_detect(id, '*_') %in% df2$V1)

error

Error in filter_impl(.data, quo) : 
  Evaluation error: Syntax error in regexp pattern. (U_REGEX_RULE_SYNTAX).

Desired output

df.common:

                id  value1  value2  value3  value4
id1_meta_meta-meta  4.93    13.93   16.8    35.39
id2_meta_meta-meta  28.63   45.43   30.52   61.71
id3_meta_meta-meta  3.35    1.26    7.98    4.43
id4_meta_meta-meta  16.78   50.47   32.48   55.52
id5_meta_meta-meta  474.23  807.71  664.45  442.55

Your original code will work if you change the `filter` condition to `filter(str_detect(id, df2$V1))` — Jake Kaupp, Aug 17 '17 at 16:24
@JakeKaupp I get this error `Warning message: In stri_detect_regex(string, pattern, opts_regex = opts(pattern)) : longer object length is not a multiple of shorter object length` — sbradbio, Aug 17 '17 at 16:27
It's a warning, not an error, and results in your desired output. — Jake Kaupp, Aug 17 '17 at 16:33
true, rookie mistake apologies but i do not get what I expected `> dim(df.common) [1] 2 13` — sbradbio, Aug 17 '17 at 16:36
`str_detect` detects strings and returns TRUE or FALSE, so your code is looking for TRUE or FALSE in `df2`. Instead, use `str_extract` to pull out the ID part and then test with that: `str_extract(id, "id[0-9]+") %in% df2$V1`. — Gregor Thomas, Aug 17 '17 at 16:46
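
To make the comments above concrete, here is a minimal sketch of two ways to do the filter inside dplyr. The pattern `'*_'` in the question fails because `*` is a quantifier with nothing in front of it to repeat. The toy data below only mirrors the question (one value column kept for brevity), and the pattern `"^[^_]+"` assumes the ID is everything before the first underscore; the names `pattern` and `df.common2` are made up for illustration.

```r
library(dplyr)
library(stringr)

# Toy data shaped like the question's df1 and df2 (assumed, trimmed to one value column)
df1 <- data.frame(
  id = paste0("id", 1:8, "_meta_meta-meta"),
  value1 = c(4.93, 28.63, 3.35, 16.78, 474.23, 26.26, 230.1, 651.21),
  stringsAsFactors = FALSE
)
df2 <- data.frame(V1 = paste0("id", 1:5), stringsAsFactors = FALSE)

# Option 1: extract the leading ID, then test membership with %in%
df.common <- df1 %>%
  filter(str_extract(id, "^[^_]+") %in% df2$V1)

# Option 2: collapse all ids into ONE anchored regex, "^(id1|id2|...)_",
# so str_detect gets a single pattern instead of a recycled vector of patterns
pattern <- paste0("^(", paste(df2$V1, collapse = "|"), ")_")
df.common2 <- df1 %>%
  filter(str_detect(id, pattern))

df.common$id   # the five ids id1 ... id5, each with its _meta_meta-meta suffix
```

Passing `df2$V1` directly as the pattern recycles the five patterns across the eight rows pairwise, which is why it produces the "longer object length" warning and an unexpected row count. Option 2 assumes the ids contain no regex metacharacters; if they might, they would need escaping first.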

score 4 · Answer 1 · answered Aug 17 '17 at 16:27

If you are using dplyr and stringr, you can also consider this approach. str_replace_all works like gsub. semi_join is a kind of "filter-join": it keeps only the rows of df1 that have a match in df2, without adding any columns from df2.

library(dplyr)
library(stringr)

df3 <- df1 %>%
  mutate(id2 = str_replace_all(id, "_.*", "")) %>%
  semi_join(df2, by = c("id2" = "V1")) %>%
  select(-id2)

df3
                  id value1 value2 value3 value4
1 id1_meta_meta-meta   4.93  13.93  16.80  35.39
2 id2_meta_meta-meta  28.63  45.43  30.52  61.71
3 id3_meta_meta-meta   3.35   1.26   7.98   4.43
4 id4_meta_meta-meta  16.78  50.47  32.48  55.52
5 id5_meta_meta-meta 474.23 807.71 664.45 442.55
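
As a rough illustration of why a filter-join suits this task, the sketch below contrasts `semi_join` with `inner_join`. The `lookup` table, its duplicated id, and its `note` column are hypothetical and not part of the question; base `sub` is used here in place of `str_replace_all` only to keep the example short.

```r
library(dplyr)

df1 <- data.frame(
  id = c("id1_meta_meta-meta", "id2_meta_meta-meta", "id6_meta_meta-meta"),
  value1 = c(4.93, 28.63, 26.26),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: id1 appears twice and there is an extra column
lookup <- data.frame(
  V1 = c("id1", "id1", "id2"),
  note = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Add the trimmed key, as in the answer above
keyed <- df1 %>% mutate(id2 = sub("_.*", "", id))

# semi_join only filters: each matching row of keyed appears once, no new columns
kept <- semi_join(keyed, lookup, by = c("id2" = "V1"))

# inner_join also merges: the id1 row is repeated and `note` is added
joined <- inner_join(keyed, lookup, by = c("id2" = "V1"))

nrow(kept)    # 2
nrow(joined)  # 3
```

So if df2 ever contains repeated ids, `semi_join` still returns each row of df1 at most once.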

answered Aug 17 '17 at 16:27

www


I will try this, but correct me if I am wrong @PoGibas answer is one liner and concise. – sbradbio Aug 17 '17 at 16:29
Well... if you only want to see the most concise answer, I will delete my answer shortly. If you want to learn more about the use of `dplyr` and `stringr` since you are using these packages, I will keep my answer here as an optional approach. What do you say? – www Aug 17 '17 at 16:32
Sure, I have accepted it. You are absolutely correct, it can be an optional way. – sbradbio Aug 17 '17 at 16:35

pogibas · Accepted Answer · 2017-08-17T16:14:45.810

2

Use gsub to trim id in df1
- gsub("_.*", "", df1$id) removes the first _ and everything after it
Check which trimmed ids are in df2$V1 (%in% returns a logical vector, one TRUE/FALSE per row of df1)

Extract those rows from df1

df1[gsub("_.*", "", df1$id) %in% df2$V1, ]
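
The one-liner can be unpacked step by step, which may help to see what is going on. This is a small sketch on assumed toy data shaped like the question's (one value column kept); the intermediate names `trimmed` and `keep` are made up for illustration.

```r
# Toy data shaped like the question's df1 and df2 (assumed)
df1 <- data.frame(
  id = paste0("id", 1:8, "_meta_meta-meta"),
  value1 = c(4.93, 28.63, 3.35, 16.78, 474.23, 26.26, 230.1, 651.21),
  stringsAsFactors = FALSE
)
df2 <- data.frame(V1 = paste0("id", 1:5), stringsAsFactors = FALSE)

# Step 1: drop the first underscore and everything after it
trimmed <- gsub("_.*", "", df1$id)   # "id1" "id2" ... "id8"

# Step 2: logical vector, TRUE where the trimmed id occurs in df2$V1
keep <- trimmed %in% df2$V1          # TRUE x5, then FALSE x3

# Step 3: use the logical vector to subset rows (note the trailing comma)
df.common <- df1[keep, ]

nrow(df.common)  # 5
```

Because there is only one match per id, `sub` would give the same result as `gsub` here.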

edited Aug 17 '17 at 16:14

answered Aug 17 '17 at 16:10

pogibas


It worked. Could you comment on what is going on? It will help me learn, thanks – sbradbio Aug 17 '17 at 16:12
Awesome, appreciate it! – sbradbio Aug 17 '17 at 16:19
@sbradbio if this is what you wanted you can accept my answer then – pogibas Aug 17 '17 at 16:20
I cannot now, SO will let me accept it after 5 mins, dunno why – sbradbio Aug 17 '17 at 16:21
