2

I'm trying to create a filter to remove lines from a dataset using grep and subset together.

Sample dataset:

id <- 1:10
problem <- c("a" , "b", "c", "d", "a","b","c","a", "b", "a")
solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat")
solution2 <- c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")
df <- c(id, problem, solution1, solution2)

I'm trying to remove those rows with problem "a" and have "eat" in either solution1 or solution2.

The result is that it should remove id 1, 5 and 10.

I have tried using:

df <- subset(df, problem=="a" & !(grepl("eat", df)))

and

df <- df[!grepl("eat", df) & grepl("a", df$problem)]

Can't seem to find a similar solution on StackOverflow or on other websites I Googled.

Would appreciate if anyone can help. Thanks!

oguz ismail
Andrew Fang

2 Answers

5

First, if you want a dataframe, you should use data.frame, not c:

df <- data.frame(id, problem, solution1, solution2)
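To see why c does not work here, a minimal sketch reusing the vectors from the question (the name flat is only for illustration):

```r
id <- 1:10
problem <- c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a")
solution1 <- c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat")
solution2 <- c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")

# c() concatenates everything into one character vector; the columns are lost
flat <- c(id, problem, solution1, solution2)
length(flat)  # 40
class(flat)   # "character"

# data.frame() keeps each vector as a named column
df <- data.frame(id, problem, solution1, solution2)
dim(df)       # 10 4
```

Because flat has no problem column, both attempts in the question have nothing to filter on.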

Then you can subset like this, for instance (there is no need to use subset per se):

df2 <- df[!(grepl("a", df$problem) & 
           (grepl("eat", df$solution1) |
            grepl("eat", df$solution2))),]

#   id problem solution1 solution2
# 2  2       b     sleep      read
# 3  3       c     drink       eat
# 4  4       d      play     drink
# 6  6       b      play     sleep
# 7  7       c      play       eat
# 8  8       a     drink      read
# 9  9       b      play       eat
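
Since the question asked about subset specifically, here is a sketch of the same filter written with subset. It is one possible equivalent rather than the only way, and it rebuilds the sample data so it runs on its own:

```r
df <- data.frame(
  id = 1:10,
  problem = c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a"),
  solution1 = c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat"),
  solution2 = c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")
)

# subset() evaluates the condition inside df, so columns are named without df$
df2 <- subset(df, !(problem == "a" &
                    (grepl("eat", solution1) | grepl("eat", solution2))))
df2$id  # 2 3 4 6 7 8 9
```

The whole condition is negated once, so a row is dropped only when it has problem "a" and "eat" in at least one solution column.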
Dominic Comtois
  • ok thanks. will do.. first time using this. but same question to the other poster, the solution works well on this sample data, but when i apply it to my real dataset, your solution gives me this result: Error: unexpected ',' in: " (grepl("Digoxin", data$Med_7_Name | grepl("Digoxin", data$Med_8_Name)))," The Digoxin is supposed to be like the eat and Med_7_Name is the solution1. Is there anything i should be aware of? – Andrew Fang Mar 12 '15 at 08:05
  • A closing parenthesis is missing after data$Med_7_Name – Dominic Comtois Mar 12 '15 at 08:15
  • ok. closed it and i got the same result as for A Val's solution. the same number of rows as the initial df remain. i guess something went wrong as i translated this to the sample data for this site. thanks anyway... adopted your answer since you posted first. – Andrew Fang Mar 12 '15 at 08:25
  • You might need to play a little with the grepl function if you don't get the results you intend. You can also post your unsuccessful attempts as an edit to your original post, no problem there! – Dominic Comtois Mar 12 '15 at 08:38
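
As a sketch of what the comments above describe: each grepl call is closed before the | operator, and grepl's ignore.case argument is one thing to "play with". The data frame below is a hypothetical stand-in with invented values. Only the column names Med_7_Name and Med_8_Name come from the comment, and whether letter case is the actual cause in the real dataset is unknown.

```r
# Hypothetical stand-in for the real dataset; values are invented for illustration
data <- data.frame(
  Med_7_Name = c("Digoxin", "Aspirin", "digoxin 250mcg", "Warfarin"),
  Med_8_Name = c("Aspirin", "DIGOXIN", "Warfarin", "Aspirin"),
  stringsAsFactors = FALSE
)

# Each grepl() is closed with its own parenthesis before the | operator
has_digoxin <- grepl("Digoxin", data$Med_7_Name) |
               grepl("Digoxin", data$Med_8_Name)
sum(has_digoxin)     # 1: only the exact-case "Digoxin" matches

# ignore.case = TRUE also matches "digoxin 250mcg" and "DIGOXIN"
has_digoxin_ci <- grepl("digoxin", data$Med_7_Name, ignore.case = TRUE) |
                  grepl("digoxin", data$Med_8_Name, ignore.case = TRUE)
sum(has_digoxin_ci)  # 3
```

Counting the matches with sum before subsetting is a quick way to check whether the pattern finds anything at all.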
0

I'd do this:

df <- df[!(df$problem %in% "a" & (df$solution1 %in% "eat" | df$solution2 %in% "eat")),]

#   id problem solution1 solution2
# 2  2       b     sleep      read
# 3  3       c     drink       eat
# 4  4       d      play     drink
# 6  6       b      play     sleep
# 7  7       c      play       eat
# 8  8       a     drink      read
# 9  9       b      play       eat

Regex is not really necessary here if you compare exact strings. Using %in% for subsetting will save you a lot of time, because it tests each element against a whole vector of values, e.g. instead of "a" there could have been c("a", "b", "c").
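
A short sketch of that last point, assuming you wanted to drop rows with problem "a" or "b" (a made-up variation on the question, using the same sample data; the name keep is only for illustration):

```r
df <- data.frame(
  id = 1:10,
  problem = c("a", "b", "c", "d", "a", "b", "c", "a", "b", "a"),
  solution1 = c("eat", "sleep", "drink", "play", "sleep", "play", "play", "drink", "play", "eat"),
  solution2 = c("read", "read", "eat", "drink", "eat", "sleep", "eat", "read", "eat", "play")
)

# %in% checks each element against every value on the right-hand side
keep <- !(df$problem %in% c("a", "b") &
          (df$solution1 %in% "eat" | df$solution2 %in% "eat"))
df[keep, "id"]  # 2 3 4 6 7 8
```

Unlike ==, %in% returns FALSE rather than NA for missing values, so rows with NA do not turn into NA rows when subsetting.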

statespace
  • thanks! your solution works well too. however when i try applying both of your solutions to my real dataset, the rows that are meant to be filtered don't get removed. i still get a df with the same number of rows. is there anything that i'm not doing correctly or should be aware of? – Andrew Fang Mar 12 '15 at 08:04
  • Most likely the problem is in the name of the data frame or the column names, so make sure you adapt the code to your dataset accordingly. And.. brackets. They are often a source of typos. – statespace Mar 12 '15 at 08:07
  • thanks. i double checked the names and even used column number. the variable is "character" too. stuck! anyway, thanks for your help! – Andrew Fang Mar 12 '15 at 08:10
  • Well, I can't solve problems that I can't see. This is how StackExchange works. You provide a problem and an example, we provide a solution to your example. Both my and Dominic's answers solve the problem... – statespace Mar 12 '15 at 08:16
  • totally agree with you. ok thanks for your help anyway! – Andrew Fang Mar 12 '15 at 08:22