
Say I have a data.frame and I want to subset it with several different filters and get the subsets back as a list. I tried this:

df <- data.frame(A=c(1,2,3), B=c(7,8,9))
filter_lst <- list(c(1,2), c(2,3))
filter_by_range <- function(df, filter) {
  with(df, {df[A >= filter[1] &
               A < filter[2], ]})
}
mapply(filter_by_range, df, filter_lst)

But it throws an error:

Error in eval(substitute(expr), data, enclos = parent.frame()) : 
  numeric 'envir' arg not of length one
Called from: with.default(df, {
    df[A >= filter[1] & A < filter[2], ]
})

I guess it's because mapply treats df as a list. How can I make mapply take df as a whole, or is there a better approach for this job?

user3684014
  • Maybe you want something like this? `lapply(filter_lst, function(x) with(df, df[A >= x[1] & A < x[2], ]))`. Alternatively, wrap `df` in `list()`, like so: `apply(filter_by_range, list(df), filter_lst)`. – jbaums Oct 21 '14 at 23:03
  • Yeah, as it is you're basically saying you want `df$A >= list(c(1,2)) & df$A < list(c(2,3))` instead of specifying a range from one number to another. jbaums' suggestions look good to me. – rsoren Oct 21 '14 at 23:09
  • If `df` is a "constant" then mapply provides a 'MoreArgs' method of supplying an argument. You just need to get the naming set up correctly so that the function will bring it in properly. – IRTFM Oct 21 '14 at 23:30
  • Sorry, my comment above is missing the `m` in `mapply`. It should have read: `mapply(filter_by_range, list(df), filter_lst)`. – jbaums Oct 23 '14 at 15:51
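
To illustrate the diagnosis in the comments above, here is a minimal sketch of what `mapply` actually hands to the function. The anonymous function and the names `seen` and `wrapped` are only for illustration and are not part of the original post.

```r
df <- data.frame(A = c(1, 2, 3), B = c(7, 8, 9))
filter_lst <- list(c(1, 2), c(2, 3))

# A data.frame is a list of columns, so mapply pairs column A with the
# first filter and column B with the second filter.
seen <- mapply(function(x, f) class(x), df, filter_lst)
print(seen)

# Wrapping the data.frame in list() makes it a single element, which
# mapply then recycles and passes whole on every call.
wrapped <- mapply(function(x, f) class(x), list(df), filter_lst)
print(wrapped)
```

In the first call the function receives a plain numeric vector where it expects a data.frame, which is consistent with the "numeric 'envir' arg" error that `with()` reports in the question.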

1 Answer


As described in comments above:

> mapply(filter_by_range, filter=filter_lst, MoreArgs=list(df))
  [,1] [,2]
A 1    2   
B 7    8   

Notice that the result comes back as a matrix because mapply defaults to SIMPLIFY=TRUE. If you want a list instead, set SIMPLIFY=FALSE:

> mapply(filter_by_range, filter=filter_lst, MoreArgs=list(df), SIMPLIFY=FALSE)
[[1]]
  A B
1 1 7

[[2]]
  A B
2 2 8
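
For comparison, a rough sketch of two other ways to get the same list: the `lapply` form suggested in the comments, and base R's `Map`, which is `mapply` with `SIMPLIFY = FALSE`. The version of `filter_by_range` below drops `with()` and indexes the column directly. That is an assumption made for readability, not the asker's exact function, and the names `via_lapply` and `via_Map` are made up here.

```r
df <- data.frame(A = c(1, 2, 3), B = c(7, 8, 9))
filter_lst <- list(c(1, 2), c(2, 3))

# Keep rows where A falls in the half-open range [filter[1], filter[2]).
filter_by_range <- function(df, filter) {
  df[df$A >= filter[1] & df$A < filter[2], ]
}

# Iterate over the filters only; df is captured from the enclosing scope.
via_lapply <- lapply(filter_lst, function(f) filter_by_range(df, f))

# Map always returns a list; list(df) is recycled across the filters.
via_Map <- Map(filter_by_range, list(df), filter_lst)

print(via_lapply)
```

Since only the filters vary, `lapply` is arguably the most direct option. `mapply` or `Map` earns its keep when several arguments vary in parallel.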
IRTFM
  • Or just `mapply(filter_by_range, list(df), filter_lst)` - `MoreArgs` seems unnecessary in this case. – jbaums Oct 23 '14 at 15:51
  • In this toy case perhaps, but it would be necessary if the example were more complex, as was probably the case in the actual application. The `length` (i.e. number of columns) of 'df' argument was accidentally the same as the 'filter_lst' but would not necessarily be so. – IRTFM Oct 23 '14 at 16:11
  • I just tested with `df <- data.frame(A=1:3, B=7:9, C=10:12)`, and both return the same result (although I did need to change your `MoreArgs=list(df)` to `MoreArgs=list(df=df)`). Anyway - no biggie, was just wondering where `MoreArgs` would be necessary. :) – jbaums Oct 23 '14 at 16:14
  • I belatedly see that the arguments are expected to be recycled to the length of the longest object so your simplification is correct. I still get success with list(df) as long as I label the 'filter' arg correctly. I suppose MoreArgs would be needed to pass in a more complex argument such as a pair of model fits that you didn't want to be recycled. – IRTFM Oct 23 '14 at 16:25
  • Thanks for the explanation - makes sense. And sorry, I didn't name `filter` when testing your code... I can see that you wouldn't need to name `df` in that case! – jbaums Oct 23 '14 at 16:26
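
A small sketch of the recycling point discussed in these comments: with a third filter (added here purely as an assumption, so that the number of filters no longer matches the number of columns), `list(df)` and `MoreArgs = list(df = df)` should give the same result. `filter_by_range` is again rewritten without `with()` for brevity, and `recycled` and `constant` are illustrative names.

```r
df <- data.frame(A = c(1, 2, 3), B = c(7, 8, 9))
# Three filters for a two-column data.frame; the third one is hypothetical.
filter_lst <- list(c(1, 2), c(2, 3), c(1, 4))

# Keep rows where A falls in the half-open range [filter[1], filter[2]).
filter_by_range <- function(df, filter) {
  df[df$A >= filter[1] & df$A < filter[2], ]
}

# list(df) has length 1, so it is recycled to match the three filters.
recycled <- mapply(filter_by_range, list(df), filter_lst, SIMPLIFY = FALSE)

# MoreArgs passes df unchanged to every call, with no recycling involved.
constant <- mapply(filter_by_range, filter = filter_lst,
                   MoreArgs = list(df = df), SIMPLIFY = FALSE)

print(identical(recycled, constant))
```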