I'm wondering how to use the subset function if I don't know the name of the column I want to test. The scenario is this: I have a Shiny app where the user can pick a variable on which to filter (subset) the data table. I receive the column name from the webapp as input, and I want to subset based on the value of that column, like so:

subset(myData, THECOLUMN == someValue)

Except where both THECOLUMN and someValue are variables. Is there a syntax for passing the desired column name as a string?

subset seems to want a bare word that is the column name, not a variable that holds the column name.

zx8754
adv12

3 Answers

31

Both subset and with are designed for interactive use, and their help pages warn against using them inside other functions. This stems from their strategy of evaluating arguments as expressions within an environment constructed from the names of their data arguments. These column/element names would otherwise not be "objects" in the R sense.
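To make the failure mode concrete, here is a minimal sketch. The data frame `d` and the variables `THECOLUMN` and `someValue` are illustrative assumptions, not code from the question:

```r
# Illustrative data; d, THECOLUMN and someValue are assumed names.
d <- data.frame(x = letters[1:5], y = 1:5)
THECOLUMN <- "x"
someValue <- "c"

# subset() evaluates the condition with the columns of d in scope.
# THECOLUMN is not a column, so R falls back to the variable defined above
# and compares the string "x" with "c", which is FALSE for every row.
wrong <- subset(d, THECOLUMN == someValue)
nrow(wrong)
# [1] 0

# A bare column name works, but it has to be typed literally.
right <- subset(d, x == someValue)
nrow(right)
# [1] 1
```

No error is raised in the first call; it simply returns zero rows, which is part of why this is easy to miss.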

If THECOLUMN is the name of an object whose value is the name of the column and someValue is the name of an object whose value is the target, then you should use:

dfrm[ dfrm[[THECOLUMN]] == someValue , ]

The fact that "[[" will evaluate its argument is why it is superior to "$" for programming. If we use joran's example:

 d <- data.frame(x = letters[1:5],y = runif(5))
 THECOLUMN= "x"
 someValue= "c"

d[ d[[THECOLUMN]] == someValue , ]
#   x         y
# 3 c 0.7556127

So in this case all these return the same atomic vector:

d[[ THECOLUMN ]]
d[[ 'x' ]]
d[ , 'x' ]
d[, THECOLUMN ]
d$x  # of the three extraction functions: `$`, `[[`, and `[`,
     # only `$` is unable to evaluate its argument
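As a rough sketch of how this might be packaged for reuse (for example, when the column name arrives from a web app), the helper below wraps the `[[` approach. The function name `filter_by_column` and its argument names are hypothetical, not part of base R:

```r
d <- data.frame(x = letters[1:5], y = 1:5)
THECOLUMN <- "x"

# `$` treats its argument as a literal name, so this looks for a column
# called "THECOLUMN" and returns NULL.
is.null(d$THECOLUMN)
# [1] TRUE

# Hypothetical helper: keep the rows where the column named by `column`
# equals `value`. Checking the name first avoids a confusing failure
# when the column does not exist.
filter_by_column <- function(data, column, value) {
  stopifnot(is.character(column), length(column) == 1,
            column %in% names(data))
  data[data[[column]] == value, , drop = FALSE]
}

filter_by_column(d, THECOLUMN, "c")
#   x y
# 3 c 3
```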
IRTFM
27

This is precisely why subset is a bad tool for anything other than interactive use:

d <- data.frame(x = letters[1:5], y = runif(5))
d[d[, 'x'] == 'c', ]
#   x         y
# 3 c 0.3080524

Fundamentally, extracting things in R is built around [. Use it.
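One difference to keep in mind when replacing subset with `[`: subset drops rows where the condition is NA, whereas a logical index containing NA returns an all-NA row. A small sketch, using a made-up data frame `d_na`:

```r
# Illustrative data with a missing value in the column being tested.
d_na <- data.frame(x = c("a", NA, "c"), y = 1:3)

# The comparison yields FALSE, NA, TRUE; the NA produces an extra row of NAs.
with_na <- d_na[d_na[, "x"] == "c", ]
nrow(with_na)
# [1] 2

# which() keeps only the positions that are TRUE, so the NA row is dropped.
without_na <- d_na[which(d_na[, "x"] == "c"), ]
nrow(without_na)
# [1] 1
```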

joran
  • This looks like it does what I want, but I haven't verified it yet. I'll mark it as the answer and follow up if I have problems. – adv12 Jun 12 '13 at 21:30
  • At the risk of sounding dumb, does this differ if I'm using a data.table rather than a data.frame? It seems to. With the data.table, I can use "d[d[,theColumnName] == 'c',]", but I don't seem to be able to use "d[d,"theColumnName"] == 'c',]". – adv12 Jun 13 '13 at 19:09
  • @adv12 No, `data.table`s work differently and (IMHO) unbelievably obtusely (to my immense and repeated frustration). I _think_ that the data.table must be keyed to do this: `setkey(d,"x"); d["c"]`. But I've always found data.table syntax so opaque that I usually end up overlooking "simpler" methods. – joran Jun 13 '13 at 19:22
4

I think you could use the following one-liner:

myData[ , grep(someValue, colnames(myData))]

where

colnames(myData)

outputs a vector containing all column names and

grep(someValue, colnames(myData))

should result in a numeric vector of length 1 (given the column name is unique) pointing to your column. See ?grep for information about pattern matching in R.
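A short sketch of how the pattern match behaves, using made-up column names. Because grep treats its first argument as a regular expression, a name that is a substring of another column name can match more than once, and the result selects columns rather than filtering rows:

```r
# Illustrative data; the column names are assumptions chosen to show overlap.
myData <- data.frame(x = 1:3, xy = 4:6, y = 7:9)

# "x" matches both "x" and "xy".
grep("x", colnames(myData))
# [1] 1 2

# An exact comparison returns only the column with that precise name.
which(colnames(myData) == "x")
# [1] 1

# This extracts the column itself; it does not subset rows by value.
myData[, which(colnames(myData) == "x")]
# [1] 1 2 3
```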

CubeJockey
mcmunder