1

I'm building an R script that's intended to query a database multiple times (one for every permutation from the elements of 3 vectors, but I'm having a hard time figuring out how to use ldply to achieve this.

tags <- c("tag1","tag2","tag3")
times <- c("2012-08-01 13:00:00","2012-08-07 21:00:00")
timesteps <- c("2m", "10m","60m", "90m")


query <- function(tag, time, timestep) {

  sql <- paste("select tag, time, timestep, value from mydb where tag = '",tag,"' and time = '",time,"' and timestep = '",timestep,"'", sep="")

  # pretend the line below is actually querying a database and returning a DF with one row
  data.frame(tag = tag, time = time, timestep = timestep, value = rnorm(1))

}
# function works correctly!  
query(time = times[1], tag = tags[1], timestep = timesteps[1])

# causes an error! (Error in FUN(X[[1L]], ...) : unused argument(s) (X[[1]]))
ldply(times, query, time = times, tag = tags, timestep = timesteps)

I thought I could use ldply nested three times, one for each vector, but I don't even get out of the first level!

Any ideas what I can do?

Tommy O'Dell
  • 7,019
  • 13
  • 56
  • 69

2 Answers2

3

I think this is simplified considerably if you use mdply (or equivalently just mapply):

tags <- c("tag1","tag2","tag3")
times <- c("2012-08-01 13:00:00","2012-08-07 21:00:00")
timesteps <- c("2m", "10m","60m", "90m")


query <- function(tags, times, timesteps) {

  sql <- paste("select tag, time, timestep, value from mydb where 
            tag = '",tags,"' and time = '",times,"' and timestep = '",timesteps,"'", sep="")
  # pretend the line below is actually querying a database and returning a DF with one row
  data.frame(tag = tags, time = times, timestep = timesteps, value = rnorm(1))

}

dat <- expand.grid(tags, times, timesteps)
colnames(dat) <- c('tags','times','timesteps')

mdply(dat,query)

Note the very slight changes in variable names to make them all agree across the data and function arguments.

joran
  • 169,992
  • 32
  • 429
  • 468
  • Great answer! Thank you. I wondered if `expand.grid` was somehow the answer but could not recall its name, nor would I have known what to do with result! – Tommy O'Dell Aug 08 '12 at 22:51
1

This will get the job done but it only uses apply. First I create an object with the combinations of interest and then I rewrite query to take a row from that object instead of 3 inputs.

tags <- c("tag1","tag2","tag3")
times <- c("2012-08-01 13:00:00","2012-08-07 21:00:00")
timesteps <- c("2m", "10m","60m", "90m")

# Use expand.grid to create an object with all the combinations
dat <- expand.grid(tags, times, timesteps)

# Rewrite query to take in a row of dat
query <- function(row) {
    # extract the pieces of interest
    tag <- row[1]
    time <- row[2]
    timestep <- row[3]

    sql <- paste("select tag, time, timestep, value from mydb where tag = '",tag,"' and time = '",time,"' and timestep = '",timestep,"'", sep="")

    # pretend the line below is actually querying a database and returning a DF with one row
    data.frame(tag = tag, time = time, timestep = timestep, value = rnorm(1))

}

# function works correctly on a single row  
query(dat[1,])

# apply the function to each row
j <- apply(dat, 1, query)
# bind all the output together
do.call(rbind, j)
Dason
  • 60,663
  • 9
  • 131
  • 148
  • Thanks for your answer, Dason. I really should learn more about the base functions for these kinds of problems. – Tommy O'Dell Aug 08 '12 at 22:52
  • @TommyO'Dell joran's answer is definitely better. I'm not too used to the mapply types of functions so I typically reduce the problem down to a single dimension like I did here. – Dason Aug 08 '12 at 23:07