
I'm struggling to figure out how to aggregate and then merge a list of dataframes. Has anyone done this?

Here's the example. I have a list of dataframes created by splitting on a categorical variable.

list1df 93 obs of 6 variables:
- categoricalvar1: Factor w...
- categoricalvar2: Factor w...
- categoricalvar3: Factor w...
- numericvar1: num...
- numericvar2: num...
- thedate: Date, format "2018-11-13"...

list2df 3988 obs of 6 variables:
- categoricalvar1: Factor w...
- categoricalvar2: Factor w...
- categoricalvar3: Factor w...
- numericvar1: num...
- numericvar2: num...
- thedate: Date, format "2018-11-13"...

list3df 563 obs of 6 variables:
- categoricalvar1: Factor w...
- categoricalvar2: Factor w...
- categoricalvar3: Factor w...
- numericvar1: num...
- numericvar2: num...
- thedate: Date, format "2018-11-13"...

I'm trying to figure out how to lapply/sapply something like this:

sumtable <- function(thedate, df, thefun) {
  dfgrouped <- aggregate(. ~ thedate, data = df, thefun, na.rm = TRUE)
  return(dfgrouped)
}

to each individual dataframe in the list, to create aggregated frames by individual day, after which I'll use ldply(thebiglist,data.frame) to glue them all back together.

I can't figure out how to make the sumtable function work with lapply and all the frames in the list though. Thank you in advance!

Christopher Penn

1 Answer

We can place the data.frames in a list, apply aggregate with the chosen function to each one, and then bind the results together:

lst <- lapply(mget(ls(pattern = '^list\\d+df$')), function(x) 
    aggregate(. ~ thedate, data = x, FUN = max, na.rm = TRUE))

do.call(rbind, lst)
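
For a self-contained check of the approach above, here is a minimal sketch on made-up data. The `make_df` helper and the toy values are hypothetical, not from the question. The formula is narrowed to the two numeric columns on the assumption that the factor columns should not be aggregated, since `max` is not defined for unordered factors and `. ~ thedate` would pull them in.

```r
# Hypothetical helper that builds a toy frame shaped like the ones in the question
make_df <- function(n) {
  data.frame(
    categoricalvar1 = factor(rep("a", n)),
    numericvar1     = seq_len(n),
    numericvar2     = seq_len(n) * 2,
    thedate         = as.Date("2018-11-13") + (seq_len(n) - 1) %% 2
  )
}
list1df <- make_df(4)
list2df <- make_df(6)
list3df <- make_df(2)

# Collect every object in the current environment whose name matches the pattern
frames <- mget(ls(pattern = "^list\\d+df$"))

# Aggregate only the numeric columns by day (assumption: factors are left out)
lst <- lapply(frames, function(x)
  aggregate(cbind(numericvar1, numericvar2) ~ thedate,
            data = x, FUN = max, na.rm = TRUE))

# One row per frame per day; row names carry the source frame's name
result <- do.call(rbind, lst)
```

Swapping `FUN = max` for `sum` or `mean` follows the same pattern.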
akrun
  • What if the dataframes have wildly different names, but we know we want all of them? Do we need the regex? – Christopher Penn Dec 19 '18 at 20:27
  • @ChristopherPenn If you have data.frame object names as `Abcd`, `xy12`, `14df`, `hellomydear`, `dontyouknowme` etc. these don't have any pattern. The only way to create this as a list is `list(Abcd, xy12, ..., hellomydear)` – akrun Dec 19 '18 at 20:29
  • However, if these are the only objects created in the environment, then `mget(ls())` should return all those objects – akrun Dec 19 '18 at 20:30
  • That makes sense, we want to be greedy and get all the list objects. Will give this a try. – Christopher Penn Dec 19 '18 at 20:40
  • I wasn't able to get it to see the pattern. In the parent dataframe, before calling `split()`, I did a `paste0` to prepend `chan.` to each of the categorical variables so that each dataframe is now chan.Abcd, chan.xy12, chan14df etc. I modified the pattern to match `pattern = 'chan\\.'` but `ls` itself is coming back null. I can see the parent list and the nested dataframes with `ls.str`, but I can't reference them. Any idea what I'm doing wrong? The result of `ls(pattern = "chan\\.")` is `character(0)`. – Christopher Penn Dec 20 '18 at 00:52
  • @ChristopherPenn Can you try `"^chan\\."`? The `^` anchors the match to the start of the name. – akrun Dec 20 '18 at 04:18
  • That drops a big ol' character(0) unfortunately. – Christopher Penn Dec 20 '18 at 18:06
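
One likely explanation for the `character(0)`, assuming the frames still live inside the list returned by `split()`: `ls()` only lists objects bound in an environment, and the split frames are elements of one list object, so no name matching `chan.` exists for `ls()` to find. In that case the list can be passed to `lapply` directly. The sketch below uses a hypothetical `parent` frame and `chan` column, drops the unused `thedate` argument from `sumtable`, and uses base `do.call(rbind, ...)` in place of `ldply`.

```r
# Hypothetical parent data frame; `chan` stands in for the split variable
parent <- data.frame(
  chan        = paste0("chan.", c("Abcd", "Abcd", "xy12", "xy12", "xy12")),
  numericvar1 = c(1, 2, 3, 4, 5),
  numericvar2 = c(10, 20, 30, 40, 50),
  thedate     = as.Date("2018-11-13") + c(0, 0, 0, 1, 1)
)

thebiglist <- split(parent, parent$chan)

# The frames are list elements, so their names come from names(), not ls()
names(thebiglist)

# Aggregate the numeric columns by day with whatever function is passed in
sumtable <- function(df, thefun) {
  aggregate(cbind(numericvar1, numericvar2) ~ thedate,
            data = df, FUN = thefun, na.rm = TRUE)
}

# Extra arguments to lapply are forwarded to sumtable
aggregated <- lapply(thebiglist, sumtable, thefun = sum)

# Bind back together, keeping each list name as a column
combined <- do.call(rbind, Map(function(d, nm) cbind(chan = nm, d),
                               aggregated, names(aggregated)))
```

If separate objects really are needed, `list2env(thebiglist, envir = globalenv())` would create them, though working on the list itself is usually simpler.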