3

I have a matrix with three columns: county, date, and number of ED visits. The dates repeat for each county, like this (just an example):

  County A  1/1/2012  2
  County A  1/2/2012  0
  County A  1/3/2012  5
  ... etc.
  County B  1/1/2012  3
  County B  1/2/2012  4
  ... etc.

I would like to collapse this matrix to sum the visits from all counties for each date. So it would look like this:

1/1/2012  5
1/2/2012  4
etc.

I am trying to use the "table()" function in R but can't seem to get it to operate on visits by date in this manner. When I do "table(dt$date, dt$Visits)" it gives me a table of frequencies like this:

             0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
  2011-01-01 3 1 2 0 1 1 0 2 0 0  0  0  0  0  0  0
  2011-01-02 2 3 1 0 0 1 0 0 1 0  2  0  0  0  0  0
  2011-01-03 3 1 1 2 1 0 0 0 0 1  0  0  0  0  1  0

Any suggestions? Is there a better function to use, perhaps a "sum" of some sort?

Thanks!

Matt Dowle
  • 58,872
  • 22
  • 166
  • 224
mEvans
  • 905
  • 4
  • 15
  • 18

2 Answers2

5

table() is not for summation of values, it is for record counts. If you want to use tapply you get a table output and can apply the sum function. Or you can use ave to get a sum vector of equal length as the dataframe. Perhaps:

  with( EDcounts, tapply(EDcounts[[3]], EDcounts[[2]], sum, na.rm=TRUE) )

You maybe able to coax xtabs into summation of visit counts as well if you put the "visits" column name on the LHS of the formula.

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • Thank you! Tapply worked wonderful. Never seen that "with" before- cool! – mEvans Aug 02 '12 at 00:10
  • `with` will simplify your console work considerably. It's help page warns that it can lead to errors inside functions as can the use of "$" as a function. – IRTFM Aug 02 '12 at 00:32
3

As @DWin states, table() is not for summation, but for record counts.

I give three examples of approaches, using plyr, data.table and aggregate

all_data <- expand.grid(country = paste('Country', LETTERS[1:3]), 
  date = seq(as.Date('2012/01/01'), as.Date('2012/12/31'), by = 1) )

all_data[['ed_visits']] <- rpois(nrow(all_data), lambda = 5)



# using plyr

library(plyr)

by_date_plyr <- ddply(all_data, .(date), summarize, visits = sum(ed_visits))


# using data.table
library(data.table)
all_DT <- data.table(all_data)
by_date_dt <- all_DT[, list(visits = sum(ed_visits)), by = 'date' ]

# using aggregate
by_date_base <- aggregate(ed_visits ~ date, data = all_data, sum)
mnel
  • 113,303
  • 27
  • 265
  • 254