1

How do I take the average of a few entries in a column whose corresponding entry in another column,has the same entries?

For instance I have a large table with say 3 columns, time and prices being 2. and lets say under the time column the values repeat. like 10:30 appears 4 times, then i would need to take the average of the corresponding price column entries and summarize the same onto a single row of 10:30 with a single price of it. Can someone provide me some insights?

Sample data:

time      prices     size
10:00        23        1
10:15        12        3
10:30        12        1
10:30        19        4
10:45        12        1

I would like to modify rows 3 and 4 merging into a single row, averaging the prices.

Paul Hiemstra
  • 59,984
  • 12
  • 142
  • 149
Probabilityman
  • 361
  • 1
  • 5
  • 7

2 Answers2

3

How about something like

tapply(prices, time, mean)

For a more complete picture, see ?tapply

But what would you like to do with the column size?

EDIT:

To take the mean of prices and the last value of size, here's one suggestion:

myDF<-data.frame(time=c("10:00","10:15","10:30","10:30","10:45"),
  prices=c(23,12,12,19,12),size=c(1,3,1,4,1))

theRows <- tapply(seq_len(nrow(myDF)), myDF$time, function(x) {
  return(data.frame(time = head(myDF[x, "time"],1), prices = mean(myDF[x, "prices"]),
    size = tail(myDF[x, "size"], 1)))
  }
)

Reduce(function(...) rbind(..., deparse.level = FALSE), theRows)

p.s. This can be done very well using ddply -- see Paul's answer, too!

BenBarnes
  • 19,114
  • 6
  • 56
  • 74
  • Thanks thats very helpful, for the size column , I would like it to consider the last entry .. I would like it to retain 4 in this case. and thats how it must work for all. – Probabilityman Apr 17 '12 at 14:15
3

You could also take a look at the plyr package. I would use ddply for this:

ddply(df, .(time), summarise, 
   mean_price = mean(prices),
   sum_size = sum(size))

this assumes your data is in df. For a more elaborate description of plyr, please take a look at this paper in the Journal of Statistical Software.

Other alternatives include using data.table, or ave.

Paul Hiemstra
  • 59,984
  • 12
  • 142
  • 149
  • Thanks for that. I do not quite follow the functionality of ddply. what if i would like to add up the contents of the size column , average out the prices column for entries that have the same time at various rows. – Probabilityman Apr 17 '12 at 22:50
  • I extended my code to include summing of the size column, in addition I added a reference to a paper which describes plyr quite nicely. – Paul Hiemstra Apr 18 '12 at 07:24
  • Although I have a question on how do i retain the value of the time column at the same time. when i run the above command, it re writes the entire column labeling the columns as "1,2,3..", but I want the previous value to be retained – Probabilityman Apr 22 '12 at 23:52
  • Could you add a reproducible example as an additional section to your question? – Paul Hiemstra Apr 23 '12 at 08:58