I want to count the number of rows per person (hai_dispense_number) per month (month) in the following dataframe. My overall aim is to see if the mean count of rows increases from April to September. I'm pretty sure I should use the ave function to make a count variable. But all my attempts aren't working for me. See attempts below. Once I have the counts made, I think I'll be able to use ddply to do a mean summary per month. below is a toy df with the column 'obs' my desired output.
df
hai_dispense_number date_of_claim hai_atc month obs
9972511 Patient HAI0002664 2010-04-07 A10BA02 april 1
11376245 Patient HAI0002664 2010-05-04 A10BA02 may 1
12508505 Patient HAI0002664 2010-05-31 A10BA02 may 2
13480611 Patient HAI0002664 2010-06-30 A10BA02 june 1
13486327 Patient HAI0002664 2010-06-30 A10BH03 june 2
13567944 Patient HAI0002664 2010-06-08 A10BA02 june 3
15003657 Patient HAI0002664 2010-07-27 A10BA02 july 1
15003658 Patient HAI0002664 2010-07-27 A10BH03 july 2
16600413 Patient HAI0002664 2010-08-31 A10BB09 august 1
16600866 Patient HAI0002664 2010-08-23 A10BA02 august 2
16600867 Patient HAI0002664 2010-08-23 A10BH03 august 3
17537505 Patient HAI0002664 2010-08-27 A10BB09 august 4
19176349 Patient HAI0002664 2010-09-17 A10BB09 september 1
19176350 Patient HAI0002664 2010-09-17 A10BH03 september 2
19176358 Patient HAI0002664 2010-09-17 A10BA02 september 3
17765433 Patient HAI0006637 2010-09-17 A10BA02 september 4
12953451 Patient HAI0007418 2010-06-04 A10BA02 june 1
15786889 Patient HAI0007418 2010-07-28 A10BB09 july 1
15787103 Patient HAI0007418 2010-07-12 A10BB09 july 2
15787233 Patient HAI0007418 2010-07-05 A10BA02 july 3
15878776 Patient HAI0007418 2010-07-08 A10BB09 july 4
15908690 Patient HAI0007418 2010-07-23 A10BB09 july 5
17363576 Patient HAI0007418 2010-08-20 A10BB09 august 1
17554737 Patient HAI0007418 2010-08-13 A10BB09 august 2
Prior attempts
df$obs<-with(df, ave(month, hai_dispense_number, FUN=seq_along)) ##doesn't split by month
df$obs<-with(df, ave(month, hai_dispense_number, FUN=cumsum)) ##gives all NA values, think seq_along is actually what I want
df$obs <- ave(df$month, df$month, FUN=seq_along) ##this is better than the previous two, but doesn't seem to split by person
ddply(df,~month,summarise,mean=mean(obs)) ##this works absolutely fine, just need to counts right first!
Would value any input anyone could offer me. Seems like I'm getting something fundamental wrong here.