I'm looking to create a function.

I would like to count how many times each observation occurs within a given group (for example, the value 5 occurring 2 times). Identical values of Days within the same Week and Business should be counted together, and the counts should go in a new column, 'Total-Occurrences'.

I suspect tapply or plyr is the right tool here, but I'm stuck on a few nuances.

Thanks!

14X3 matrix

Business           Week        Days
A                **1**         3
A                **1**         3
A                **1**         1 
A                  2           4 
A                  2           1
A                  2           1 
A                  2           6    
A                  2           1
B                **1**         1
B                **1**         2
B                **1**         7
B                  2           2
B                  2           2
B                  2           na

**AND BECOME**

10X4 matrix

Business            Week       Days      Total-Occurrences 
A                 **1**        3         2
A                 **1**        1         1
A                   2          1         3
A                   2          4         1
A                   2          6         1
B                 **1**        1         1
B                 **1**        2         1
B                 **1**        7         1
B                   3          2         2
B                   2          na        0
  • What is your question? –  May 06 '14 at 14:34
  • Apologies, the goal is a function that adds a new column, 'Total-Occurrences', holding the number of times each 'Days' value occurs per Business and Week. – user3608523 May 06 '14 at 15:24

1 Answer

If I understand your question correctly, you want to group your data frame by Business, Week, and Days and count the occurrences of each group in a new column, Total-Occurrences.

df <- structure(list(Business = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("A", "B"), class = "factor"), 
Week = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
1L, 2L, 2L, 2L), .Label = c("**1**", "2"), class = "factor"), 
Days = structure(c(3L, 3L, 1L, 4L, 1L, 1L, 5L, 1L, 1L, 2L, 
6L, 2L, 2L, 7L), .Label = c("1", "2", "3", "4", "6", "7", 
"na"), class = "factor")), .Names = c("Business", "Week", 
"Days"), class = "data.frame", row.names = c(NA, -14L))

There are certainly different ways of doing this. One way would be to use dplyr:

library(dplyr)

# %>% is used here because the older %.% operator was deprecated in its favor
result <- df %>%
  group_by(Business, Week, Days) %>%
  summarize(Total.Occurences = n())

#>result

#   Business  Week Days Total.Occurences
#1         A **1**    1                1
#2         A **1**    3                2
#3         A     2    1                3
#4         A     2    4                1
#5         A     2    6                1
#6         B **1**    1                1
#7         B **1**    2                1
#8         B **1**    7                1
#9         B     2    2                2
#10        B     2   na                1
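Since the question asks for a function, the grouping and counting can be wrapped in a small helper. This is only a sketch: the name `count_occurrences` is hypothetical, the data frame is re-typed from the question with numeric columns, and the `na` entry is assumed to be a real `NA`. It uses `count()`, dplyr's shorthand for `group_by()` followed by `summarize(n())`.

```r
library(dplyr)

# Data re-typed from the question (assumption: "na" is a genuine missing value).
df <- data.frame(
  Business = rep(c("A", "B"), times = c(8, 6)),
  Week     = c(1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2),
  Days     = c(3, 3, 1, 4, 1, 1, 6, 1, 1, 2, 7, 2, 2, NA)
)

# Hypothetical helper: one row per Business/Week/Days combination with its count.
count_occurrences <- function(data) {
  data %>% count(Business, Week, Days, name = "Total.Occurences")
}

result <- count_occurrences(df)
result
```

The missing `Days` value forms its own group here and is counted once, as in the output above.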

You could also use plyr:

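library(plyr)  # note: attaching plyr after dplyr masks dplyr verbs such as summarize()

# nrow counts the rows in each group; ddply names the resulting column V1
ddply(df, .(Business, Week, Days), nrow)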

Note that the output of these functions differs slightly from what you posted in your question. Your desired output contains a Week 3 for Business B, which I assume is a typo because your original data has no Week 3. It also reports 0 occurrences for the na row, whereas both functions count that row once.
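If the 0 for the `na` row in the desired output is intentional, one possible way to get it is to count only the non-missing values in each group. The sketch below assumes that `na` stands for a genuine missing value and that the columns are numeric, so the data frame is re-typed here rather than reusing the factor version above.

```r
library(dplyr)

# Data re-typed from the question (assumption: "na" is a genuine missing value).
df <- data.frame(
  Business = rep(c("A", "B"), times = c(8, 6)),
  Week     = c(1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2),
  Days     = c(3, 3, 1, 4, 1, 1, 6, 1, 1, 2, 7, 2, 2, NA)
)

# Count only non-missing Days values, so the NA group gets 0 instead of 1.
result_na <- df %>%
  group_by(Business, Week, Days) %>%
  summarize(Total.Occurences = sum(!is.na(Days)), .groups = "drop")

result_na
```

The NA group is kept as its own row, so the result still has ten rows, but its count is 0 rather than 1.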

Between the two solutions, the dplyr approach is probably faster.

There are probably other ways of doing this too, but I'm not sure about tapply.
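For a version without extra packages, base R's `table()` can do the counting. This is a sketch under the same assumptions as before (numeric columns, `na` treated as a real `NA`), and it uses `table()` rather than `tapply()`. Unlike the dplyr and plyr results, rows with a missing `Days` value are dropped here.

```r
# Data re-typed from the question (assumption: "na" is a genuine missing value).
df <- data.frame(
  Business = rep(c("A", "B"), times = c(8, 6)),
  Week     = c(1, 1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2),
  Days     = c(3, 3, 1, 4, 1, 1, 6, 1, 1, 2, 7, 2, 2, NA)
)

# Cross-tabulate all three columns; rows with a missing Days value are dropped by default.
counts <- as.data.frame(table(df), responseName = "Total.Occurences")

# table() also lists combinations that never occur, so keep only the observed ones.
counts <- counts[counts$Total.Occurences > 0, ]

# Sort by Business, Week, and Days for readability, then reset the row names.
counts <- counts[order(counts$Business, counts$Week, counts$Days), ]
rownames(counts) <- NULL
counts
```

Note that `as.data.frame()` on a table returns the grouping columns as factors, so they may need converting back to numbers for further calculations.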
