1

I have data in two columns: its date, and a factor variable. A fragment of my data:

           Date Category
1    2009-06-22    BREAD
2    2009-06-23    BREAD
3    2009-06-23    BREAD
4    2009-06-23      JAM
5    2009-06-23     MILK
6    2009-06-24    BREAD
9    2009-06-24     MILK
10   2009-06-25      JAM

Problem: I need to count how many of each Category type appeared on each month of each year.

I tried approaches like this, using aggregate, but I don't know how to fit the factor variable there.

Data Sample: Here is a workable data sample (with more months and years): http://rextester.com/DYMXN47464 Of course, my final (real) data goes from 2009 to 2018, with each month of each year, but those are too many observations, and I cannot share the whole data.

Carrol
  • 1,225
  • 1
  • 16
  • 29

2 Answers2

5

Based on your dataset. Add year and month to the data, group by year, month and category and count the results.

library(dplyr)
library(lubridate)

data %>% mutate(year = year(Date),
                month = month(Date)) %>% 
  group_by(year, month, Category) %>%
  summarise(count = n())

# A tibble: 11 x 4
# Groups:   year, month [?]
    year month Category count
   <dbl> <dbl> <fct>    <int>
 1  2009     6 MILK         2
 2  2009     6 BREAD        6
 3  2009     6 JAM          2
 4  2010     4 MILK         2
 5  2010     4 BREAD        7
 6  2010     4 JAM          2
 7  2011    12 MILK         4
 8  2011    12 BREAD       13
 9  2011    12 JAM          1
10  2012     1 MILK         1
11  2012     1 BREAD        2

data:

data <- structure(list(Date = structure(c(14417, 14418, 14418, 14418, 
14418, 14419, 14419, 14419, 14419, 14420, 14725, 14725, 14726, 
14726, 14726, 14726, 14727, 14727, 14727, 14727, 14728, 15335, 
15335, 15335, 15335, 15336, 15336, 15336, 15336, 15337, 15337, 
15337, 15337, 15338, 15338, 15338, 15338, 15339, 15339, 15342, 
15342, 15342), class = "Date"), Category = structure(c(2L, 2L, 
2L, 3L, 1L, 2L, 2L, 2L, 1L, 3L, 2L, 2L, 2L, 1L, 2L, 2L, 3L, 3L, 
1L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 3L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L), .Label = c("MILK", "BREAD", 
"JAM", "SALTO DE BANCA"), class = "factor")), row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1000L, 1001L, 1002L, 1003L, 
1004L, 1005L, 1006L, 1007L, 1008L, 1009L, 1010L, 3000L, 3001L, 
3002L, 3003L, 3004L, 3005L, 3006L, 3007L, 3008L, 3009L, 3010L, 
3011L, 3012L, 3013L, 3014L, 3015L, 3016L, 3017L, 3018L, 3019L, 
3020L), class = "data.frame")
phiver
  • 23,048
  • 14
  • 44
  • 56
2

We can also convert the Date class to yearmon (from zoo) and get the count

library(zoo)
library(dplyr)
data %>% 
     count(yearmon = as.yearmon(Date), Category)
# A tibble: 11 x 3
#   yearmon       Category     n
#   <S3: yearmon> <fct>    <int>
# 1 Jun 2009      MILK         2
# 2 Jun 2009      BREAD        6
# 3 Jun 2009      JAM          2
# 4 Apr 2010      MILK         2
# 5 Apr 2010      BREAD        7
# 6 Apr 2010      JAM          2
# 7 Dec 2011      MILK         4
# 8 Dec 2011      BREAD       13
# 9 Dec 2011      JAM          1
#10 Jan 2012      MILK         1
#11 Jan 2012      BREAD        2

NOTE: data taken from @phiver's post

akrun
  • 874,273
  • 37
  • 540
  • 662