Two approaches:
1) Using dcast
, cut
and sub
:
dcast(DT[, Q := cut(as.integer(sub('.*/','',Year_week)),
breaks = c(0,13,26,39,53),
labels = paste0('Q',1:4))],
Year_week + artist_id + number_of_events + number_of_streams ~ Q,
value.var = 'Q',
drop = c(TRUE,FALSE),
fun = length)
gives:
Year_week artist_id number_of_events number_of_streams Q1 Q2 Q3 Q4
1: 15/37 17879 1 89515 0 0 1 0
2: 16/22 22690 2 119653 0 1 0 0
3: 16/30 8296 1 957892 0 0 1 0
4: 16/33 8296 6 882282 0 0 1 0
5: 16/34 8296 5 926037 0 0 1 0
6: 16/35 8296 2 952704 0 0 1 0
What this does:
as.integer(sub('.*/','',Year_week))
extacts the weeknumber from the Year_week
column
- Using
cut
you divide that into quarters with the appropriate labels (see also ?cut
)
- With
dcast
you transpose the quarter column to wide format with a aggregation function (length
). By using drop = c(TRUE,FALSE)
in the dcast
function, you make sure that all quarters are included.
Notes:
- The
Q
-column is an ordered factor, so you can use that to arrange and filter your data as well.
- Depending on the use of the dummy columns: you don't always need those. When you want to use them as grouping or filtering variables, you can just work with the
Q
variable.
- However, some statistical tests require dummy variables (which justifies the
dcast
step).
2) Using cut
, sub
and lapply
:
DT[, Q := cut(as.integer(sub('.*/','',Year_week)),
breaks = c(0,13,26,39,53),
labels = paste0('Q',1:4))
][, paste0('Q',1:4) := lapply(paste0('Q',1:4), function(q) as.integer(q == Q))][]
which gives a similar result. Instead of transposing with dcast
, you just check whether one of the quarter labels is in the Q
-column.
Used data:
DT <- fread(' Year_week artist_id number_of_events number_of_streams
16/30 8296 1 957892
16/33 8296 6 882282
16/34 8296 5 926037
16/35 8296 2 952704
15/37 17879 1 89515
16/22 22690 2 119653')