R: cumulative frequency when not every combination appears

Question

For each day, I need to count how many clients have received each cumulative number of calls up to and including that day.

An example table would be:

> data
   dia cli llam elegidos cumllam
1 1-11   a    1        1       1
2 3-11   a    1        1       2
3 1-11   b    2        1       2
4 2-11   b    1        1       3
5 2-11   c    2        0       2

As you can see, client a wasn't called on day 2-11, so the combination client a + day 2-11 doesn't appear in the table. If I run:

series<-data.frame(dcast(data, elegidos+dia~cumllam , length))

I get:

> series
  elegidos  dia X1 X2 X3
1        0 2-11  0  1  0
2        1 1-11  1  1  0
3        1 2-11  0  0  1
4        1 3-11  0  1  0

But if you consider how many clients had been called once up to the 2nd day, client a should appear, and it doesn't because the previous table has no row for the combination of client a and day 2-11.

The table should look like:

  elegidos  dia X1 X2 X3
1        0 2-11  0  1  0
2        1 1-11  1  1  0
3        1 2-11  1  0  1
4        1 3-11  0  1  1

X1 is the number of clients who had received exactly 1 call up to and including the day in the row.

X2 is the number of clients who had received exactly 2 calls up to and including the day in the row.

And so on.

The explanation is:

Client "a" gets a call on day 1 and on day 3; client "b" receives 2 calls on day 1 and 1 call on day 2. So on the 1st day we have one client with 1 call and another with 2 calls.
On the 2nd day, since the count is cumulative, client a stays at 1 call and client b gets one more call, reaching 3 calls.
On the 3rd day, client a receives another call and climbs to 2 cumulative calls, which is why he's in X2, while client b stays in X3.
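
The running totals described above can be reproduced with a per-client cumulative sum. This is only a sketch in base R, assuming the `llam` column holds the calls made on each day and that the `dia` strings sort in chronological order (true for this sample, not for dates in general):

```r
# Sample rows from the question: day, client, calls made that day.
calls <- data.frame(
  dia  = c("1-11", "3-11", "1-11", "2-11", "2-11"),
  cli  = c("a", "a", "b", "b", "c"),
  llam = c(1L, 1L, 2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Sort by client and day so the running sum follows time within each client.
calls <- calls[order(calls$cli, calls$dia), ]

# Cumulative calls per client.
calls$cumllam <- ave(calls$llam, calls$cli, FUN = cumsum)

calls$cumllam
# 1 2 2 3 2
```

Client a goes 1, 2 and client b goes 2, 3, matching the `cumllam` column in the example table.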

Is there a way to do this cumulative count for each day without having to create a row for each client-day combination?

Thanks.

Do you want to accumulate `elegidos` as well, or just client counts? — shadowtalker, Nov 04 '14 at 16:54
@ssdecontrol I need to accumulate just client counts. Elegidos should be a column, like in the last table. — GabyLP, Nov 04 '14 at 16:59
@akrun, thanks for having a look. Client a gets a call on day 1 and on day 3; client b receives 2 calls on day 1 and 1 call on day 2. So on the 1st day we have one client with 1 call and another with 2 calls. On the 2nd day, since the count is cumulative, client a stays at 1 call and client b gets one more call, reaching 3 calls. On the 3rd day, client a receives another call and climbs to 2 cumulative calls, which is why he's in X2, while client b stays in X3. — GabyLP, Nov 04 '14 at 19:54
Hi, the 2nd table is not what I want; it's only what reshape gives. Client a receives 1 call on day 1 and 1 on day 3, so on the last day he has 2 calls. That's what you see in the last line of the reshape output. What is not showing is client b, who received 2 calls on the 1st day and 1 on the 2nd; since he didn't receive any call on the 3rd day, he doesn't appear, and I need to show him in the 3rd row. The same happens with client a on the 2nd day. — GabyLP, Nov 05 '14 at 14:29
Thanks to you. It's a bit messy, but what I need is just the distribution of clients by cumulative calls for each day. — GabyLP, Nov 05 '14 at 14:38

akrun · Accepted Answer · 2014-11-06T04:48:47.673

Try this:

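library(zoo)
# Keep only the selected clients (elegidos != 0)
dat1 <- data[data$elegidos != 0, ]
# Every day x client combination among the selected clients
dat2 <- expand.grid(dia = sort(unique(dat1$dia)), cli = unique(dat1$cli))
# Add the missing combinations as rows with NA in the other columns
dat3 <- merge(data, dat2, all = TRUE)
# Sort by client, then day, so each NA follows the row it should inherit from
dat3N <- dat3[with(dat3, order(cli, dia)), ]
# Carry the last observed value forward into the NA rows
dat3N[, c('elegidos', 'cumllam')] <- lapply(dat3N[, c('elegidos', 'cumllam')], na.locf)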

library(reshape2)
dcast(dat3N, elegidos+dia~cumllam, length, value.var='cumllam')
#  elegidos  dia 1 2 3
#1        0 2-11 0 1 0
#2        1 1-11 1 1 0
#3        1 2-11 1 0 1
#4        1 3-11 0 1 1
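
One thing to keep in mind: `na.locf` here runs down the whole sorted column, so a client with no row on the first day of the grid would inherit the previous client's last value. That case does not occur in the sample data. The sketch below is one possible way to fill within each client instead, using base R only. The helper `locf` and the names `sel`, `grid`, `full` and `counts` are made up for illustration, and it handles only the `elegidos == 1` clients:

```r
# Same sample data as in the question.
data <- data.frame(
  dia      = c("1-11", "3-11", "1-11", "2-11", "2-11"),
  cli      = c("a", "a", "b", "b", "c"),
  llam     = c(1L, 1L, 2L, 1L, 2L),
  elegidos = c(1L, 1L, 1L, 1L, 0L),
  cumllam  = c(1L, 2L, 2L, 3L, 2L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-NA value forward, keeping leading NAs.
locf <- function(x) {
  seen <- cumsum(!is.na(x))
  c(NA, x[!is.na(x)])[seen + 1]
}

# Selected clients only, completed to the full day x client grid.
sel  <- data[data$elegidos == 1, ]
grid <- expand.grid(dia = sort(unique(sel$dia)), cli = unique(sel$cli),
                    stringsAsFactors = FALSE)
full <- merge(sel, grid, all = TRUE)
full <- full[order(full$cli, full$dia), ]

# Fill within each client so values never leak from one client to the next.
full$cumllam <- ave(full$cumllam, full$cli, FUN = locf)

# Clients per day (rows) and cumulative call count (columns).
# table() drops NAs, so days before a client's first call are not counted.
counts <- table(full$dia, full$cumllam)
counts
#        1 2 3
#   1-11 1 1 0
#   2-11 1 0 1
#   3-11 0 1 1
```

For the sample data this gives the same counts as the `elegidos == 1` rows of the output above.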

Update

You could also do this in data.table

 library(data.table)
 library(zoo)  # na.locf is used below
 DT <- data.table(data)
 setkey(DT, dia, cli)
 DT1 <- rbind(DT[!!elegidos, CJ(dia=unique(dia), 
      cli=unique(cli))],  DT[elegidos==0, 1:2, with=FALSE])
 nm1 <- c('elegidos', 'cumllam')
 #There is also a  roll option but unfortunately I couldn't get it right here.
 # So, I am using na.locf from zoo. 
 DT2 <- DT[DT1[order(cli, dia)]][,(nm1):= lapply(.SD, na.locf), .SDcols=nm1]
 dcast.data.table(DT2, elegidos+dia~cumllam, length, value.var='cumllam')
 #   elegidos  dia 1 2 3
 #1:        0 2-11 0 1 0
 #2:        1 1-11 1 1 0
 #3:        1 2-11 1 0 1
 #4:        1 3-11 0 1 1

data

data <- structure(list(dia = c("1-11", "3-11", "1-11", "2-11", "2-11"
), cli = c("a", "a", "b", "b", "c"), llam = c(1L, 1L, 2L, 1L, 
2L), elegidos = c(1L, 1L, 1L, 1L, 0L), cumllam = c(1L, 2L, 2L, 
3L, 2L)), .Names = c("dia", "cli", "llam", "elegidos", "cumllam"
), class = "data.frame", row.names = c("1", "2", "3", "4", "5"))

awesome! thanks! expand.grid does the combinations. question: is the ordering step (dat3N) necessary? — GabyLP, Nov 05 '14 at 16:02
@Gaby P After the `merge`, the order got messed up. I was trying to fill the NA values using `na.locf`, which replaces each NA with the previous row's value. — akrun, Nov 05 '14 at 16:04
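
To see why the ordering matters for `na.locf`, here is a small sketch on made-up toy rows (not the merged table from the answer), assuming the zoo package is installed:

```r
library(zoo)

# Toy rows: client "a" has no record on day 2-11, so cumllam is NA there,
# and the rows are deliberately out of order.
d <- data.frame(
  dia     = c("2-11", "1-11", "3-11"),
  cli     = c("a", "a", "a"),
  cumllam = c(NA, 1L, 2L),
  stringsAsFactors = FALSE
)

# Sorting puts the NA row directly after the day it should inherit from.
d <- d[order(d$cli, d$dia), ]

# Each NA takes the value of the row above it.
d$cumllam <- na.locf(d$cumllam)

d$cumllam
# 1 1 2
```

Without the sort, the NA would be the first element and there would be no earlier value to carry forward. By default `na.locf` drops such leading NAs, so the result would be shorter than the column.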
