
I have a data frame that records the daily occurrence of different activities. I would like to identify the number of days on which an activity occurs consecutively, and its total duration. A week starts with day1 and ends with day7. For example, for id 12 the activity occurs on all 7 days and the duration is 11; for id 123 the activity is not consecutive because there are gap days (day3 and day6), so the occurrence is 0; and for id 10 the activity occurs on 6 consecutive days with a duration of 18.

Input:

  id   day1 day2 day3 day4 day5 day6 day7
    12    2    1    2    1    1    3    1
   123    0    3    0    3    3    0    3
    10    0    3    3    3    3    3    3

Output:

id   Duration Occurrence
12     11        7
123    12        0
10     18        6
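As a minimal sketch of the rule described above, assuming the Input table exactly as displayed (it has day1 = 0 for id 10, unlike the dput below): Duration is the row sum, and Occurrence is the number of active days when they all fall in one unbroken run, otherwise 0. The helper name consecutive_days is hypothetical.

```r
# Input table as displayed in the question (id 10 has day1 = 0 here)
dat_shown <- data.frame(
  id   = c(12L, 123L, 10L),
  day1 = c(2L, 0L, 0L),
  day2 = c(1L, 3L, 3L),
  day3 = c(2L, 0L, 3L),
  day4 = c(1L, 3L, 3L),
  day5 = c(1L, 3L, 3L),
  day6 = c(3L, 0L, 3L),
  day7 = c(1L, 3L, 3L)
)

# Hypothetical helper: number of active days if they form one unbroken run, else 0
consecutive_days <- function(x) {
  r <- rle(x != 0)                 # runs of active (TRUE) / inactive (FALSE) days
  if (sum(r$values) == 1) r$lengths[r$values] else 0L
}

days <- as.matrix(dat_shown[, -1])  # day columns only, as an integer matrix
result <- data.frame(
  id         = dat_shown$id,
  Duration   = rowSums(days),
  Occurrence = apply(days, 1, consecutive_days)
)
print(result)
```

With the displayed table this should reproduce the expected Output (11/7, 12/0, 18/6).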

Sample data set:

structure(list(id = c(12L, 123L, 10L), day1 = c(2L, 0L, 3L), 
    day2 = c(1L, 3L, 3L), day3 = c(2L, 0L, 3L), day4 = c(1L, 
    3L, 3L), day5 = c(1L, 3L, 3L), day6 = c(3L, 0L, 3L), day7 = c(1L, 
    3L, 3L)), row.names = c(NA, -3L), class = c("data.table", 
"data.frame"))
Rfanatic

2 Answers


Using apply row-wise:

cbind(df[, 1], t(apply(df[, -1], 1, function(x) {
   # Runs of non-zero (TRUE) and zero (FALSE) days in this row
   inds <- rle(x != 0)
   # Consecutive only if all non-zero days form a single run
   if (sum(inds$values) == 1)
      c(Duration = sum(x), Occurrence = inds$lengths[inds$values])
   else
      c(Duration = sum(x), Occurrence = 0)
})))

#    id Duration Occurrence
#1:  12       11          7
#2: 123       12          0
#3:  10       21          7

Using rle, we find the runs of zero and non-zero values in each row. If all the non-zero days form a single run, we return the row sum together with the length of that run; otherwise we return the row sum with 0.
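To see what the rle step works with, here is a small sketch on the id 123 row from the question; the vector x is typed in by hand from the table.

```r
# id 123 from the question: day1..day7
x <- c(0, 3, 0, 3, 3, 0, 3)

# Run-length encoding of "active day?" (TRUE) vs "no activity" (FALSE)
runs <- rle(x != 0)
print(runs)
# runs$lengths is 1 1 1 2 1 1
# runs$values  is FALSE TRUE FALSE TRUE FALSE TRUE

# Three separate runs of active days, so the activity is not consecutive
n_active_runs <- sum(runs$values)
print(n_active_runs)
```

A row where every day is active (such as id 12) collapses to a single TRUE run of length 7.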

Ronak Shah
  • Many thanks! When you have time, could you please add some comments? – Rfanatic Apr 10 '20 at 08:32
  • Added some comments. The `dput` you shared is different from the data you have shown for the last row. – Ronak Shah Apr 10 '20 at 08:35
  • @RonakShah My question is similar to this one, except that the start of the repeated sequence is given by the Day variable. How can I specify the start of the sequence? https://stackoverflow.com/questions/61187493/identify-consecutive-sequences-based-on-a-given-variable – Rstudent Apr 13 '20 at 13:29

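Calling the data frame dat and using the rle function, this reports the longest run of consecutive active days in each row:

out <- cbind(dat$id, t(apply(dat[, -1], 1, function(y) {
  r <- rle(y > 0)                              # runs of active (TRUE) / inactive (FALSE) days
  c(sum(y), max(c(0, r$lengths[r$values])))    # total duration, longest active run
})))
out <- data.frame(out)
names(out) <- c("id", "Duration", "Occurrence")
out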

   id Duration Occurrence
1  12       11          7
2 123       12          2
3  10       21          7
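The two answers read "Occurrence" differently for id 123: the first returns 0 because the active days are split into several runs, while the second reports the longest run (2). A minimal sketch that computes both readings side by side, looking only at runs of active days so that runs of zeros are never counted; the function name run_summary and the column names Longest and Consecutive are hypothetical.

```r
# Hypothetical helper: summarise one row of daily counts
run_summary <- function(x) {
  r <- rle(x > 0)                    # runs of active / inactive days
  active <- r$lengths[r$values]      # lengths of the active runs only
  c(Duration    = sum(x),
    Longest     = if (length(active)) max(active) else 0L,  # longest active streak
    Consecutive = if (length(active) == 1) active else 0L)  # one unbroken block, else 0
}

rows <- rbind(
  "12"  = c(2, 1, 2, 1, 1, 3, 1),
  "123" = c(0, 3, 0, 3, 3, 0, 3),
  "10"  = c(3, 3, 3, 3, 3, 3, 3)     # id 10 as given in the dput
)
print(t(apply(rows, 1, run_summary)))
```

Which column is the right "Occurrence" depends on whether a split week should count as 0 (as in the question's expected output) or as its longest streak.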
Tur