
We have time series data in which repeated observations were measured for several subjects. I would like to calculate the number of occasions in which the variable positive == 1 occurs for each subject (variable id).

A second aim is to identify the length of these runs of consecutive observations in which positive == 1. Each subject is likely to have multiple runs within the study period. Rather than the maximum number of consecutive positive observations per subject (a single number per subject), I would like the length of each individual run.

Here is a toy data set that illustrates the problem:

library(plyr)  # count() below comes from the plyr package

set.seed(1234)
test <- data.frame(id = rep(1:3, each = 10), positive = round(runif(30,0,1)))
test$run <- sequence(rle(test$positive)$lengths)
test$run_positive <- ifelse(test$positive == 0, 0, test$run)
test$episode <- ifelse(test$run_positive == 1, 1, 0)

count(test$episode)
  x freq
1 0   25
2 1    5

The code above comes close to answering my first question, counting the number of positive episodes, but it is not conditioned on subject. As a result, the last observation of Subject #1 and the first observation of Subject #2 are counted as part of the same run. Can anyone help me condition this run length encoding on subject?
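One way to condition the run length encoding on subject, sketched here with base R only and assuming rows are sorted by time within each id, is to apply `rle()` separately to each id through `ave()`. Counting the rows where a positive run starts then gives the number of episodes per subject.

```r
set.seed(1234)
test <- data.frame(id = rep(1:3, each = 10), positive = round(runif(30, 0, 1)))

# Restart the run counter at every change of subject as well as every change
# of value: ave() applies the function separately within each id.
test$run <- ave(test$positive, test$id,
                FUN = function(p) sequence(rle(p)$lengths))

# A positive episode starts wherever positive == 1 and run == 1.
test$episode <- as.integer(test$positive == 1 & test$run == 1)

# Number of positive episodes (runs of 1s) per subject:
# 2, 3 and 1 episodes for subjects 1, 2 and 3
tapply(test$episode, test$id, sum)
```

With this version, row 11 (the first row of Subject #2) starts a new run instead of continuing Subject #1's run.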

Secondly, how can one extract only the maximum run length for each run in which positive == 1? I would like to add a column, max_run, that records the run length on the last observation of each run and 0 on every other observation. For Subject #1, this would look like:

   id positive run run_positive episode max_run
1   1        0   1            0       0       0
2   1        1   1            1       1       0
3   1        1   2            2       0       0
4   1        1   3            3       0       0
5   1        1   4            4       0       0
6   1        1   5            5       0       5
7   1        0   1            0       0       0
8   1        0   2            0       0       0
9   1        1   1            1       1       0
10  1        1   2            2       0       2
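A minimal base-R sketch of the max_run column, assuming rows are ordered by time within each id. The helper `last_run_length()` is a hypothetical name introduced here; it uses `rle()` to find where each run ends, and `ave()` applies it one subject at a time so that no run crosses a subject boundary.

```r
set.seed(1234)
test <- data.frame(id = rep(1:3, each = 10), positive = round(runif(30, 0, 1)))

# For one subject's 0/1 vector p: put each positive run's length on the
# last observation of that run and 0 everywhere else.
last_run_length <- function(p) {
  r <- rle(p)
  ends <- cumsum(r$lengths)   # index of the last element of every run
  out <- numeric(length(p))
  pos <- r$values == 1        # which runs are runs of 1s
  out[ends[pos]] <- r$lengths[pos]
  out
}

# ave() splits by id, so runs never span two subjects.
test$max_run <- ave(test$positive, test$id, FUN = last_run_length)

test$max_run[test$id == 1]
# [1] 0 0 0 0 0 5 0 0 0 2
```

For Subject #2 this gives a length of 2 on row 12 rather than continuing the count from Subject #1.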

If anyone can come up with a method to do this I would be extremely grateful.

Entropy
  • I don't understand your "maximum run length for each run". Each run has _one_ length, so "maximum" does not make sense here. At least not to me. For example, the first run for id 1, where positive = 1 has a length of 5. Within id, within run for positive = 1, you want to pad the run length with leading zeros? – Henrik Sep 07 '13 at 12:21
  • For the first question you can try `with(test, table(id, positive))` – Henrik Sep 07 '13 at 13:27
  • Sorry if I used the wrong terminology to describe what I was looking for. You're correct that I am looking to 'pad' the run length with leading zeros. The code you provided in your example `with(test, table(id, positive))` is quite close. The only problem is that it sums all of the runs together per patient, whereas I would like to uniquely define the run length for each run by subject. Therefore, instead of reporting that ID#1 has 7 positives, it would be ideal to say that ID#1 has 2 runs of lengths 5 and 2. Then I would like to wrap that output into `test` and pad it with the leading zeros. – Entropy Sep 07 '13 at 16:20
  • Sorry if I was unclear, my suggestion only referred to what I thought was the first question: "number of occasions in which the variable positive == 1 occurs for each subject". – Henrik Sep 07 '13 at 16:33
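The description "ID#1 has 2 runs of lengths 5 and 2" can be reproduced by splitting `positive` by subject before calling `rle()`. This is a sketch that assumes rows are in time order within each id; `runs_by_id` is an illustrative name, not something from the original post.

```r
set.seed(1234)
test <- data.frame(id = rep(1:3, each = 10), positive = round(runif(30, 0, 1)))

# Lengths of the positive runs, computed separately for each subject.
# split() keeps the subjects apart, so no run spans two ids.
runs_by_id <- lapply(split(test$positive, test$id), function(p) {
  r <- rle(p)
  r$lengths[r$values == 1]
})

runs_by_id           # run lengths per subject: 5 2 / 2 1 1 / 4
lengths(runs_by_id)  # number of positive runs per subject: 2 3 1
```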

1 Answer

I think this answers your first question:

aggregate(positive ~ id, data = test, FUN = sum)

  id positive
1  1        7
2  2        4
3  3        4
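The `aggregate()` call above counts positive observations. If "occasions" means runs, as the comments suggest, a variant that counts run starts instead is sketched below. It assumes the data are sorted by id and then by time; the `start` column is an illustrative addition.

```r
set.seed(1234)
test <- data.frame(id = rep(1:3, each = 10), positive = round(runif(30, 0, 1)))

# A run of 1s starts where positive == 1 and either the previous row is 0
# or the previous row belongs to a different subject.
prev_pos <- c(0, head(test$positive, -1))
new_id   <- c(TRUE, diff(test$id) != 0)
test$start <- as.integer(test$positive == 1 & (prev_pos == 0 | new_id))

# Number of positive runs per subject (2, 3 and 1 for ids 1, 2 and 3)
aggregate(start ~ id, data = test, FUN = sum)
```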

This might answer your second question, but I would need to see the desired result for each id to check:

set.seed(1234)
test <- data.frame(id = rep(1:3, each = 10), positive = round(runif(30,0,1)))
test$run <- sequence(rle(test$positive)$lengths)
test$run_positive <- ifelse(test$positive == 0, 0, test$run)
test$episode <- ifelse(test$run_positive == 1, 1, 0)

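# Label each (subject, value) pair so that rle() breaks runs at a change of id
test$group <- paste(test$id*10, test$positive, sep='')

# Position within each run, counted forwards (first) and backwards (last)
run_lengths <- rle(test$group)$lengths
test$first <- sequence(run_lengths)
test$last  <- unlist(lapply(run_lengths, function(n) seq(n, 1)))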

# Record the run length on the last row of each positive run. Use the
# subject-aware counter `first`, because `run` does not restart at a new id.
test$max <- ifelse(test$last == 1 & test$positive == 1, test$first, 0)
test

   id positive run run_positive episode group first last max
1   1        0   1            0       0   100     1    1   0
2   1        1   1            1       1   101     1    5   0
3   1        1   2            2       0   101     2    4   0
4   1        1   3            3       0   101     3    3   0
5   1        1   4            4       0   101     4    2   0
6   1        1   5            5       0   101     5    1   5
7   1        0   1            0       0   100     1    2   0
8   1        0   2            0       0   100     2    1   0
9   1        1   1            1       1   101     1    2   0
10  1        1   2            2       0   101     2    1   2
11  2        1   3            3       0   201     1    2   0
12  2        1   4            4       0   201     2    1   2
13  2        0   1            0       0   200     1    1   0
14  2        1   1            1       1   201     1    1   1
15  2        0   1            0       0   200     1    1   0
16  2        1   1            1       1   201     1    1   1
17  2        0   1            0       0   200     1    4   0
18  2        0   2            0       0   200     2    3   0
19  2        0   3            0       0   200     3    2   0
20  2        0   4            0       0   200     4    1   0
21  3        0   5            0       0   300     1    5   0
22  3        0   6            0       0   300     2    4   0
23  3        0   7            0       0   300     3    3   0
24  3        0   8            0       0   300     4    2   0
25  3        0   9            0       0   300     5    1   0
26  3        1   1            1       1   301     1    4   0
27  3        1   2            2       0   301     2    3   0
28  3        1   3            3       0   301     3    2   0
29  3        1   4            4       0   301     4    1   4
30  3        0   1            0       0   300     1    1   0
Mark Miller