
Using R, I am trying to calculate, for each individual, the total time spent above a certain threshold.

For example, the plot below shows the concentration data (CP) for 3 subjects (ID), and I would like to find the total time (x axis) each individual spends above the blue dashed line. The data set is structured like this:

head(dataset)
  ID time      CP
1  1  0.0 0.00000000
2  1  0.0 0.00000000
3  1  0.5 0.03759806
4  1  1.0 0.12523455
5  1  1.5 0.23483219
6  1  2.0 0.34820905

Solid lines represent the concentrations for 3 different subjects

I tried to use the following code:

library(data.table) 
TAbove<-setDT(dataset)[CP > .05, diff(range(time)), by = ID]

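However, this code calculates the duration from the first rise above the dashed blue line to the last drop below it, so it also counts the periods in between when the concentration is below the line. For example, for the subject shown in green, it returns the span marked by the black line.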

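[Image: the green subject's curve, with a black line spanning from its first rise above the dashed line to its last drop below it]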

How can I write code that excludes the periods where the concentration drops below the dashed line? The final result should be the total duration of all periods above the dashed blue line, as shown below:

[Image: the desired periods above the dashed blue line]

Malek Ik
  • See `?rle`. Ignoring the multiple ids for the moment, having, say, `x = sin(seq(-3*pi, 3*pi, 0.1))` and computing `r = rle(x > threshold)`, the starting and end positions of successive `TRUE`s (i.e. `x > threshold`) are `s = cumsum(c(1, r$lengths))[r$values]` and `e = i + r$lengths[r$values] - 1`, respectively. Summing `time[e] - time[s]` should give the total time where `x > threshold`. – alexis_laz Aug 01 '16 at 15:59
  • @alexis_laz Can you add your comment as an answer? This solved the problem for me and I will upvote. Think you just have 1 typo where `i` should be `s` when you define `e` – Stefan Avey May 17 '18 at 20:47
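A runnable version of the `rle` approach from the comments, as a minimal sketch. It uses `s` instead of `i` when computing `e`, as noted in the comment above, and it also drops the last element of `cumsum(c(1, r$lengths))`: that vector is one element longer than `r$values`, so R recycles the logical index and would select an extra "start" whenever the first run is above the threshold. The helper name `time_above_rle` is hypothetical, the data is the extended table from the first answer, and rows are assumed to be sorted by time within each ID.

```r
library(data.table)

# Extended example data from the first answer
dat <- data.table(
  ID   = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  time = c(0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 0.0, 0.5, 1.0),
  CP   = c(0, 0, 0.03759806, 0.12523455, 0.23483219, 0.34820905, 0.5,
           0.5, 0.01, 0.2)
)

# Hypothetical helper: total time above `threshold`, measured from the first
# to the last sample of each run of consecutive above-threshold samples.
time_above_rle <- function(time, x, threshold) {
  r <- rle(x > threshold)
  # Start index of every run; head(..., -1) drops the "one past the end" value
  starts <- head(cumsum(c(1, r$lengths)), -1)
  s <- starts[r$values]              # first sample of each above-threshold run
  e <- s + r$lengths[r$values] - 1   # last sample of each above-threshold run
  sum(time[e] - time[s])
}

# Sort by time within ID, then apply the helper per ID
res <- dat[order(ID, time), .(total = time_above_rle(time, CP, 0.05)), by = ID]
res
```

ID 2 gets 0 here because each of its above-threshold runs is a single sample; the method only counts time between sampled points.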

2 Answers


I think your solution is almost right; just leave out `range`. I tried the following on an extended dataset (with a few extra entries):

library(data.table)
dat <- fread("ID time      CP
              1  0.0 0.00000000
              1  0.0 0.00000000
              1  0.5 0.03759806
              1  1.0 0.12523455
              1  1.5 0.23483219
              1  2.0 0.34820905
              1  3.0 0.5
              2  0.0 0.5
              2  0.5 0.01
              2  1.0 0.2")

with the following result:

> dat[CP > .05, diff(time), by = ID]
   ID  V1
1:  1 0.5
2:  1 0.5
3:  1 1.0
4:  2 1.0
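Note that the 1.0 for ID 2 in the output above spans time 0.0 to 1.0, even though CP at time 0.5 is 0.01, below the 0.05 threshold: filtering the rows first and then taking `diff()` bridges any dip that falls between two above-threshold samples. A minimal sketch of one way around this, assuming rows can be sorted by time within each ID, is to count only the time steps whose two endpoints are both above the threshold. The names `threshold` and `total` are illustrative.

```r
library(data.table)

# Same extended data as above
dat <- data.table(
  ID   = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  time = c(0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 0.0, 0.5, 1.0),
  CP   = c(0, 0, 0.03759806, 0.12523455, 0.23483219, 0.34820905, 0.5,
           0.5, 0.01, 0.2)
)
threshold <- 0.05

res <- dat[order(ID, time), {
  above <- CP > threshold
  # The step from sample i to i + 1 counts only if both samples are above
  keep <- head(above, -1) & tail(above, -1)
  list(total = sum(diff(time)[keep]))
}, by = ID]
res
```

With this data, ID 1 keeps the same 0.5 + 0.5 + 1.0 steps as before, while ID 2 no longer picks up the step across its dip.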

Edit: calculation with the original data set

Using the original data set:

dataset <- fread("ID time      CP
                  1  0.0 0.00000000
                  1  0.0 0.00000000
                  1  0.5 0.03759806
                  1  1.0 0.12523455
                  1  1.5 0.23483219
                  1  2.0 0.34820905")

we get the following result:

> dataset[CP > .05, diff(time), by = ID]
   ID  V1
1:  1 0.5
2:  1 0.5
rhole
  • For some reason, the code gives me a different answer: ID V1 1: 1 0.5 2: 1 0.5 3: 1 0.5 4: 1 0.5 – Malek Ik Aug 01 '16 at 15:39
  • Maybe restarting your R session or updating to the newest version of `data.table` helps. Just double checked my solution and it works on my machine with the newest version of R, `data.table` and a fresh R session. – rhole Aug 01 '16 at 15:44
  • Sorry, I tried that, but it still gives me different results. Could you please post the code you used? – Malek Ik Aug 01 '16 at 15:53
  • Edited my answer with the calculation for the original data set provided by you. Time frames are correctly calculated between observations that have a `CP > 0.05`. Hope this helps! – rhole Aug 01 '16 at 16:03

Thanks to rhole for the idea of how to solve this. The code below did the analysis for me; however, I had to add a variable called "Day" and then calculate the time duration per day. I used day here because there is one interval above the threshold per day, so grouping by day keeps `diff()` from spanning the below-threshold gap between one day's interval and the next. You can adjust the grouping to your needs.

library(data.table)
library(dplyr)  # group_by() and summarise() come from dplyr, not plyr
setDT(dataset)
#label each observation with its day: [0, 24) -> Day 1, [24, 48) -> Day 2, [48, 72) -> Day 3
dataset[, Day := cut(time, breaks = c(0, 24, 48, 72), right = FALSE,
                     labels = c("Day 1", "Day 2", "Day 3"))]
#per day: time steps between consecutive observations above the threshold
TAbove <- dataset[CP > .05, .(V1 = diff(time)), by = .(ID, Day)]
# sum the time duration for each day per person
sumPerDay <- summarise(group_by(TAbove, ID, Day), sum = sum(V1))
# sum the time duration for ALL days per person
sumAll <- summarise(group_by(TAbove, ID), sum = sum(V1))
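As an alternative to fixed Day cut-offs, data.table's `rleid()` can number the consecutive above/below-threshold runs within each ID automatically, and the question's original `diff(range(time))` can then be applied per run instead of per ID. This is a sketch assuming rows are sorted by time within each ID; the column names `run`, `dur` and `total` are illustrative, and the data is the extended table from the first answer.

```r
library(data.table)

dat <- data.table(
  ID   = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  time = c(0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 0.0, 0.5, 1.0),
  CP   = c(0, 0, 0.03759806, 0.12523455, 0.23483219, 0.34820905, 0.5,
           0.5, 0.01, 0.2)
)
threshold <- 0.05
setorder(dat, ID, time)

# Number the consecutive runs of above/below-threshold samples within each ID
dat[, run := rleid(CP > threshold), by = ID]

# Duration of each above-threshold run (first to last sample in the run)
perRun <- dat[CP > threshold, .(dur = diff(range(time))), by = .(ID, run)]

# Total time above the threshold per ID
sumAll <- perRun[, .(total = sum(dur)), by = ID]
sumAll
```

Because the runs are found from the data, this does not rely on there being exactly one interval per day.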
Malek Ik
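The approaches in this thread measure time only between sampled points, so a single above-threshold sample contributes nothing, and the time between the last sample above the line and the first sample below it is not counted. If the crossing times matter, one option (a swapped-in technique, not something proposed in the answers above) is to assume the concentration changes linearly between samples and interpolate where each step crosses the threshold. A minimal sketch; `time_above_interp` is a hypothetical helper and rows are assumed sorted by time within ID.

```r
library(data.table)

dat <- data.table(
  ID   = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2),
  time = c(0.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 0.0, 0.5, 1.0),
  CP   = c(0, 0, 0.03759806, 0.12523455, 0.23483219, 0.34820905, 0.5,
           0.5, 0.01, 0.2)
)

# Hypothetical helper: time above `threshold`, assuming x changes linearly
# between consecutive samples.
time_above_interp <- function(time, x, threshold) {
  dt <- diff(time)
  x0 <- head(x, -1)   # value at the start of each step
  x1 <- tail(x, -1)   # value at the end of each step
  a0 <- x0 > threshold
  a1 <- x1 > threshold
  # Fraction of each step spent above the threshold
  frac <- ifelse(a0 & a1, 1,
          ifelse(a0 & !a1, (x0 - threshold) / (x0 - x1),   # crosses downward
          ifelse(!a0 & a1, (x1 - threshold) / (x1 - x0),   # crosses upward
                 0)))
  sum(frac * dt)
}

res <- dat[order(ID, time), .(total = time_above_interp(time, CP, 0.05)), by = ID]
res
```

Whether linear interpolation is appropriate depends on how the concentration actually behaves between sampling times.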