5

I despair of this seemingly easy task: I have a number vector values and a time vector time and I need to find out for how long the values are inside a certain range. Here is some data:

df <- data.frame(time= c(1,3:10), values= c(7,3:10))

For this data the values were for 4.5 hours outside the range 3.5 to 6.5. Here a visualization how to determine those 4.5 hours:

Plot

In this plot the x axis is the time, the y axis the values, the points the real measures, the dashed line are the range borders 3.5 and 6.5, the full line is just a help to better see when the range borders are crossed.

Am I missing an obvious way to determine for how long the values are outside the range?


The plot is created with

threshold_low <- 3.5
threshold_high <- 6.5

ggplot(data= df, mapping= aes(time, values)) +
  geom_point() +
  geom_line() +
  geom_hline(yintercept= c(threshold_low, threshold_high), linetype= "dashed") +
  scale_x_continuous(breaks=seq(0, 10, 1)) +
  scale_y_continuous(breaks=seq(0, 10, 1))
LulY
  • 976
  • 1
  • 9
  • 24
  • `data.table::rleid` will give you your answer. Exactly how to use it will depend on exactly how you define "how long": longest single period, total duration of all periods, etc. – Limey Jul 24 '23 at 13:17
  • I know you need the total tiome. It *is* basis of the the solution. I'm working on it. The complication is that you want to interpolate between data points. Without that, it would be straightforward. Would you prefer a solution that adds rows to the input data frame or does arithmetic on the data points you do have? – Limey Jul 24 '23 at 13:26
  • @Limey Ah, I see! I look for arithmetic on the data points for – LulY Jul 24 '23 at 13:30

5 Answers5

6

If you don't want to manually calculate every crossing point, an easier way would be to use linear interpolation:

library(tidyverse)

range_df <- approx(df$time, df$values, xout = seq(1, 10, 0.001)) %>%
  as.data.frame() %>%
  setNames(names(df)) %>%
  mutate(inrange = values < threshold_high & values > threshold_low) %>%
  mutate(group = data.table::rleid(inrange)) %>%
  group_by(group) %>%
  filter(!inrange) %>%
  summarize(starts = min(time), ends = max(time), duration = ends - starts)

This results in

range_df
#> # A tibble: 3 x 4
#>   group starts  ends duration
#>   <int>  <dbl> <dbl>    <dbl>
#> 1     1   1     1.25     0.25
#> 2     3   2.75  3.5      0.75
#> 3     5   6.5  10        3.5 

We can show these numbers are about right by plotting a geom_rect with this data over your existing plot:

ggplot(data= df, mapping= aes(time, values)) +
  geom_point() +
  geom_line() +
  geom_rect(aes(xmin = starts, xmax = ends, ymin = -Inf, ymax = Inf),
            data = range_df, fill = 'red', alpha = 0.2, inherit.aes = FALSE) +
  geom_hline(yintercept= c(threshold_low, threshold_high), linetype= "dashed") +
  scale_x_continuous(breaks=seq(0, 10, 1)) +
  scale_y_continuous(breaks=seq(0, 10, 1))

enter image description here

And the total amount of time that values was in the target range is given by:

sum(range_df$duration)
#> [1] 4.50
Allan Cameron
  • 147,086
  • 7
  • 49
  • 87
3

A mathematical approach would be to

  1. Linearly interpolate between the points
  2. Find the zeros
  3. If the point between 2 consecutive zeros falls outside the limits calculate the difference between these 2 zeros
  4. Sum all difference
library(dplyr)
library(rootSolve)
library(magrittr)

df <- data.frame(time= c(1,3:10), values= c(7, 3:10))

f <- approxfun(df$time, df$values)
lims <- c(3.5, 6.5)

lapply(lims, \(l) uniroot.all(\(x) f(x) - l , range(df$time))) %T>%
   {print(list(Uniroots = .))} %>%
   unlist() %>%
   c(range(df$time))%>%
   sort() %>%
   unique() %>%
   {cbind(head(., -1L), tail(., -1L))} %T>% 
   {print(list(zeros = .))} %>%
   set_colnames(paste0("x", 0:1)) %>%
   as_tibble() %>%
   mutate(is_outside = !between(f((x0 + x1) / 2), lims[1], lims[2]),
          rng = if_else(is_outside, x1 - x0, 0)) %T>%
   {print(list(signs = .))} %>%   
   summarize(time_outside = sum(rng))

# $Uniroots
# $Uniroots[[1]]
# [1] 2.75 3.50

# $Uniroots[[2]]
# [1] 1.25 6.50


# $zeros
#      [,1]  [,2]
# [1,] 1.00  1.25
# [2,] 1.25  2.75
# [3,] 2.75  3.50
# [4,] 3.50  6.50
# [5,] 6.50 10.00

# $signs
# # A tibble: 5 × 4
#      x0    x1 is_outside   rng
#   <dbl> <dbl> <lgl>      <dbl>
# 1  1     1.25 TRUE        0.25
# 2  1.25  2.75 FALSE       0   
# 3  2.75  3.5  TRUE        0.75
# 4  3.5   6.5  FALSE       0   
# 5  6.5  10    TRUE        3.5 

# # A tibble: 1 × 1
#   time_outside
#          <dbl>
# 1          4.5
thothal
  • 16,690
  • 3
  • 36
  • 71
3

As a one-liner:

with(df, integrate(\(x) +(abs(approxfun(time, values)(x) - 5) > 1.5), min(time), max(time)))
#> 4.5 with absolute error < 4.3e-14
jblood94
  • 10,340
  • 1
  • 10
  • 15
2

Inspired by the answer of @AllanCameron (thanks!) I come to the following short solution:

precition <- .001 # Choose how detailed the calculation (interpolation) is

values_interpolated <- approx(df$time, df$values, xout= seq(1, 10, precision))$y # interpolate

sum(values_interpolated < threshold_low | values_interpolated > threshold_high) * precision
4.499
LulY
  • 976
  • 1
  • 9
  • 24
1

As suggested in the comments, if you decide to trust a purely linear fit between each pair of data points, then you simply need to calculate the intersection of the linear fit with your threshold for max or min range. That's a matter of basic linear equations.

$slope = (y_2 -y_1)/(x_2 - x_1)$ hmmm... whatever happened to mathjax?

intercept = y_2 - slope*x_2 (or y_1,x_1; doesn't matter).

You don't have to test every segment, just those for which one point exceeds the desired range. If both points exceed in the same direction, then that entire segment is out-of-range.

Carl Witthoft
  • 20,573
  • 9
  • 43
  • 73