How to find how long a number lays outside a range?

Question

I despair of this seemingly easy task: I have a number vector values and a time vector time and I need to find out for how long the values are inside a certain range. Here is some data:

df <- data.frame(time= c(1,3:10), values= c(7,3:10))

For this data the values were for 4.5 hours outside the range 3.5 to 6.5. Here a visualization how to determine those 4.5 hours:

In this plot the x axis is the time, the y axis the values, the points the real measures, the dashed line are the range borders 3.5 and 6.5, the full line is just a help to better see when the range borders are crossed.

Am I missing an obvious way to determine for how long the values are outside the range?

The plot is created with

threshold_low <- 3.5
threshold_high <- 6.5

ggplot(data= df, mapping= aes(time, values)) +
  geom_point() +
  geom_line() +
  geom_hline(yintercept= c(threshold_low, threshold_high), linetype= "dashed") +
  scale_x_continuous(breaks=seq(0, 10, 1)) +
  scale_y_continuous(breaks=seq(0, 10, 1))

`data.table::rleid` will give you your answer. Exactly how to use it will depend on exactly how you define "how long": longest single period, total duration of all periods, etc. — Limey, Jul 24 '23 at 13:17
I know you need the total tiome. It *is* basis of the the solution. I'm working on it. The complication is that you want to interpolate between data points. Without that, it would be straightforward. Would you prefer a solution that adds rows to the input data frame or does arithmetic on the data points you do have? — Limey, Jul 24 '23 at 13:26
@Limey Ah, I see! I look for arithmetic on the data points for — LulY, Jul 24 '23 at 13:30

Allan Cameron · Answer 1 · 2023-07-24T14:00:27.940

6

If you don't want to manually calculate every crossing point, an easier way would be to use linear interpolation:

library(tidyverse)

range_df <- approx(df$time, df$values, xout = seq(1, 10, 0.001)) %>%
  as.data.frame() %>%
  setNames(names(df)) %>%
  mutate(inrange = values < threshold_high & values > threshold_low) %>%
  mutate(group = data.table::rleid(inrange)) %>%
  group_by(group) %>%
  filter(!inrange) %>%
  summarize(starts = min(time), ends = max(time), duration = ends - starts)

This results in

range_df
#> # A tibble: 3 x 4
#>   group starts  ends duration
#>   <int>  <dbl> <dbl>    <dbl>
#> 1     1   1     1.25     0.25
#> 2     3   2.75  3.5      0.75
#> 3     5   6.5  10        3.5

We can show these numbers are about right by plotting a geom_rect with this data over your existing plot:

ggplot(data= df, mapping= aes(time, values)) +
  geom_point() +
  geom_line() +
  geom_rect(aes(xmin = starts, xmax = ends, ymin = -Inf, ymax = Inf),
            data = range_df, fill = 'red', alpha = 0.2, inherit.aes = FALSE) +
  geom_hline(yintercept= c(threshold_low, threshold_high), linetype= "dashed") +
  scale_x_continuous(breaks=seq(0, 10, 1)) +
  scale_y_continuous(breaks=seq(0, 10, 1))

And the total amount of time that values was in the target range is given by:

sum(range_df$duration)
#> [1] 4.50

edited Jul 24 '23 at 14:00

answered Jul 24 '23 at 13:49

Allan Cameron

147,086
7
49
87

1

OP wants the time *out of range*, not in range. But it's an easy fix... – Limey Jul 24 '23 at 13:56
I didn't know about `approx`. That makes the solution *much* easier than the one I was working on. – Limey Jul 24 '23 at 13:58
@Limey thanks. The first line of the question threw me, but you're right - it's easily flipped over to time outside the range. – Allan Cameron Jul 24 '23 at 14:01
1

There is `dplyr::consecutive_id()` if you wish to stay tidy. – zx8754 Jul 25 '23 at 07:56
Thanks @zx8754. Is that new? I always wished the tidyverse had an `rleid` equivalent. – Allan Cameron Jul 25 '23 at 07:57
See https://stackoverflow.com/a/74428002/680068 it took only 7 years. – zx8754 Jul 25 '23 at 08:11

thothal · Answer 2 · 2023-07-24T14:34:16.447

A mathematical approach would be to

Linearly interpolate between the points
Find the zeros
If the point between 2 consecutive zeros falls outside the limits calculate the difference between these 2 zeros
Sum all difference

library(dplyr)
library(rootSolve)
library(magrittr)

df <- data.frame(time= c(1,3:10), values= c(7, 3:10))

f <- approxfun(df$time, df$values)
lims <- c(3.5, 6.5)

lapply(lims, \(l) uniroot.all(\(x) f(x) - l , range(df$time))) %T>%
   {print(list(Uniroots = .))} %>%
   unlist() %>%
   c(range(df$time))%>%
   sort() %>%
   unique() %>%
   {cbind(head(., -1L), tail(., -1L))} %T>% 
   {print(list(zeros = .))} %>%
   set_colnames(paste0("x", 0:1)) %>%
   as_tibble() %>%
   mutate(is_outside = !between(f((x0 + x1) / 2), lims[1], lims[2]),
          rng = if_else(is_outside, x1 - x0, 0)) %T>%
   {print(list(signs = .))} %>%   
   summarize(time_outside = sum(rng))

# $Uniroots
# $Uniroots[[1]]
# [1] 2.75 3.50

# $Uniroots[[2]]
# [1] 1.25 6.50


# $zeros
#      [,1]  [,2]
# [1,] 1.00  1.25
# [2,] 1.25  2.75
# [3,] 2.75  3.50
# [4,] 3.50  6.50
# [5,] 6.50 10.00

# $signs
# # A tibble: 5 × 4
#      x0    x1 is_outside   rng
#   <dbl> <dbl> <lgl>      <dbl>
# 1  1     1.25 TRUE        0.25
# 2  1.25  2.75 FALSE       0   
# 3  2.75  3.5  TRUE        0.75
# 4  3.5   6.5  FALSE       0   
# 5  6.5  10    TRUE        3.5 

# # A tibble: 1 × 1
#   time_outside
#          <dbl>
# 1          4.5

Nice, +1. This is a more rigorous solution if the solution needs to be correct to several decimal places. — Allan Cameron, Jul 24 '23 at 14:28

score 3 · Answer 3 · answered Jul 24 '23 at 15:02

3

As a one-liner:

with(df, integrate(\(x) +(abs(approxfun(time, values)(x) - 5) > 1.5), min(time), max(time)))
#> 4.5 with absolute error < 4.3e-14

answered Jul 24 '23 at 15:02

jblood94

10,340
1
10
15

score 2 · Answer 4 · answered Jul 24 '23 at 14:03

Inspired by the answer of @AllanCameron (thanks!) I come to the following short solution:

precition <- .001 # Choose how detailed the calculation (interpolation) is

values_interpolated <- approx(df$time, df$values, xout= seq(1, 10, precision))$y # interpolate

sum(values_interpolated < threshold_low | values_interpolated > threshold_high) * precision
4.499

score 1 · Answer 5 · answered Jul 24 '23 at 13:41

As suggested in the comments, if you decide to trust a purely linear fit between each pair of data points, then you simply need to calculate the intersection of the linear fit with your threshold for max or min range. That's a matter of basic linear equations.

$slope = (y_2 -y_1)/(x_2 - x_1)$ hmmm... whatever happened to mathjax?

intercept = y_2 - slope*x_2 (or y_1,x_1; doesn't matter).

You don't have to test every segment, just those for which one point exceeds the desired range. If both points exceed in the same direction, then that entire segment is out-of-range.

Stack Overflow doesn't have mathjax, unlike some of the other Stack Exchange sites — Allan Cameron, Jul 24 '23 at 13:50

How to find how long a number lays outside a range?

5 Answers5