I'd like to take an increasing sequence of numbers (e.g. a series of times)
set.seed(41); d <- seq(1:100) + runif(100, 0, 1)
and if the difference between two sequential numbers is below a threshold, merge them into a single point by taking the mean value of the two and, then continue going through until the next time combining is necessary. I resorted to functions I usually avoid: while
and ifelse
to write a quick-and-dirty function, and it works but isn't fast. Can you solve this task 1) more efficiently and 2) without invoking a for or while loop. Is there some built-in function, perhaps with even more functionality, that is well-suited for such a task?
combine_points <- function(x, th=0.5)
{
i = 1 # start i at 1
while(min(diff(x)) < th) # initiate while loop
{
ifelse(x[i+1] - x[i] < th, # logical condition
x[i] <- x[i+1] <-
mean(c(x[i+1], x[i])), # assignment if TRUE
(x[i] <- x[i])) # assignment if FALSE
x <- sort(unique(x)) # get rid of the duplicated entry created when
# the ifelse statement was TRUE
# increment i or reset i to 1 if it gets too large
ifelse(i == length(x), i <- 1, i <- i+1 )
}
return(x)
}
newd <- combine_points(d)
th <- 0.5
which(diff(newd) < th)
integer(0)
Update to benchmarks of solutions so far.
I benchmarked with a larger sample vector, and the Rcpp solution suggested by @Roland is slower than my first while loop when the vector gets long. I made an improvement to the initial while loop, and made an Rcpp version of it, too. The benchmark results are below. Note that @flodel answer is not directly comparable because it is a fundamentally different approach to combining, but it is definitely very fast.
set.seed(41); d <- seq(1:4000) + runif(4000, 0, 1)
library(microbenchmark)
microbenchmark(
combine_points.Frank(d,th=0.5),
combine_points.Frank2(d,th=0.5),
combine_points_Roland(d,th=0.5),
combine_points_Roland2(d,th=0.5))
Unit: milliseconds
expr min lq median uq max neval
combine_points.Frank(d, th = 0.5) 2115.6391 2154.5038 2174.5889 2193.8444 7884.1638 100
combine_points.Frank2(d, th = 0.5) 1298.2923 1323.2214 1341.5357 1357.4260 15538.0872 100
combine_points_Roland(d, th = 0.5) 2497.9106 2506.5960 2512.3591 2519.0036 2573.2854 100
combine_points_Roland2(d, th = 0.5) 494.8406 497.3613 498.2347 499.8777 544.9743 100
This is a considerable improvement over my first attempt, and the following is an Rcpp version, which is the fastest, so far:
combine_points.Frank2 <- function(x, th=0.5)
{
i = 1
while(min(diff(x)) < th)
{
if(x[i+1] - x[i] >= th){
i <- i + 1}
else {
x[i] <- x[i+1] <-
mean(c(x[i+1], x[i]));x <- unique(x); i <- i }
}
return(x)
}
Rcpp version
cppFunction('
NumericVector combine_points_Roland2(NumericVector x, double th) {
int i=0;
while(min(diff(x)) < th)
{
if ((x[i+1] - x[i]) >= th)
{
i = i + 1;
}
else{
x[i] = (x[i+1] + x[i])/2;
x[i+1] = x[i];
x = sort_unique(x);
i = i;
}
}
return x;
}
')