Applying filter using rollapply in R

Question

 DT <- data.table(
 N = 1:16,
 x = c(11,11,11,11,11,11,11,11,21,21,21,21,21,21,21,21), 
 y = rep(1:4,4,4,4,1:4,4,4,4), 
 z = c(53,71,27,64,43,62,61,85,44,56,23,37,31,48,80,38),
 min = c(2.74,2.77,2.23,2.98,2.25,2.48,2.46,2.22,3.07,
        3.08,3.81,3.31,3.32,3.75,3.28,3.04))

I want to get data.table with filtration using upper.limit and lowerlimit

   N    x   y   z   min     lower.limit upper.limit
   1    11  1   53  2.74        2.6     2.88
   2    11  2   71  2.77        2.63    2.91
   3    11  3   27  2.23        2.12    2.34
   4    11  4   64  2.98        2.83    3.13
   5    11  4   43  2.25        2.14    2.36
   6    11  4   62  2.48        2.36    2.6
   7    11  4   61  2.46        2.34    2.58
   8    11  4   85  2.22        2.11    2.33
   9    21  1   44  3.07        2.92    3.22
  10    21  2   56  3.08        2.93    3.23
  11    21  3   23  3.81        3.62    4
  12    21  4   37  3.31        3.14    3.48
  13    21  4   31  3.32        3.15    3.49
  14    21  4   48  3.75        3.56    3.94
  15    21  4   80  3.28        3.12    3.44
  16    21  4   38  3.04        2.89    3.19

I have to filter the data.table based on a min filter for example when N=2, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of previous row that row should be eliminated. when N=3, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of any of the previous 2 rows those rows should be eliminated. when N=4, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of previous 3 rows the row should be eliminated. when N=5,6,7,8, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of previous 4 rows the rows should be eliminated. then I have to find rolling mean of the data set

library(zoo)
library(data.table)
DT[, RollingAvg := shift(rollapply(z, 4, mean, 
      partial = TRUE, align = 'right')), by = x]

How can I apply this min filter along with this rolling function for unique values of x here. can we apply the min filter using the rollapply function?

I usually do like this

N   x   y   z   min l.min   up.limit    N_2 x_2 y_2 z_2 min_2   status
2   11  2   71  2.77    2.63    2.91    1   11  1   53  2.74    TRUE
3   11  3   27  2.23    2.12    2.34    1   11  1   53  2.74    FALSE
3   11  3   27  2.23    2.12    2.34    2   11  2   71  2.77    FALSE
4   11  4   64  2.98    2.83    3.13    1   11  1   53  2.74    FALSE
4   11  4   64  2.98    2.83    3.13    2   11  2   71  2.77    FALSE
4   11  4   64  2.98    2.83    3.13    3   11  3   27  2.23    FALSE
5   11  4   43  2.25    2.14    2.36    1   11  1   53  2.74    FALSE
5   11  4   43  2.25    2.14    2.36    2   11  2   71  2.77    FALSE
5   11  4   43  2.25    2.14    2.36    3   11  3   27  2.23    TRUE
5   11  4   43  2.25    2.14    2.36    4   11  4   64  2.98    FALSE
6   11  4   62  2.48    2.36    2.6     2   11  2   71  2.77    FALSE
6   11  4   62  2.48    2.36    2.6     3   11  3   27  2.23    FALSE
6   11  4   62  2.48    2.36    2.6     4   11  4   64  2.98    FALSE
6   11  4   62  2.48    2.36    2.6     5   11  4   43  2.25    FALSE
7   11  4   61  2.46    2.34    2.58    3   11  3   27  2.23    FALSE
7   11  4   61  2.46    2.34    2.58    4   11  4   64  2.98    FALSE
7   11  4   61  2.46    2.34    2.58    5   11  4   43  2.25    FALSE
7   11  4   61  2.46    2.34    2.58    6   11  4   62  2.48    TRUE
8   11  4   85  2.22    2.11    2.33    4   11  4   64  2.98    FALSE
8   11  4   85  2.22    2.11    2.33    5   11  4   43  2.25    TRUE
8   11  4   85  2.22    2.11    2.33    6   11  4   62  2.48    FALSE
8   11  4   85  2.22    2.11    2.33    7   11  4   61  2.46    FALSE

then sort out all l.limit <= min_2 & up.limit >= min_2 true(this shows in status column) so we will get

N   x   y   z   min l.min   up.limit    N_2 x_2 y_2 z_2 min_2   status
2   11  2   71  2.77    2.63    2.91    1   11  1   53  2.74    TRUE
5   11  4   43  2.25    2.14    2.36    3   11  3   27  2.23    TRUE
7   11  4   61  2.46    2.34    2.58    6   11  4   62  2.48    TRUE
8   11  4   85  2.22    2.11    2.33    5   11  4   43  2.25    TRUE

then find aggregate average of z_2 by the column N gives the expected output
as

N   x   y   z   min l.min   up.limit    Roll_avg
2   11  2   71  2.77    2.63    2.91    53
5   11  4   43  2.25    2.14    2.36    27
7   11  4   61  2.46    2.34    2.58    62
8   11  4   85  2.22    2.11    2.33    43

this should be repeated all unique value of "x" This computation is too costly. How can I do this using rollapply function in R? or is there any less costly method to do this in R?

In your example, all the min are less than the upper.limit therefore nothing to drop — Onyambu, Aug 24 '22 at 18:16
Also do you mean to say *less than ANY of the previous* or *less than ALL of the previous* — Onyambu, Aug 24 '22 at 18:17
Well you still have to give an example that will work,. In your given example, all the min are less than upper.limit. Nothing to drop\ — Onyambu, Aug 24 '22 at 18:24
For example when N=3 ,min=2.23 and the previous row has upper.limit=2.91 and lower.limit= 2.63 that the second row should be eliminated while finding rolling mean — zainul abid, Aug 24 '22 at 18:30
Why eliminate the second row? 2.23 is not greater than 2.91. You said we eliminate if it is greater. Your statements are confusing. Include the desired output — Onyambu, Aug 24 '22 at 20:08
Your *logic* and your last code block are inconsistent. Per your data and the determination that a row is kept if and only if its `min` value is between all `lower.limit` and `upper.limit` over the three previous rows, only rows 1-2 are retained (and not row 1 if you cannot compare it to anything). — r2evans, Aug 25 '22 at 15:36

r2evans · Answer 1 · 2022-08-25T15:37:26.627

3

Perhaps this?

DT[, .SD[zoo::rollapplyr(.I, 4, FUN = function(i) if (length(i) == 1) TRUE else all(head(between(min[last(i)], lower.limit[i], upper.limit[i]),n=-1)),
                         partial=TRUE),]]
#        N     x     y     z   min lower.limit upper.limit
#    <int> <int> <int> <int> <num>       <num>       <num>
# 1:     1    11     1    53  2.74        2.60        2.88
# 2:     2    11     2    71  2.77        2.63        2.91

If you don't want to keep row 1, then the one TRUE to FALSE.

edited Aug 25 '22 at 15:37

answered Aug 24 '22 at 19:58

r2evans

141,215
6
77
149

I have reconstructed the question. Could you help me with this – zainul abid Aug 25 '22 at 02:01

Applying filter using rollapply in R

1 Answers1