1
 DT <- data.table(
 N = 1:16,
 x = c(11,11,11,11,11,11,11,11,21,21,21,21,21,21,21,21), 
 y = rep(1:4,4,4,4,1:4,4,4,4), 
 z = c(53,71,27,64,43,62,61,85,44,56,23,37,31,48,80,38),
 min = c(2.74,2.77,2.23,2.98,2.25,2.48,2.46,2.22,3.07,
        3.08,3.81,3.31,3.32,3.75,3.28,3.04))

I want to get data.table with filtration using upper.limit and lowerlimit

   N    x   y   z   min     lower.limit upper.limit
   1    11  1   53  2.74        2.6     2.88
   2    11  2   71  2.77        2.63    2.91
   3    11  3   27  2.23        2.12    2.34
   4    11  4   64  2.98        2.83    3.13
   5    11  4   43  2.25        2.14    2.36
   6    11  4   62  2.48        2.36    2.6
   7    11  4   61  2.46        2.34    2.58
   8    11  4   85  2.22        2.11    2.33
   9    21  1   44  3.07        2.92    3.22
  10    21  2   56  3.08        2.93    3.23
  11    21  3   23  3.81        3.62    4
  12    21  4   37  3.31        3.14    3.48
  13    21  4   31  3.32        3.15    3.49
  14    21  4   48  3.75        3.56    3.94
  15    21  4   80  3.28        3.12    3.44
  16    21  4   38  3.04        2.89    3.19

I have to filter the data.table based on a min filter for example when N=2, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of previous row that row should be eliminated. when N=3, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of any of the previous 2 rows those rows should be eliminated. when N=4, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of previous 3 rows the row should be eliminated. when N=5,6,7,8, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of previous 4 rows the rows should be eliminated. then I have to find rolling mean of the data set

library(zoo)
library(data.table)
DT[, RollingAvg := shift(rollapply(z, 4, mean, 
      partial = TRUE, align = 'right')), by = x]

How can I apply this min filter along with this rolling function for unique values of x here. can we apply the min filter using the rollapply function?

I usually do like this

N   x   y   z   min l.min   up.limit    N_2 x_2 y_2 z_2 min_2   status
2   11  2   71  2.77    2.63    2.91    1   11  1   53  2.74    TRUE
3   11  3   27  2.23    2.12    2.34    1   11  1   53  2.74    FALSE
3   11  3   27  2.23    2.12    2.34    2   11  2   71  2.77    FALSE
4   11  4   64  2.98    2.83    3.13    1   11  1   53  2.74    FALSE
4   11  4   64  2.98    2.83    3.13    2   11  2   71  2.77    FALSE
4   11  4   64  2.98    2.83    3.13    3   11  3   27  2.23    FALSE
5   11  4   43  2.25    2.14    2.36    1   11  1   53  2.74    FALSE
5   11  4   43  2.25    2.14    2.36    2   11  2   71  2.77    FALSE
5   11  4   43  2.25    2.14    2.36    3   11  3   27  2.23    TRUE
5   11  4   43  2.25    2.14    2.36    4   11  4   64  2.98    FALSE
6   11  4   62  2.48    2.36    2.6     2   11  2   71  2.77    FALSE
6   11  4   62  2.48    2.36    2.6     3   11  3   27  2.23    FALSE
6   11  4   62  2.48    2.36    2.6     4   11  4   64  2.98    FALSE
6   11  4   62  2.48    2.36    2.6     5   11  4   43  2.25    FALSE
7   11  4   61  2.46    2.34    2.58    3   11  3   27  2.23    FALSE
7   11  4   61  2.46    2.34    2.58    4   11  4   64  2.98    FALSE
7   11  4   61  2.46    2.34    2.58    5   11  4   43  2.25    FALSE
7   11  4   61  2.46    2.34    2.58    6   11  4   62  2.48    TRUE
8   11  4   85  2.22    2.11    2.33    4   11  4   64  2.98    FALSE
8   11  4   85  2.22    2.11    2.33    5   11  4   43  2.25    TRUE
8   11  4   85  2.22    2.11    2.33    6   11  4   62  2.48    FALSE
8   11  4   85  2.22    2.11    2.33    7   11  4   61  2.46    FALSE

then sort out all l.limit <= min_2 & up.limit >= min_2 true(this shows in status column) so we will get

N   x   y   z   min l.min   up.limit    N_2 x_2 y_2 z_2 min_2   status
2   11  2   71  2.77    2.63    2.91    1   11  1   53  2.74    TRUE
5   11  4   43  2.25    2.14    2.36    3   11  3   27  2.23    TRUE
7   11  4   61  2.46    2.34    2.58    6   11  4   62  2.48    TRUE
8   11  4   85  2.22    2.11    2.33    5   11  4   43  2.25    TRUE

then find aggregate average of z_2 by the column N gives the expected output
as

N   x   y   z   min l.min   up.limit    Roll_avg
2   11  2   71  2.77    2.63    2.91    53
5   11  4   43  2.25    2.14    2.36    27
7   11  4   61  2.46    2.34    2.58    62
8   11  4   85  2.22    2.11    2.33    43

this should be repeated all unique value of "x" This computation is too costly. How can I do this using rollapply function in R? or is there any less costly method to do this in R?

  • Your examples are giving errors. Please check – akrun Aug 24 '22 at 18:14
  • In your example, all the min are less than the upper.limit therefore nothing to drop – Onyambu Aug 24 '22 at 18:16
  • Also do you mean to say *less than ANY of the previous* or *less than ALL of the previous* – Onyambu Aug 24 '22 at 18:17
  • @onyambu "all" of the previous values – zainul abid Aug 24 '22 at 18:22
  • Well you still have to give an example that will work,. In your given example, all the min are less than upper.limit. Nothing to drop\ – Onyambu Aug 24 '22 at 18:24
  • For example when N=3 ,min=2.23 and the previous row has upper.limit=2.91 and lower.limit= 2.63 that the second row should be eliminated while finding rolling mean – zainul abid Aug 24 '22 at 18:30
  • Why eliminate the second row? 2.23 is not greater than 2.91. You said we eliminate if it is greater. Your statements are confusing. Include the desired output – Onyambu Aug 24 '22 at 20:08
  • Your *logic* and your last code block are inconsistent. Per your data and the determination that a row is kept if and only if its `min` value is between all `lower.limit` and `upper.limit` over the three previous rows, only rows 1-2 are retained (and not row 1 if you cannot compare it to anything). – r2evans Aug 25 '22 at 15:36

1 Answers1

3

Perhaps this?

DT[, .SD[zoo::rollapplyr(.I, 4, FUN = function(i) if (length(i) == 1) TRUE else all(head(between(min[last(i)], lower.limit[i], upper.limit[i]),n=-1)),
                         partial=TRUE),]]
#        N     x     y     z   min lower.limit upper.limit
#    <int> <int> <int> <int> <num>       <num>       <num>
# 1:     1    11     1    53  2.74        2.60        2.88
# 2:     2    11     2    71  2.77        2.63        2.91

If you don't want to keep row 1, then the one TRUE to FALSE.

r2evans
  • 141,215
  • 6
  • 77
  • 149