DT <- data.table(
N = 1:16,
x = c(11,11,11,11,11,11,11,11,21,21,21,21,21,21,21,21),
y = rep(1:4,4,4,4,1:4,4,4,4),
z = c(53,71,27,64,43,62,61,85,44,56,23,37,31,48,80,38),
min = c(2.74,2.77,2.23,2.98,2.25,2.48,2.46,2.22,3.07,
3.08,3.81,3.31,3.32,3.75,3.28,3.04))
I want to get data.table with filtration using upper.limit and lowerlimit
N x y z min lower.limit upper.limit
1 11 1 53 2.74 2.6 2.88
2 11 2 71 2.77 2.63 2.91
3 11 3 27 2.23 2.12 2.34
4 11 4 64 2.98 2.83 3.13
5 11 4 43 2.25 2.14 2.36
6 11 4 62 2.48 2.36 2.6
7 11 4 61 2.46 2.34 2.58
8 11 4 85 2.22 2.11 2.33
9 21 1 44 3.07 2.92 3.22
10 21 2 56 3.08 2.93 3.23
11 21 3 23 3.81 3.62 4
12 21 4 37 3.31 3.14 3.48
13 21 4 31 3.32 3.15 3.49
14 21 4 48 3.75 3.56 3.94
15 21 4 80 3.28 3.12 3.44
16 21 4 38 3.04 2.89 3.19
I have to filter the data.table based on a min filter for example when N=2, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of previous row that row should be eliminated. when N=3, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of any of the previous 2 rows those rows should be eliminated. when N=4, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of previous 3 rows the row should be eliminated. when N=5,6,7,8, if "min" of that row is greater than or equal to upper.limit and less than or equal to the lower.limit of previous 4 rows the rows should be eliminated. then I have to find rolling mean of the data set
library(zoo)
library(data.table)
DT[, RollingAvg := shift(rollapply(z, 4, mean,
partial = TRUE, align = 'right')), by = x]
How can I apply this min filter along with this rolling function for unique values of x here. can we apply the min filter using the rollapply function?
I usually do like this
N x y z min l.min up.limit N_2 x_2 y_2 z_2 min_2 status
2 11 2 71 2.77 2.63 2.91 1 11 1 53 2.74 TRUE
3 11 3 27 2.23 2.12 2.34 1 11 1 53 2.74 FALSE
3 11 3 27 2.23 2.12 2.34 2 11 2 71 2.77 FALSE
4 11 4 64 2.98 2.83 3.13 1 11 1 53 2.74 FALSE
4 11 4 64 2.98 2.83 3.13 2 11 2 71 2.77 FALSE
4 11 4 64 2.98 2.83 3.13 3 11 3 27 2.23 FALSE
5 11 4 43 2.25 2.14 2.36 1 11 1 53 2.74 FALSE
5 11 4 43 2.25 2.14 2.36 2 11 2 71 2.77 FALSE
5 11 4 43 2.25 2.14 2.36 3 11 3 27 2.23 TRUE
5 11 4 43 2.25 2.14 2.36 4 11 4 64 2.98 FALSE
6 11 4 62 2.48 2.36 2.6 2 11 2 71 2.77 FALSE
6 11 4 62 2.48 2.36 2.6 3 11 3 27 2.23 FALSE
6 11 4 62 2.48 2.36 2.6 4 11 4 64 2.98 FALSE
6 11 4 62 2.48 2.36 2.6 5 11 4 43 2.25 FALSE
7 11 4 61 2.46 2.34 2.58 3 11 3 27 2.23 FALSE
7 11 4 61 2.46 2.34 2.58 4 11 4 64 2.98 FALSE
7 11 4 61 2.46 2.34 2.58 5 11 4 43 2.25 FALSE
7 11 4 61 2.46 2.34 2.58 6 11 4 62 2.48 TRUE
8 11 4 85 2.22 2.11 2.33 4 11 4 64 2.98 FALSE
8 11 4 85 2.22 2.11 2.33 5 11 4 43 2.25 TRUE
8 11 4 85 2.22 2.11 2.33 6 11 4 62 2.48 FALSE
8 11 4 85 2.22 2.11 2.33 7 11 4 61 2.46 FALSE
then sort out all l.limit <= min_2 & up.limit >= min_2 true(this shows in status column) so we will get
N x y z min l.min up.limit N_2 x_2 y_2 z_2 min_2 status
2 11 2 71 2.77 2.63 2.91 1 11 1 53 2.74 TRUE
5 11 4 43 2.25 2.14 2.36 3 11 3 27 2.23 TRUE
7 11 4 61 2.46 2.34 2.58 6 11 4 62 2.48 TRUE
8 11 4 85 2.22 2.11 2.33 5 11 4 43 2.25 TRUE
then find aggregate average of z_2 by the column N gives the expected output
as
N x y z min l.min up.limit Roll_avg
2 11 2 71 2.77 2.63 2.91 53
5 11 4 43 2.25 2.14 2.36 27
7 11 4 61 2.46 2.34 2.58 62
8 11 4 85 2.22 2.11 2.33 43
this should be repeated all unique value of "x" This computation is too costly. How can I do this using rollapply function in R? or is there any less costly method to do this in R?