My data looks as follows
data <- data.frame(
A = c(10,20,30,40,50,60,70,80,90,100),
B = c(110,120,130,140,150,160,170,180,190,200)
)
I wish to create a new column where row-wise I take each value from column A and concatenate it with the corresponding value from B as well as the lagged 3 values from column B and stores it in a numeric vector.
The final data will look as follows:
new_data <- data.frame(
A = c(10,20,30,40,50,60,70,80,90,100),
B = c(110,120,130,140,150,160,170,180,190,200),
c = I(list(c(10,110,NA,NA,NA),c(20,120,110,NA,NA),c(30,130,120,110,NA),c(40,140,130,120,110),c(50,150,140,130,120),c(60,160,150,140,130),c(70,170,160,150,140),c(80,180,170,160,150),c(90,190,180,170,160),c(100,200,190,180,170))
))
In my real data, I need to capture up to 200 lagged values so manually typing list(A, lag(B),lag(B,2),lag(B,3)....
is not a road I want to go down.
I've tried using purrr::map to achieve my results but I can't get the lagged values into my vector
data %>%
rowwise() %>%
mutate(z = list(c(A,B,map_vec(1:3, ~lag(B,.x, default = NA)))))
The ultimate goal is to calculate the percentage_rank of each value of column A relative to the current row + 3 previous values from column B to identify whether each point of A is an outlier i.e. > 75th percentile.
My approach is to create a numeric vector of values in column C that contains the current value of A and the 4 values from B, apply dplyr::percent_rank to each vector and pluck the percent rank for each index = 1 value (column A value) to get me the percent rank of value A.
Any guidance you can provide would be greatly appreciated.