I have a dataframe as follows:
chr leftPos TBGGT 12_try 324Gtt AMN2
1 24352 34 43 19 43
1 53534 2 1 -1 -9
2 34 -15 7 -9 -18
3 3443 -100 -4 4 -9
3 3445 -100 -1 6 -1
3 3667 5 -5 9 5
3 7882 -8 -9 1 3
I have to create a loop which:
a) Calculates the upper and lower limit (UL and LL) for each column from the third column onwards.
b) Only includes rows that fall outside of the UL and LL (Zoutliers).
c) Counts the rows where the Zoutlier has the same direction (i.e. positive or negative) as the previous or the subsequent row for the same chr.
The output would therefore be:
ZScore1 TBGGT 12_try 324Gtt AMN2
nrow 4 6 4 4
So far I have code as follows:
library(data.table)  # v1.9.5
f1 <- function(df, ZCol){
  # A) Determine the UL and LL and then generate the Zoutliers
  UL = median(ZCol, na.rm = TRUE) + alpha*IQR(ZCol, na.rm = TRUE)
  LL = median(ZCol, na.rm = TRUE) - alpha*IQR(ZCol, na.rm = TRUE)
  Zoutliers <- which(ZCol > UL | ZCol < LL)
  # B) Exclude Zoutliers per chr if same direction as previous or subsequent row
  na.omit(as.data.table(df)[, {tmp = sign(eval(as.name(ZCol)))
    .SD[tmp==shift(tmp) | tmp==shift(tmp, type='lead')]},
    by=chr])[, list(.N)]
}
nm1 <- names(df)[3:ncol(df)]  # value columns, from the third column onwards
setnames(do.call(cbind,lapply(nm1, function(x) f1(df, x))), nm1)[]
The code is patched together from various places. The problem I have is combining parts A) and B) of the code so that I get the output I want.
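For reference, here is a rough base R sketch of how parts A) and B) might fit together. It rests on several assumptions that are not confirmed anywhere above: `alpha` is a hypothetical multiplier (the snippet never defines it), "previous" and "subsequent" are judged among the rows that remain after the outlier filter, and `count_runs` and `count_all` are names made up for illustration.

```r
# Example data from the question; check.names = FALSE keeps names such as "12_try".
df <- read.table(header = TRUE, check.names = FALSE, text = "
chr leftPos TBGGT 12_try 324Gtt AMN2
1 24352 34 43 19 43
1 53534 2 1 -1 -9
2 34 -15 7 -9 -18
3 3443 -100 -4 4 -9
3 3445 -100 -1 6 -1
3 3667 5 -5 9 5
3 7882 -8 -9 1 3")

# Hypothetical helper: count outlier rows in one column whose sign matches
# the previous or the next remaining outlier row within the same chr.
count_runs <- function(z, chr, alpha) {
  # Part A: limits around the median, then keep only values outside them.
  med <- median(z, na.rm = TRUE)
  spread <- alpha * IQR(z, na.rm = TRUE)
  keep <- !is.na(z) & (z > med + spread | z < med - spread)
  s <- sign(z[keep])
  g <- chr[keep]
  # Part B: per chr, flag rows whose sign equals that of a neighbouring row.
  same <- unlist(lapply(split(s, g), function(x) {
    n <- length(x)
    prev_same <- c(FALSE, x[-1] == x[-n])  # first row has no previous row
    next_same <- c(x[-n] == x[-1], FALSE)  # last row has no subsequent row
    prev_same | next_same
  }))
  sum(same)
}

# Hypothetical wrapper: apply count_runs to every column from the third onwards.
count_all <- function(df, alpha) {
  sapply(names(df)[-(1:2)], function(nm) count_runs(df[[nm]], df$chr, alpha))
}

alpha <- 1.5  # assumed value, chosen only for illustration
count_all(df, alpha)
```

The counts depend on the value chosen for `alpha`, and on whether neighbours should be taken from the filtered rows (as assumed here) or from the original rows.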