I have a data frame with 431 variables and 140 observations, and I need to remove outliers. However, the dataset contains several NA values, and I do not want to drop every row that has an NA. I am trying to remove the outliers with the IQR method, and so far I have been able to obtain the quartiles and IQR as follows:
data <- df2[,4:434]
apply(data,2,quantile, probs=c(0.25,0.75), na.rm=TRUE) -> Quartiles
sapply(data,IQR, na.rm=TRUE) -> iqr
I've also calculated the lower and upper bounds for each of my columns:
Lower <- Quartiles[1,]-1.5*iqr
Upper <- Quartiles[2,]+1.5*iqr
However, when I tried to replace the outliers with NAs, the data frame did not change:
data_no_outlier <- replace(data, data[1:431] < Lower & data[1:431] > Upper, NA)
I have also tried applying this script to the iris data, with the same unsuccessful result:
data(iris, package = "datasets")
completeData <- iris[-5]
apply(completeData,2,quantile, probs=c(0.25,0.75), na.rm=TRUE) -> Quartiles
sapply(completeData,IQR, na.rm=TRUE) -> iqr
Lower <- Quartiles[1,]-1.5*iqr
Upper <- Quartiles[2,]+1.5*iqr
data_no_outlier <- replace(completeData, completeData < Lower & completeData > Upper, NA)
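A quick check that may help narrow this down (a sketch on the iris setup above, not a confirmed diagnosis). Two things look suspicious in the condition. First, a value cannot be below `Lower` and above `Upper` at the same time, so combining the tests with `&` can never be TRUE. Second, comparing a data frame with a plain vector recycles the vector down the rows, so `Lower[j]` is not lined up with column `j`. The name `out_of_range` is only illustrative, and using `sweep()` is one assumed way to get a per-column comparison:

```r
completeData <- iris[-5]
Quartiles <- apply(completeData, 2, quantile, probs = c(0.25, 0.75), na.rm = TRUE)
iqr <- sapply(completeData, IQR, na.rm = TRUE)
Lower <- Quartiles[1, ] - 1.5 * iqr
Upper <- Quartiles[2, ] + 1.5 * iqr

# The original condition: "below Lower AND above Upper" is never satisfied
both <- completeData < Lower & completeData > Upper
any(both)

# Per-column comparison with OR: sweep() applies Lower[j] / Upper[j] to column j
m <- as.matrix(completeData)
out_of_range <- sweep(m, 2, Lower, "<") | sweep(m, 2, Upper, ">")
colSums(out_of_range)   # number of flagged values in each column
```

If `any(both)` is FALSE while `colSums(out_of_range)` shows non-zero counts, that would explain why `replace()` leaves the data unchanged.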
Is there a way to filter outliers out of my data without having to select every column manually by name?
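For reference, this is roughly the behaviour I am after, written as a minimal sketch on the iris data. It assumes all columns are numeric, and `na_outliers` is a hypothetical helper name, not an existing base R function. It works one column at a time, so no column has to be named and existing NAs are left alone:

```r
# Hypothetical helper: set values outside [Q1 - k*IQR, Q3 + k*IQR] to NA
na_outliers <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])   # q[2] - q[1] equals IQR(x, na.rm = TRUE)
  is_out <- !is.na(x) & (x < q[1] - fence | x > q[2] + fence)
  x[is_out] <- NA
  x
}

completeData <- iris[-5]
data_no_outlier <- completeData
# Assigning into `[]` keeps the data frame structure and column names
data_no_outlier[] <- lapply(completeData, na_outliers)

colSums(is.na(data_no_outlier))   # NAs introduced per column
```

On my real data the same call would presumably be `data[] <- lapply(data, na_outliers)`, with rows kept intact and only the individual outlying cells set to NA.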