I have written a function that takes a subset of the data for each value of the name column, computes the outliers in column "mark" and replaces them with the cutoff value. However, when I combine these subsets again, the order of my elements changes. Is there any way to maintain the original order of the elements in the column "mark"?
My data set is:
name mark
A 100.0
B 0.5
C 100.0
A 50.0
B 90.0
B 1000.0
C 1200.0
C 5000.0
A 210.0
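To make the example reproducible, here is a minimal sketch that rebuilds the data set above as a base R data.frame. The object name `data` is taken from the function below. That function uses data.table's `.()` syntax, so running it unchanged would presumably also need `data.table::setDT(data)` first.

```r
# Reproducible copy of the data set above as a plain data.frame.
data <- data.frame(
  name = c("A", "B", "C", "A", "B", "B", "C", "C", "A"),
  mark = c(100.0, 0.5, 100.0, 50.0, 90.0, 1000.0, 1200.0, 5000.0, 210.0),
  stringsAsFactors = FALSE  # keep name as character on older R versions too
)
print(data)
```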
The function I have written is:
data.frame(do.call("rbind", as.list(by(data, data$name, function(x) {
  apply(x[, .(mark)], 2, function(y) {
    # cap every value above (median + 1.5 * IQR) of this group at that cutoff
    y[y > (quantile(x$mark, na.rm = TRUE)[[3]][[1]] + 1.5 * IQR(x$mark))] <-
      (quantile(x$mark, na.rm = TRUE)[[3]][[1]] + 1.5 * IQR(x$mark))
    y
  })
}))))
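For reference, here is a sketch of the cutoff this function computes for each name, assuming R's default quantile type 7. Note that `quantile(...)[[3]]` is the 50% quantile, i.e. the median, so the cutoff is median + 1.5 * IQR rather than the more common 75th percentile + 1.5 * IQR. The helper name `cutoff` is only illustrative.

```r
data <- data.frame(
  name = c("A", "B", "C", "A", "B", "B", "C", "C", "A"),
  mark = c(100, 0.5, 100, 50, 90, 1000, 1200, 5000, 210),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the same threshold the function uses.
# quantile(y)[[3]] is the 50% quantile (the median).
cutoff <- function(y) {
  quantile(y, na.rm = TRUE)[[3]] + 1.5 * IQR(y, na.rm = TRUE)
}

# One cutoff per name, in level order A, B, C.
cutoffs <- tapply(data$mark, data$name, cutoff)
print(cutoffs)
```

This gives 220 for A, 839.625 for B and 4875 for C, which is why 1000 and 5000 are replaced while 210 in group A is left alone.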
The result of the above function is the first column below (I've manually added back the name column for illustrative purposes):
mark name
100.000 A
50.000 A
210.000 A
0.500 B
90.000 B
839.625 B
100.000 C
1200.000 C
4875.000 C
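The reordering can be reproduced without `by()`. The sketch below assumes the same median + 1.5 * IQR cap (the helper `cap` is hypothetical). It splits `mark` by `name`, caps each piece and concatenates the pieces, much as `rbind` stacks the per-group results. The stacked vector is just the original rows permuted by `order(name)`, so assigning through that permutation puts every value back in its original row.

```r
data <- data.frame(
  name = c("A", "B", "C", "A", "B", "B", "C", "C", "A"),
  mark = c(100, 0.5, 100, 50, 90, 1000, 1200, 5000, 210),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cap values above median + 1.5 * IQR at that cutoff.
cap <- function(y) {
  pmin(y, quantile(y, na.rm = TRUE)[[3]] + 1.5 * IQR(y, na.rm = TRUE))
}

# split() + concatenation mimics by() + rbind: groups come out as A, B, C.
stacked <- unlist(lapply(split(data$mark, data$name), cap), use.names = FALSE)

# order() is stable, so perm lists the original row of each stacked value.
perm <- order(data$name)

# Inverse permutation: write each stacked value back to its original row.
restored <- numeric(length(stacked))
restored[perm] <- stacked
print(restored)
```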
In the above result, the rows are grouped by name, so the values in the mark column no longer appear in their original order. Is there any way to keep the original order of the elements?
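One way to avoid the reordering altogether, sketched here under the same assumptions (hypothetical helper `cap`, median + 1.5 * IQR cutoff), is base R's `ave()`. It applies a function within each group and returns a vector aligned with the original rows.

```r
data <- data.frame(
  name = c("A", "B", "C", "A", "B", "B", "C", "C", "A"),
  mark = c(100, 0.5, 100, 50, 90, 1000, 1200, 5000, 210),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cap values above median + 1.5 * IQR of the group.
cap <- function(y) {
  cutoff <- quantile(y, na.rm = TRUE)[[3]] + 1.5 * IQR(y, na.rm = TRUE)
  pmin(y, cutoff)
}

# ave() runs cap() on each name group and puts the results back in the
# positions of the original rows, so the row order is unchanged.
data$mark <- ave(data$mark, data$name, FUN = cap)
print(data)
```

If `data` is a data.table, a grouped update such as `data[, mark := cap(mark), by = name]` should likewise modify the column in place without changing the row order.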