I'm quite new to R; I use it mainly for visualising statistics with the ggplot2 library. Now I have run into a problem with data preparation.
I need to write a function that removes a given number (2, 5 or 10) of rows with the highest and lowest values in a specified column from a data frame and puts them into another data frame, and that does this for each combination of two factors (in my case: for each day and server).
Up to this point, I have done the following steps (MWE using the esoph example dataset).
I have sorted the data frame by the desired column (ncontrols in this example):
    esoph <- esoph[with(esoph, order(-ncontrols)), ]
I can display the first/last records for each factor level (in this example, for each age range):
    by(data = esoph, INDICES = esoph$agegp, FUN = head, 3)
    by(data = esoph, INDICES = esoph$agegp, FUN = tail, 3)
So basically, I can see the highest and lowest values, but I don't know how to extract them into another data frame and how to remove them from the main one.
Also, the example above shows the top/bottom records for each value of one factor (age range), but in reality I need the highest and lowest records for each combination of two factors; in this example they could be agegp and alcgp.
I am not even sure whether the steps above are OK; perhaps using plyr would work better? I'd appreciate any hints.
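To make the goal concrete, here is a minimal base-R sketch of the kind of split described above. It is only one possible approach, not a settled solution: split_extremes is a hypothetical helper name (not part of any package), ties are broken by row order, and the column is assumed to contain no missing values.

```r
# Hypothetical helper: within every combination of the grouping columns,
# separate the n lowest and n highest rows of `col` from the rest.
split_extremes <- function(df, col, groups, n = 2) {
  # One grouping factor for all combinations of the columns in `groups`
  g <- interaction(df[groups], drop = TRUE)

  # Ascending rank within each group; "first" breaks ties by row order,
  # so ranks inside a group are always 1, 2, ..., group size
  rank_low <- ave(df[[col]], g,
                  FUN = function(x) rank(x, ties.method = "first"))

  # Descending rank derived from the group size
  group_size <- ave(df[[col]], g, FUN = length)
  rank_high  <- group_size - rank_low + 1

  # A row is extreme if it is among the n lowest or the n highest
  is_extreme <- rank_low <= n | rank_high <= n

  list(extremes = df[is_extreme, ], rest = df[!is_extreme, ])
}

# Example: one lowest and one highest ncontrols row per agegp/alcgp pair
res <- split_extremes(esoph, "ncontrols", c("agegp", "alcgp"), n = 1)
```

Here res$extremes holds the removed rows and res$rest the remaining ones. Groups with fewer than 2 * n rows end up entirely in res$extremes, which may or may not be the desired behaviour.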