dplyr
is not optimized for row-based calculations like that. Though you can do this with rowwise()
, I recommend against it: performance will be abysmal. Your best speed will likely be with something that expects a matrix
, and can operate on the rows. I suggest apply
.
Instead of dealing with a 100x1000 data.frame
, for brevity I'll go with 5 columns:
set.seed(2)
mydf <- as.data.frame(lapply(seq(1, 5), rnorm, n=10))
colnames(mydf) <- paste('s', seq(1, ncol(mydf)), sep='')
Converting to a matrix
is only reasonable if all columns are of the same class
. In this case, they are all numeric
so we are safe. (If you have non-numeric columns in the dataframe, extract only the ones you need here and bind them back in later.)
mymtx <- as.matrix(mydf)
apply(mymtx, 1, quantile, c(0.1, 0.9))
# [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
# 10% 1.028912 1.430939 1.999521 0.305907 1.753824 0.03267599 1.934381 1.270504 2.995816 1.489634
# 90% 4.950067 3.807735 4.881554 6.123989 4.886388 5.55628806 4.207605 4.184460 4.406384 3.782134
One notable with using apply
like this is that the result is in row-based form, perhaps transposed from what one would expect. Simply wrap it in t(...)
and you'll see the columns you might expect.
This can be recombined with the original dataframe using cbind
or similar function.
This can be done in a pipeline like so:
mydf %>%
bind_cols(as.data.frame(t(apply(., 1, quantile, c(0.1, 0.9)))))
# s1 s2 s3 s4 s5 10% 90%
# 1 0.1030855 2.4176508 5.0908192 4.738939 4.616414 1.02891157 4.950067
# 2 1.1848492 2.9817528 1.8000742 4.318960 3.040897 1.43093918 3.807735
# 3 2.5878453 1.6073046 4.5896382 5.076164 4.158295 1.99952092 4.881554
# 4 -0.1303757 0.9603310 4.9546516 3.715842 6.903547 0.30590700 6.123989
# 5 0.9197482 3.7822290 3.0049378 3.223325 5.622494 1.75382406 4.886388
# 6 1.1324203 -0.3110691 0.5482936 3.404340 6.990920 0.03267599 5.556288
# 7 1.7079547 2.8786046 3.4772373 2.274020 4.694516 1.93438093 4.207605
# 8 0.7603020 2.0358067 2.4034418 3.097416 4.909156 1.27050387 4.184460
# 9 2.9844739 3.0128287 3.7922033 3.440938 4.815839 2.99581584 4.406384
# 10 0.8612130 2.4322652 3.2896367 3.753487 3.801232 1.48963385 3.782134
I'll leave the column naming up to you.