6

I can generate 20 observations from a uniform distribution with the runif function, runif(n=20), and 100 replicates of the same distribution as follows.

df <- replicate( 100, runif(n=20))

This creates df, a matrix of dimensions [20, 100], which I can convert into a data frame with 100 columns and 20 rows.

How can I generate a new data frame consisting of the means of each column of df?

Thank you for your help.

Wael
user1357062

4 Answers

11

You can use colMeans.

data <- replicate(100, runif(n=20))
means <- colMeans(data)
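
The question asks for a data frame rather than a bare vector. A minimal sketch, assuming a one-column layout is what is wanted (the names `means_df`, `means_wide`, `replicate_id` and `mean` are illustrative and not from the original post):

```r
set.seed(42)                            # only so the example is reproducible
data  <- replicate(100, runif(n = 20))  # 20 x 100 matrix
means <- colMeans(data)                 # numeric vector of length 100

# Long layout: one row per replicate, one column holding its mean.
means_df <- data.frame(replicate_id = seq_along(means), mean = means)

# Wide layout: a 1 x 100 data frame mirroring the columns of `data`.
means_wide <- as.data.frame(t(means))
```

Which layout is preferable depends on what happens next; the long form is usually easier to plot or summarise.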
nico
  • R 2.15+ also includes `.colMeans()`. According to the note, these are "for use in programming where ultimate speed is required." – tim riffe Apr 25 '12 at 20:54
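
As a rough sketch of the `.colMeans()` variant mentioned in the comment above: it takes the matrix together with its dimensions, `m` (rows) and `n` (columns), and skips the usual argument checking. The variable name `fast_means` is illustrative.

```r
set.seed(1)
data <- replicate(100, runif(n = 20))   # 20 x 100 matrix

# .colMeans() needs the dimensions passed explicitly.
fast_means <- .colMeans(data, m = nrow(data), n = ncol(data))

# Should agree with the checked version.
all.equal(fast_means, colMeans(data))
```

For a matrix this small the difference is unlikely to matter; the checked `colMeans()` is the safer default.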
5

Generate data:

data <- replicate(100, runif(n=20))

Means of columns, rows:

col_mean <- apply(data, 2, mean)
row_mean <- apply(data, 1, mean)

Standard deviation of columns, rows:

col_sd   <- apply(data, 2, sd)
row_sd   <- apply(data, 1, sd)
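
Base R has `colMeans` but no matching column-wise standard deviation function, so `apply(data, 2, sd)` is the usual route. One possible vectorised alternative, sketched here with illustrative names (`centered`, `col_sd_vec`), centres each column and uses `colSums`:

```r
set.seed(1)
data <- replicate(100, runif(n = 20))   # 20 x 100 matrix

# Subtract each column's mean from that column.
centered <- sweep(data, 2, colMeans(data))

# Sample standard deviation: sqrt(sum of squares / (n - 1)) per column.
col_sd_vec <- sqrt(colSums(centered^2) / (nrow(data) - 1))

# Should match the apply() version.
all.equal(col_sd_vec, apply(data, 2, sd))
```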
Community
Idr
  • `colMeans`, `rowMeans`, `colSums`, and `rowSums` will generally perform faster than their `apply` equivalents, though for *most* cases, the performance hit will not be a huge deal (obviously depends on the size of your data...). – Chase Apr 25 '12 at 20:11
  • check out the help page for `?colMeans` for details, but essentially those functions are "written for speed" and do less error checking than the `apply` functions. I wish I understood the details better myself... – Chase Apr 25 '12 at 20:17
  • On a 10000 x 10000 matrix `colMeans` took ~0.1s, `apply` ~3.2s. – nico Apr 25 '12 at 20:27
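
A comparison along the lines of the timings quoted in the comments can be sketched with `system.time`. The matrix here is 2000 x 2000 rather than 10000 x 10000 to keep memory use modest, and the names `big`, `t_colmeans` and `t_apply` are illustrative; actual timings will vary by machine.

```r
set.seed(1)
big <- matrix(runif(2000 * 2000), ncol = 2000)

# Time both approaches and keep their results for comparison.
t_colmeans <- system.time(m1 <- colMeans(big))
t_apply    <- system.time(m2 <- apply(big, 2, mean))

# Elapsed seconds for each approach.
print(c(colMeans = t_colmeans[["elapsed"]], apply = t_apply[["elapsed"]]))

# Both should give the same means up to floating-point tolerance.
all.equal(m1, m2)
```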
2

If I understand correctly: apply(replicate(100,runif(n=20)),2,mean)

frankc
2

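Building off of nico's answer, you could instead make a single call to runif(), shape the result into a matrix, and take the colMeans of that. In the benchmark below it runs about 5.6 times faster than the replicate() version, and it draws the same number of values from the same distribution.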

library(rbenchmark)
#reasonably fast
f1 <- function() colMeans(replicate(100,runif(20)))
#faster yet
f2 <- function() colMeans(matrix(runif(20*100), ncol = 100))

benchmark(f1(), f2(), 
          order = "elapsed", 
          columns = c("test", "elapsed", "relative"),
          replications=10000)

#Test results
  test elapsed relative
2 f2()    0.91 1.000000
1 f1()    5.10 5.604396
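
To check that the two constructions are interchangeable, one option is to fix the seed before each and compare the results. This sketch assumes that `replicate()` draws its values in column order, which matches the column-major fill of `matrix()`; the names `m_rep` and `m_mat` are illustrative.

```r
# Same seed, two ways of building the 20 x 100 matrix.
set.seed(123)
m_rep <- replicate(100, runif(20))

set.seed(123)
m_mat <- matrix(runif(20 * 100), ncol = 100)

# Compare the matrices and their column means.
all.equal(m_rep, m_mat)
all.equal(colMeans(m_rep), colMeans(m_mat))
```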
Chase