In R, how do I generate a data frame consisting of the means of all columns of a data frame?

Question

I can generate 20 observations from a uniform distribution with the runif function, runif(n=20), and 100 replicates of the same draw as follows:

df <- replicate( 100, runif(n=20))

This creates df, a matrix of dimensions [20,100], which I can convert into a data frame with 20 rows and 100 columns.

How can I generate a new data frame consisting of the means of each column of df?

Thank you for your help.

Minor point: in R, they're functions, not commands! – Spacedman Apr 26 '12 at 07:29

score 11 · Answer 1 · answered Apr 25 '12 at 19:56

You can use colMeans.

data <- replicate(100, runif(n=20))
means <- colMeans(data)
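
Since the question asks for a data frame rather than a vector, the result of colMeans can be wrapped in one. The following is only a sketch: the names `means_long` and `means_wide` and the `set.seed` call are illustrative assumptions, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed only so the example is reproducible

data  <- replicate(100, runif(n = 20))  # 20 x 100 matrix
means <- colMeans(data)                 # numeric vector of length 100

# Long layout: one row per original column
means_long <- data.frame(column = seq_along(means), mean = means)

# Wide layout: a single row with one column per original column
means_wide <- as.data.frame(t(means))

dim(means_long)
dim(means_wide)
```

Which layout is preferable depends on what you do next; the long layout is usually easier to plot or summarise.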

answered Apr 25 '12 at 19:56

nico

R 2.15+ also includes `.colMeans()`. According to the note, these are "for use in programming where ultimate speed is required." – tim riffe Apr 25 '12 at 20:54
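
A small sketch of how `.colMeans()` might be called. Unlike `colMeans`, it takes the matrix dimensions explicitly as `m` (rows) and `n` (columns); the variable name `fast_means` is an assumption for illustration.

```r
data <- replicate(100, runif(n = 20))

# Bare-bones variant: the caller supplies the dimensions of the matrix
fast_means <- .colMeans(data, m = nrow(data), n = ncol(data))

# Should agree with the regular function
isTRUE(all.equal(fast_means, colMeans(data)))
```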

score 5 · Answer 2 · edited Jun 20 '20 at 09:12

Generate data:

data <- replicate(100, runif(n=20))

Means of columns, rows:

col_mean <- apply(data, 2, mean)
row_mean <- apply(data, 1, mean)

Standard deviations of columns, rows:

col_sd   <- apply(data, 2, sd)
row_sd   <- apply(data, 1, sd)
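
As a quick sanity check, the apply-based means can be compared against the dedicated functions, and the same idea carries over to a data frame with sapply. This is a sketch only; the names `same_cols`, `same_rows`, `df` and `col_sd_df` are assumptions for illustration.

```r
data <- replicate(100, runif(n = 20))

col_mean <- apply(data, 2, mean)
row_mean <- apply(data, 1, mean)

# apply() and the dedicated functions should give the same numbers
same_cols <- isTRUE(all.equal(col_mean, colMeans(data)))
same_rows <- isTRUE(all.equal(row_mean, rowMeans(data)))

# For a data frame, sapply() iterates over the columns directly
df        <- as.data.frame(data)
col_sd_df <- sapply(df, sd)

# Names differ (V1, V2, ...), so drop them before comparing
same_sd <- isTRUE(all.equal(unname(col_sd_df), apply(data, 2, sd)))
```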

answered Apr 25 '12 at 20:08

Idr

`colMeans`, `rowMeans`, `colSums`, and `rowSums` will generally perform faster than their `apply` equivalents, though for *most* cases, the performance hit will not be a huge deal (obviously depends on the size of your data...). – Chase Apr 25 '12 at 20:11
Check out the help page `?colMeans` for details, but essentially those functions are "written for speed" and do less error checking than the `apply` functions. I wish I understood the details better myself... – Chase Apr 25 '12 at 20:17
On a 10000 x 10000 matrix `colMeans` took ~0.1s, `apply` ~3.2s. – nico Apr 25 '12 at 20:27
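
A comparison along these lines could be reproduced roughly as follows. This sketch assumes a smaller 1000 x 1000 matrix to keep memory use modest, so the absolute timings will differ from the figures quoted above and will vary by machine.

```r
# Assumption: 1000 x 1000 instead of 10000 x 10000 to keep the run short
m <- matrix(runif(1000 * 1000), ncol = 1000)

t_colmeans <- system.time(r1 <- colMeans(m))
t_apply    <- system.time(r2 <- apply(m, 2, mean))

# Elapsed wall-clock seconds for each approach
c(colMeans = t_colmeans[["elapsed"]], apply = t_apply[["elapsed"]])

# Both approaches should return the same means
isTRUE(all.equal(r1, r2))
```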

score 2 · Answer 3 · answered Apr 25 '12 at 19:54

If I understand correctly:

apply(replicate(100, runif(n=20)), 2, mean)

answered Apr 25 '12 at 19:54

frankc

Dear frankc: Thank you very much for your help. I tried your suggestion and it indeed worked like a charm. – user1357062 Apr 25 '12 at 20:01

score 2 · Answer 4 · answered Apr 25 '12 at 22:25

Building on nico's answer, you could instead make a single call to runif(), reshape the result into a matrix, and then take the colMeans of that. It is faster and gives results equivalent to the other answers.

library(rbenchmark)
#reasonably fast
f1 <- function() colMeans(replicate(100,runif(20)))
#faster yet
f2 <- function() colMeans(matrix(runif(20*100), ncol = 100))

benchmark(f1(), f2(), 
          order = "elapsed", 
          columns = c("test", "elapsed", "relative"),
          replications=10000)

#Test results
  test elapsed relative
2 f2()    0.91 1.000000
1 f1()    5.10 5.604396
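
The equivalence of the two approaches can be checked by fixing the seed before each call. This is a sketch that assumes R's default random number generator; the seed value and the names `m1` and `m2` are arbitrary choices for illustration. Because replicate() calls runif(20) once per column and matrix() fills column by column, both should consume the random stream in the same order.

```r
set.seed(42)  # assumption: any fixed seed works
m1 <- replicate(100, runif(20))

set.seed(42)  # reset so the second approach sees the same random stream
m2 <- matrix(runif(20 * 100), ncol = 100)

# Same shape and, with the default generator, the same values
identical(dim(m1), dim(m2))
isTRUE(all.equal(m1, m2))
isTRUE(all.equal(colMeans(m1), colMeans(m2)))
```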
