0

I am having some difficulty transposing my data correctly. I am trying to get a list of column's mean and sd where the column names are now rows. I was able to create the means and sd with the code below:

data(iris)

mydata <- do.call(data.frame, aggregate(. ~ Species, iris, function(x) c(mean = mean(x), sd = sd(x))))

Creating the Table:

<table><tbody><tr><th>Species</th><th>Sepal.Length.mean</th><th>Sepal.Length.sd</th><th>Sepal.Width.mean</th><th>Sepal.Width.sd</th><th>Petal.Length.mean</th><th>Petal.Length.sd</th><th>Petal.Width.mean</th><th>Petal.Width.sd</th></tr><tr><td>setosa</td><td>5.006</td><td>0.3524897</td><td>3.428</td><td>0.3790644</td><td>1.462</td><td>0.173664</td><td>0.246</td><td>0.1053856</td></tr><tr><td>versicolor</td><td>5.936</td><td>0.5161711</td><td>2.77</td><td>0.3137983</td><td>4.26</td><td>0.469911</td><td>1.326</td><td>0.1977527</td></tr><tr><td>virginica</td><td>6.588</td><td>0.6358796</td><td>2.974</td><td>0.3224966</td><td>5.552</td><td>0.5518947</td><td>2.026</td><td>0.27</td></tr></tbody></table>

I would like the table to look like the following:

<table><tbody><tr><th> </th><th>Setosa</th><th> </th><th>Versicolor</th><th> </th><th>Virginica</th><th> </th></tr><tr><td> </td><td>Mean</td><td>SD</td><td>Mean</td><td>SD</td><td>Mean</td><td>SD</td></tr><tr><td>Sepal.Length</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td></tr><tr><td>Sepal.Width</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td></tr><tr><td>Petal.Length</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td></tr><tr><td>Petal.Width</td><td> </td><td> </td><td> </td><td> </td><td> </td><td> </td></tr></tbody></table>

I realize the getting the second header would most likely require the add_header_above function in kable, but before I get there I am having some difficulty structuring the dataframe to what I would like. I have been fiddling around with the cast and melt function with little luck.

Any advice would be much appreciated!

~Jack

Jbnimble
  • 39
  • 7
  • first of all why are you doing a `do.call` with `aggregate`?? No need for the `d`o.call` since the result from `aggregate` is a dataframe already – Onyambu Jan 04 '18 at 19:46
  • The do.call was suggested [link](https://stackoverflow.com/questions/16367436/compute-mean-and-standard-deviation-by-group-for-multiple-variables-in-a-data-fr) because the second causes the aggregation to make matrices. Without the do.call, the sd columns to not show up in the data.frame – Jbnimble Jan 04 '18 at 20:01
  • well on my end it does show up – Onyambu Jan 04 '18 at 20:06

2 Answers2

1

Here is a solution with the tidyverse and the tables package. First, we use gather() to make a narrow format tidy data set. The narrow format allows us to use both Species and flowerAttribute as factor variables in the table, as well as eliminating the need to transpose the data.

Second, we use the tables::tabular() function to generate a table that has the Species means and standard deviations on the column dimension, and the flower attributes on the row dimension.

data(iris)
library(tables)
library(tidyverse)
tidyIris <- gather(iris,key=flowerAttribute,value=value,
                 Sepal.Length,Sepal.Width,Petal.Length,Petal.Width)
# factors required for tabular()
tidyIris$flowerAttribute <- as.factor(tidyIris$flowerAttribute)
tabular((flowerAttribute) ~ Format(digits=2)*(Species)*(value)*(mean + sd), 
       data=tidyIris )

...and the output:

> tabular((flowerAttribute) ~ Format(digits=2)*(Species)*(value)*(mean + sd), 
+         data=tidyIris )

                 Species                                    
                 setosa       versicolor      virginica     
                 value        value           value         
 flowerAttribute mean    sd   mean       sd   mean      sd  
 Petal.Length    1.46    0.17 4.26       0.47 5.55      0.55
 Petal.Width     0.25    0.11 1.33       0.20 2.03      0.27
 Sepal.Length    5.01    0.35 5.94       0.52 6.59      0.64
 Sepal.Width     3.43    0.38 2.77       0.31 2.97      0.32

For those of you who have used SAS before, the tables package implements functionality similar to SAS PROC TABULATE.

Enhancing the output

With some tweaks to the code, we can exactly replicate the output format requested in the OP.

# key syntax elements
# 1. - renamed flowerAttribute to Attribute using = operator
# 2. - used Heading() to eliminate the printing of "value" and "Species" on columns
tabular((Attribute=flowerAttribute) ~ Format(digits=2)*(Heading()*Species)*Heading()*(value)*(mean + sd), 
        data=tidyIris )

...and the output:

              setosa       versicolor      virginica     
 Attribute    mean    sd   mean       sd   mean      sd  
 Petal.Length 1.46    0.17 4.26       0.47 5.55      0.55
 Petal.Width  0.25    0.11 1.33       0.20 2.03      0.27
 Sepal.Length 5.01    0.35 5.94       0.52 6.59      0.64
 Sepal.Width  3.43    0.38 2.77       0.31 2.97      0.32
 > 

Generating LaTeX

Finally, to get typeset quality output, one can use tabular() to write LaTeX code that can be compiled into a PDF document with Sweave.

latex(tabular((Attribute=flowerAttribute) ~ Format(digits=2)*(Heading()*Species)*Heading()*(value)*(mean + sd), 
        data=tidyIris ))

...geneates LaTeX that compiles into:

enter image description here

Len Greski
  • 10,505
  • 2
  • 22
  • 33
0

I guess you are looking for this?

  `colnames<-`(do.call(rbind,by(t(mydata[-1]),rep(names(iris[-5]),each=2),unlist)),rep(c("Mean","Sd"),3))
              Mean        Sd  Mean        Sd  Mean        Sd
Petal.Length 1.462 0.1736640 4.260 0.4699110 5.552 0.5518947
Petal.Width  0.246 0.1053856 1.326 0.1977527 2.026 0.2746501
Sepal.Length 5.006 0.3524897 5.936 0.5161711 6.588 0.6358796
Sepal.Width  3.428 0.3790644 2.770 0.3137983 2.974 0.3224966

First since I am only dealing with the numeric columns I got rid of the Species column by iris[-5]. Also Since I did not need the first column of mydata I got rid of it. Why did I repeat twice? There were two functions. Why did I repeate 3times, there were three species...

Onyambu
  • 67,392
  • 3
  • 24
  • 53