15

Using the sample data (bottom), I want to use the code below to group and summarise the data. After this, I want to transpose, but I'm stuck on how to use tidyr to achieve this?

For context, I'm attempting to recreate an existing table that was created in Excel using knitr::kable, so the final product of my code below is expected to break tidy principles.

For example:

library(tidyverse)

Df <- Df %>% group_by(Code1, Code2, Level) %>%
    summarise_all(funs(count = sum(!is.na(.))))

I can add t(.) using the pipe...

Df <- Df %>% group_by(Code1, Code2, Level) %>%
    summarise_all(funs(count = sum(!is.na(.)))) %>%
    t(.)

or I can add...

Df <- as.data.frame(t(Df)

Both of these options allow me to transpose, but I'm wondering if there's a tidyverse method of achieving this using tidyr's gather and spread functions? I want to have more control over the process and also want to remove the "V1","V2", etc, that appear as column names when using transpose (t).

How can I achieve this using tidyverse?

Sample Code:

Code1 <- c("H200","H350","H250","T400","T240","T600")
Code2 <- c("4A","4A","4A","2B","2B","2B")
Level <- c(1,2,3,1,2,3)
Q1 <- c(30,40,40,50,60,80)
Q2 <- c(50,30,50,40,80,30)
Q3 <- c(30,45,70,42,81,34)

Df <- data.frame(Code1, Code2, Level, Q1, Q2, Q3)
Mike
  • 2,017
  • 6
  • 26
  • 53
  • This is a bad idea, as it's screwing up all your variable types. The tidyverse is tidy because it's built [to keep data tidy](http://vita.had.co.nz/papers/tidy-data.pdf), i.e. with variables in columns and observations in rows. Breaking that idiom is entirely possible, but most of the time it's just going to make your life miserable, say by turning everything into a matrix of characters (of what is mostly numbers) like you've done here. – alistaire Nov 05 '17 at 21:25
  • The reason I broke tidy principles is because the final table is for display purposes. I'm trying to recreate an existing Excel document. I'm planning to use knitr::kable on the final output, so that's the context for my question. – Mike Nov 06 '17 at 21:45

2 Answers2

16

The general idiom in the tidyverse is to gather() your data to the maximal extent, forming a "long" data frame with one measurement per row. Then, spread() can revert this long data frame into whichever "wide" format that you like best. This procedure can effectively transpose the data: just gather() all the identifier columns except the row names, and then spread() the row names.

For example, here is how to effectively transpose mtcars:

require(tidyverse)

mtcars %>% 
    rownames_to_column %>%
    gather(variable, value, -rowname) %>% 
    spread(rowname, value)

Your data does not have "row names" as understood in R, but Code1 effectively serves as a row name because it uniquely identifies each (original) row of your data.

Df1 <- Df %>% 
    group_by(Code1, Code2, Level) %>%
    summarise_all(funs(count = sum(!is.na(.)))) %>%
    gather(column, value, -Code1) %>%
    spread(Code1, value)

UPDATE for tidyr 1.0 or higher (late 2019 onwards)

The new pivot_wider() and pivot_longer() functions are now preferred over the older (but still supported) gather() and spread(). Thus the preferred way to transpose mtcars is probably

require(tidyverse)

mtcars %>% 
    rownames_to_column() %>%
    pivot_longer(-rowname, 'variable', 'value') %>%
    pivot_wider(variable, rowname)
Curt F.
  • 4,690
  • 2
  • 22
  • 39
  • 2
    If your data.frame or tibble has different data types, you may have to explicilty coerce them to a consistent data type first, e.g. using apply(mtcars, 2, as.character). Otherwise, pivot_longer may result in an error. (Works fine with mtcars, but might not in general). – Earlien Jul 19 '21 at 01:19
1
library(tidyr)
library(dplyr)

Df <- Df %>% group_by(Code1, Code2, Level) %>%
    summarise_all(funs(count = sum(!is.na(.)))) %>%
    gather(var, val, 2:ncol(Df)) %>%
    spread(Code1, val)
alistaire
  • 42,459
  • 4
  • 77
  • 117
johnzarifis
  • 362
  • 1
  • 4
  • 2
    While this code may answer the question, providing additional context regarding how and/or why it solves the problem would improve the answer's long-term value. – mx0 Nov 05 '17 at 21:20