In Wickham's "Tidy Data" paper (PDF), he shows an example of going from messy to tidy data.

Where can I find the code for it?

For example, what code is used to go from

Table 1: Typical presentation dataset.

to

Table 3: The same data as in Table 1 but with variables in columns and observations in rows.

Perhaps `melt` or `cast`? But from http://www.statmethods.net/management/reshape.html I can't see how.

(Note to self: Need it for GDPpercapita...)

A5C1D2H2I1M1N2O1R2T1
  • Looks to me like "Table 1" is a matrix, so you can just use `library(reshape2); melt(table1)` (if your dataset is called "table1"). – A5C1D2H2I1M1N2O1R2T1 Jul 08 '15 at 16:01
  • @Molx, that's not the most intuitive place to look (or the most intuitive search expression), since these are different packages (though one comprises a lot of wrappers for the "reshape2" approaches). The "tidyr" vignette would only focus on `data.frame`s, while the "reshape2" package also handled other data types. – A5C1D2H2I1M1N2O1R2T1 Jul 08 '15 at 16:12
  • @AnandaMahto You're right, I actually thought the paper was about tidyr given its title, didn't notice it was about reshape2. – Molx Jul 08 '15 at 16:16
  • The paper predates tidyr by a few years. I would still encourage the OP to look at [the tidyr vignette](http://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html); it covers many of the same principles and shows accompanying `tidyr` code. – Gregor Thomas Jul 08 '15 at 16:28
  • @Gregor, but it's still important to recognize that "tidyr" does less than "reshape2" and is more limited in the types of data it takes as inputs. – A5C1D2H2I1M1N2O1R2T1 Jul 08 '15 at 16:41

The answer depends on how your data are structured. In the paper you linked to, Hadley was writing about the "reshape" and "reshape2" packages.

The data structure behind "Table 1" is ambiguous. Judging by the description, it sounds like a matrix whose rows and columns carry names (dimnames), like `mymat` in the sample data below. In that case, a simple `melt` works:

```r
library(reshape2)
melt(mymat)
#           Var1       Var2 value
# 1   John Smith treatmenta     —
# 2     Jane Doe treatmenta    16
# 3 Mary Johnson treatmenta     3
# 4   John Smith treatmentb     2
# 5     Jane Doe treatmentb    11
# 6 Mary Johnson treatmentb     1
```

If it were not a matrix but a data.frame with row.names, you could still use the matrix method by converting it first, with something like `melt(as.matrix(df))`, where `df` is your data.frame.
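As a minimal sketch of that row.names case, assuming the "reshape2" package is installed: `df_rn` below is a hypothetical data.frame holding the same values as `mymat`, but with the people's names stored as row.names rather than in a column.

```r
library(reshape2)

# Hypothetical data frame `df_rn`: same values as `mymat`, but the names
# live in row.names instead of in a column of their own.
df_rn <- data.frame(
  treatmenta = c("—", "16", "3"),
  treatmentb = c("2", "11", "1"),
  row.names = c("John Smith", "Jane Doe", "Mary Johnson"),
  stringsAsFactors = FALSE
)

# as.matrix() keeps non-automatic row.names as row dimnames, so melt()
# dispatches to its matrix/array method and returns Var1, Var2, value.
long_rn <- melt(as.matrix(df_rn))
long_rn
```

The result should have the same `Var1`/`Var2`/`value` layout as `melt(mymat)`, with rows in column-major order (all of `treatmenta` first, then `treatmentb`).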

If, on the other hand, the names are a column in a data.frame (as they are in the "tidyr" vignette), you need to specify either `id.vars` or `measure.vars` so that `melt` knows how to treat the columns.

```r
melt(mydf, id.vars = "name")
#           name   variable value
# 1   John Smith treatmenta     —
# 2     Jane Doe treatmenta    16
# 3 Mary Johnson treatmenta     3
# 4   John Smith treatmentb     2
# 5     Jane Doe treatmentb    11
# 6 Mary Johnson treatmentb     1
```

The new kid on the block is "tidyr". It focuses on data.frames, since it is often used in conjunction with "dplyr". I won't reproduce the "tidyr" code here, because the vignette covers it in enough detail.
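For reference, here is a minimal sketch of the same reshape with "tidyr". It assumes tidyr 1.0.0 or later, where `pivot_longer()` superseded the `gather()` function the vignette originally used. The data frame `wide` and the output column names `treatment` and `result` are hypothetical choices for illustration. Both treatment columns are stored as character because, in recent tidyr versions, combining a character column with an integer one (as in `mydf`) raises an error.

```r
library(tidyr)

# Hypothetical `wide` data frame: same layout as `mydf` (one row per person,
# one column per treatment), but both treatment columns are character.
wide <- data.frame(
  name = c("John Smith", "Jane Doe", "Mary Johnson"),
  treatmenta = c("—", "16", "3"),
  treatmentb = c("2", "11", "1"),
  stringsAsFactors = FALSE
)

# Stack the two treatment columns: `treatment` receives the old column
# name and `result` receives the cell value; `name` is carried along.
long <- pivot_longer(
  wide,
  cols = c(treatmenta, treatmentb),
  names_to = "treatment",
  values_to = "result"
)
long
```

Unlike `melt`, `pivot_longer()` returns the rows grouped by person (both of John Smith's treatments first), and the new columns take their names from `names_to` and `values_to` instead of defaulting to `variable`/`value`.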


Sample data:

```r
mymat <- structure(c("—", "16", "3", " 2", "11", " 1"), .Dim = c(3L, 
    2L), .Dimnames = list(c("John Smith", "Jane Doe", "Mary Johnson"
    ), c("treatmenta", "treatmentb")))

mydf <- structure(list(name = structure(c(2L, 1L, 3L), .Label = c("Jane Doe", 
    "John Smith", "Mary Johnson"), class = "factor"), treatmenta = c("—", 
    "16", "3"), treatmentb = c(2L, 11L, 1L)), .Names = c("name", 
    "treatmenta", "treatmentb"), row.names = c(NA, 3L), class = "data.frame")
```
A5C1D2H2I1M1N2O1R2T1