I would like to convert a matrix/array (with dimnames) into a data frame. This is very easy with reshape2::melt but seems harder with tidyr, and for an array it does not seem to be possible at all. Am I missing something? (I ask in particular because reshape2 describes itself as retired; see https://github.com/hadley/reshape.)

For example, given the following matrix

MyScores <- matrix(runif(2*3), nrow = 2, ncol = 3, 
                   dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3]))

we can turn it into a data frame as follows

reshape2::melt(MyScores, value.name = 'Score') # perfect

or, using tidyr as follows:

as_tibble(MyScores, rownames = 'Month') %>% 
  gather(Class, Score, -Month)

In this case reshape2 and tidyr seem similar (although reshape2 is shorter if you are looking for a long-format data frame).
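
For reference, gather() has since been superseded in tidyr by pivot_longer(). The following is a minimal sketch of the same matrix conversion with the newer function, not a definitive recipe. The fixed seed and the name long_scores are assumptions added only for illustration.

```r
library(tidyr)  # also re-exports as_tibble() and %>%

set.seed(42)  # assumption: fixed seed, only so the sketch is reproducible
MyScores <- matrix(runif(2*3), nrow = 2, ncol = 3,
                   dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3]))

# Move the row names into a Month column, then stack the Class columns.
long_scores <- as_tibble(MyScores, rownames = "Month") %>%
  pivot_longer(cols = -Month, names_to = "Class", values_to = "Score")

long_scores
```

Unlike reshape2::melt, this returns the rows grouped by Month rather than by Class, so the row order differs even though the content is the same.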

However for arrays, it seems harder. Given

EverybodyScores <- array(runif(2*3*5), dim = c(2,3,5), 
                         dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5))

we can turn it into a data frame as follows:

reshape2::melt(EverybodyScores, value.name = 'Score') # perfect

but using tidyr it's not clear how to do it:

as_tibble(EverybodyScores, rownames = 'Month') # loses the Month information, and Class and StudentID still need to be disentangled

Is this a situation where the right solution is to stick to using reshape2?

banbh

3 Answers

One way I just found by playing around is to coerce via tbl_cube. I have never really used the class but it seems to do the trick in this instance.

EverybodyScores <- array(
  runif(2 * 3 * 5),
  dim = c(2, 3, 5),
  dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5)
)
library(tidyverse)
library(cubelyr)
EverybodyScores %>%
  as.tbl_cube(met_name = "Score") %>%
  as_tibble()
#> # A tibble: 30 x 4
#>    Month    Class StudentID Score
#>    <chr>    <chr>     <int> <dbl>
#>  1 January  A             1 0.366
#>  2 February A             1 0.254
#>  3 January  B             1 0.441
#>  4 February B             1 0.562
#>  5 January  C             1 0.313
#>  6 February C             1 0.192
#>  7 January  A             2 0.799
#>  8 February A             2 0.277
#>  9 January  B             2 0.631
#> 10 February B             2 0.101
#> # ... with 20 more rows

Created on 2018-08-15 by the reprex package (v0.2.0).

Calum You
  • This is the first time I've seen a `tbl_cube` in the wild! It's noteworthy (although I suppose natural in retrospect) that it plays so nicely with `as_tibble()`. – banbh Aug 16 '18 at 13:29
  • Now that you've shown me the light regarding `tbl_cube` I noticed a great answer that discusses it: [Maybe dplyr::tbl_cube ?](https://stackoverflow.com/a/21214749/239838). Coincidentally, the question that prompted that answer is close to my situation. – banbh Aug 16 '18 at 13:43
  • `as.tbl_cube` has been deprecated from `tidyverse`, so this code no longer works. – Indrajeet Patil Feb 23 '21 at 20:22
  • I believe that `as.tbl_cube` has been pulled out of `dplyr` and moved into `cubelyr` (see https://rdrr.io/cran/cubelyr/man/as.tbl_cube.html). – banbh Feb 24 '21 at 13:36

Making a tibble drops the row names. Instead of going straight to a tibble, you can first convert the array to a base R data.frame and then use tibble::rownames_to_column to make a column for the months. Note that converting to a data frame creates columns with names like A.1, sticking the class and ID together; you can separate these again with tidyr::separate. Calling as_tibble is optional (it only matters if you want a tibble in the end) and can come at any point in the workflow once you've made a column from the row names.

library(tidyverse)

EverybodyScores <- array(runif(2*3*5), dim = c(2,3,5), 
                         dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5))

EverybodyScores %>%
  as.data.frame() %>%
  rownames_to_column("Month") %>%
  gather(key = class_id, value = value, -Month) %>%
  separate(class_id, into = c("Class", "StudentID"), sep = "\\.") %>%
  as_tibble()
#> # A tibble: 30 x 4
#>    Month    Class StudentID value
#>    <chr>    <chr> <chr>     <dbl>
#>  1 January  A     1         0.576
#>  2 February A     1         0.229
#>  3 January  B     1         0.930
#>  4 February B     1         0.547
#>  5 January  C     1         0.761
#>  6 February C     1         0.468
#>  7 January  A     2         0.631
#>  8 February A     2         0.893
#>  9 January  B     2         0.638
#> 10 February B     2         0.735
#> # ... with 20 more rows

Created on 2018-08-15 by the reprex package (v0.2.0).
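
To make the flattening step easier to see, here is a small base R sketch of what as.data.frame() does to the 3D array before any tidyverse function is involved. The fixed seed and the name flat are assumptions added only for illustration.

```r
set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
EverybodyScores <- array(runif(2*3*5), dim = c(2,3,5),
                         dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5))

# The first dimension becomes the rows; all remaining dimensions are
# collapsed into the columns.
flat <- as.data.frame(EverybodyScores)

dim(flat)       # 2 rows (months), 15 columns (3 classes x 5 students)
rownames(flat)  # the Month labels survive as row names
colnames(flat)  # Class and StudentID joined with a dot: "A.1", "B.1", ...
```

This is why the pipeline above needs both rownames_to_column() (to recover Month) and separate() (to split the dotted column names back into Class and StudentID).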

camille
  • It's interesting that `base::as.data.frame()` does what `as_tibble()` (aka `as_data_frame()`) fails to do -- namely keep rownames for 3D arrays. – banbh Aug 16 '18 at 13:16
  • Yes, by default `tibble`s drop row names. For base data frames and matrices, `as_tibble` has a `rownames` argument, but I don't know of a more direct way to make a `tibble` while keeping row names from an array. I think in this situation, `tbl_cube`, as done in the other answer, is a better fit. – camille Aug 16 '18 at 13:55

Here is the newer tidyr way to do the same, using pivot_longer() instead of gather():

library(tidyr)

EverybodyScores <- array(
  runif(2 * 3 * 5),
  dim = c(2, 3, 5),
  dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5)
)

as_tibble(EverybodyScores, rownames = "Month") %>%
  pivot_longer(
    cols = matches("^A|^B|^C"),
    names_sep = "\\.",
    names_to = c("Class", "StudentID")
  )
#> # A tibble: 30 x 4
#>    Month   Class StudentID  value
#>    <chr>   <chr> <chr>      <dbl>
#>  1 January A     1         0.0325
#>  2 January B     1         0.959 
#>  3 January C     1         0.593 
#>  4 January A     2         0.0702
#>  5 January B     2         0.882 
#>  6 January C     2         0.918 
#>  7 January A     3         0.459 
#>  8 January B     3         0.849 
#>  9 January C     3         0.901 
#> 10 January A     4         0.328 
#> # … with 20 more rows

Created on 2021-02-23 by the reprex package (v1.0.0)

Indrajeet Patil
  • Nice! One slight variation on this answer is to replace the `cols` parameter with `cols = -Month` which makes it slightly more general. However, an advantage of the accepted solution (which uses `cubelyr`) is that it works unchanged for a rank 4 (or higher) array. – banbh Feb 24 '21 at 04:26
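
The `cols = -Month` variation mentioned in the comment above could look roughly like the sketch below. The fixed seed, the name long_scores, and the `values_to = "Score"` choice (to match the column name used in the question) are assumptions, not part of the original answer.

```r
library(tidyr)  # also re-exports as_tibble() and %>%

set.seed(1)  # assumption: fixed seed, only so the sketch is reproducible
EverybodyScores <- array(
  runif(2 * 3 * 5),
  dim = c(2, 3, 5),
  dimnames = list(Month = month.name[1:2], Class = LETTERS[1:3], StudentID = 1:5)
)

long_scores <- as_tibble(EverybodyScores, rownames = "Month") %>%
  pivot_longer(
    cols = -Month,                       # every column except the row-name column
    names_to = c("Class", "StudentID"),  # one new column per collapsed dimension
    names_sep = "\\.",                   # flattened names look like "A.1"
    values_to = "Score"                  # assumption: name taken from the question
  )

long_scores
```

Selecting with `-Month` avoids hard-coding the class labels, but the sketch still assumes that none of the dimnames contain a dot, and StudentID comes back as a character column.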