Simple question that I couldn't solve myself (or find a solution):

library(reshape2)  # assumption: melt() and dcast() come from reshape2
df <- data.frame(A1 = sample(1:100, 10, replace = TRUE), A2 = sample(1:100, 10, replace = TRUE))
molten <- melt(df)

How can I reverse this and get the original df back? Neither dcast nor cast works for me.

Geo Vogler

2 Answers

It works if you first construct an ID variable:

df$id <- 1:nrow(df)
molten <- melt(df, id.vars = "id")
dcast(molten, id~variable)

If your data is already in the molten state, note that in your example the rows are still in their original order within each variable. You can take advantage of this to construct an ID with rep:

molten$id <- rep(1:(nrow(molten) / 2), 2)

Then the dcast method above will work. Of course, there may be more than two variables that were melted. You can generalize the rep as follows:

n_vars <- length(unique(molten$variable))  # number of melted variables
molten$id <- rep(1:(nrow(molten) / n_vars), n_vars)

Note that this ID creation relies on two fairly large assumptions:

  1. The data creator did not sort the data in a manner that would ruin the original order.
  2. Any missing values in the melt were not dropped.

You can partially test for the second problem using table. Check that all levels of "variable" have the same number of rows. This is not foolproof, but it is a pretty good indicator.
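
As a rough sketch of that check, the snippet below builds a small made-up molten data frame (the values and the dropped row are only for illustration) and compares the per-level counts from table. The name `balanced` is introduced here and is not part of the original answer.

```r
# Hypothetical molten data: two variables, five observations each
molten <- data.frame(
  variable = factor(rep(c("A1", "A2"), each = 5)),
  value    = c(49, 51, 41, 75, 91, 46, 75, 27, 4, 79)
)

# Simulate a dropped observation so the groups become uneven
molten <- molten[-1, ]

# Count the rows for each level of `variable`
counts <- table(molten$variable)
print(counts)

# TRUE only when every level has the same number of rows
balanced <- length(unique(as.vector(counts))) == 1
print(balanced)
```

If `balanced` is FALSE, the rep-based ID above should not be used, because the rows can no longer be paired by position alone.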

lmo
  • Thanks @user20650. I added this scenario to the bottom, though I think I'll generalize it. – lmo May 25 '16 at 23:28
  • Good stuff; however, I think this might run into trouble if there were uneven numbers in each group (try it after `molten <- molten[-1, ]`). My feeling would be to go with `ave` and `seq_along`. – user20650 May 25 '16 at 23:42
  • I agree. However, at that point it would be difficult, if not impossible, to match up the observations for the "wide" operation. If the user who created the melted data dropped all NAs before we were able to see it, this operation would be dangerous. Maybe I should mention that. – lmo May 25 '16 at 23:50
  • The dataframe is really only a dataframe with 2 variables (one factor with 2 levels, and numerical observations) that I want to get into the wide form. Since the number of observations per factor level varies in my data (uneven!), I'm going with `ave` and `seq_along`. – Geo Vogler May 25 '16 at 23:56
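
The `ave` and `seq_along` idea from the comments could look roughly like the sketch below. It uses only base R, the data is made up, and it assumes that pairing observations by their position within each level is actually meaningful; if rows were dropped from the middle of a group, the pairing will be wrong. Using reshape for the wide step is a choice made here, not something the commenters specified.

```r
# Hypothetical molten data with uneven groups (3 rows for A1, 2 for A2)
molten <- data.frame(
  variable = factor(c("A1", "A1", "A1", "A2", "A2")),
  value    = c(49, 51, 41, 46, 75)
)

# Number the rows within each level of `variable`: 1, 2, 3, 1, 2
molten$id <- ave(seq_along(molten$value), molten$variable, FUN = seq_along)

# Go wide with base R; shorter groups are padded with NA
wide <- reshape(molten, idvar = "id", timevar = "variable", direction = "wide")
print(wide)
```

The resulting columns are named `value.A1` and `value.A2`, and the missing third observation of A2 shows up as NA.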

This can be done easily with unstack, as long as every level of variable has the same number of rows:

unstack(molten, value~variable)
#   A1 A2
#1  49 46
#2  51 75
#3  41 27
#4  75  4
#5  91 79
#6  19 87
#7  24 18
#8  96 57
#9  87 42
#10  8 47

Another option is spread from tidyr, after creating a sequence column within each group:

library(dplyr)
library(tidyr)
molten %>% 
    group_by(variable) %>%
    mutate(n = row_number()) %>% 
    spread(variable, value) %>% 
    select(-n)    
#      A1    A2
#   <int> <int>
#1     49    46
#2     51    75
#3     41    27
#4     75     4
#5     91    79
#6     19    87
#7     24    18
#8     96    57
#9     87    42
#10     8    47
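
In tidyr 1.0.0 and later, spread is superseded by pivot_wider, so the same idea might be written as in the sketch below. The small data frame is made up for illustration, and the explicit ungroup() is a precaution added here rather than part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical molten data: two variables, three observations each
molten <- data.frame(
  variable = factor(rep(c("A1", "A2"), each = 3)),
  value    = c(49L, 51L, 41L, 46L, 75L, 27L)
)

wide <- molten %>%
  group_by(variable) %>%
  mutate(n = row_number()) %>%   # sequence number within each variable
  ungroup() %>%
  pivot_wider(names_from = variable, values_from = value) %>%
  select(-n)                     # drop the helper column

print(wide)
```

As with spread, groups of unequal length are padded with NA.
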
akrun
  • The first one works only with variables of equal lengths, but the second one works fine! – Geo Vogler May 26 '16 at 17:36
  • @GeoVogler Yes, that is right. Your example was based on equal lengths, which is why I added the `unstack` option. – akrun May 26 '16 at 18:03