Using melt / cast with variables of uneven length in R

Question

I'm working with a large data frame that I want to pivot, so that the values in a variable column become column headers across the top.

I've found the reshape package very useful in such cases, except that the cast function defaults to fun.aggregate=length. Presumably this is because I'm performing these operations by "case" and the number of variables measured varies among cases.

I would like to pivot so that missing variables are denoted as "NA"s in the pivoted data frame.

So, in other words, I want to go from a molten data frame like this:

Case | Variable | Value
 1         1        2.3
 1         2        2.1
 1         3        1.3
 2         1        4.3
 2         2        2.5
 3         1        1.8
 3         2        1.9
 3         3        2.3
 3         4        2.2

To something like this:

Case | Variable 1 | Variable 2 | Variable 3 | Variable 4
 1         2.3          2.1          1.3         NA
 2         4.3          2.5          NA          NA
 3         1.8          1.9          2.3         2.2

The code dcast(data,...~Variable) again defaults to fun.aggregate=length, which does not preserve the original values.

Thanks for your help, and let me know if anything is unclear!

Maybe you should try `dcast` in `reshape2`? When I run your `dcast` statement using reshape2, I get your desired output (i.e. with the NA values). — joran, Jun 17 '11 at 20:48
Hmm, it appears I made my example too simple, because it does indeed work with that statement. It won't for the large dataset with which I'm working, though. Thanks for your comments! — Jon, Jun 17 '11 at 21:45

score 5 · Answer 1 · 2011-06-17T22:40:45.530

It is just a matter of including all of the variables in the cast call. Reshape expects the value column to be called value; yours is called Value, so it prints a message, but it still works fine. The reason it was using fun.aggregate=length is the missing Case in the formula: it was aggregating over the values in Case.

Try: cast(data, Case~Variable)

library(reshape)  # provides cast()

data <- data.frame(Case=c(1,1,1,2,2,3,3,3,3),
  Variable=c(1,2,3,1,2,1,2,3,4),
  Value=c(2.3,2.1,1.3,4.3,2.5,1.8,1.9,2.3,2.2))

cast(data,Case~Variable)
Using Value as value column.  Use the value argument to cast to override this choice
  Case   1   2   3   4
1    1 2.3 2.1 1.3  NA
2    2 4.3 2.5  NA  NA
3    3 1.8 1.9 2.3 2.2
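
The message printed above suggests using the value argument of cast. A minimal sketch of that, assuming the same example data and that the reshape package is installed; naming the value column explicitly should avoid the guess and the message:

```r
library(reshape)

data <- data.frame(Case     = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                   Variable = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
                   Value    = c(2.3, 2.1, 1.3, 4.3, 2.5, 1.8, 1.9, 2.3, 2.2))

# Tell cast which column holds the measurements instead of letting it guess
wide <- cast(data, Case ~ Variable, value = "Value")

# Plain data frame copy for ordinary indexing
wide_df <- as.data.frame(wide)
print(wide_df)
```

Combinations that never occur in the molten data (for example Case 2, Variable 3) come out as NA.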

Edit, in response to the comment from @Jon: what do you do if there is one more variable in the data frame?

data <- data.frame(expt=c(1,1,1,1,2,2,2,2,2),
               func=c(1,1,1,2,2,3,3,3,3),
               variable=c(1,2,3,1,2,1,2,3,4),
               value=c(2.3,2.1,1.3,4.3,2.5,1.8,1.9,2.3,2.2))

cast(data,expt+variable~func)
  expt variable   1   2   3
1    1        1 2.3 4.3  NA
2    1        2 2.1  NA  NA
3    1        3 1.3  NA  NA
4    2        1  NA  NA 1.8
5    2        2  NA 2.5 1.9
6    2        3  NA  NA 2.3
7    2        4  NA  NA 2.2

I'm still trying to understand this fun.aggregate statement. I have a molten data frame with the column headers "Expt", "function.", "variable", and "value". I want to pivot the variables under "function." across the top as a function of "Expt" and "variable". So my function is dcast(data,Expt+variable~function.). I still get the "Aggregation function is missing: defaulting to length" error, though...any thoughts? — Jon, Jun 17 '11 at 21:47
@Jon, if those four columns are all that you have in your melted `data` data frame, then `cast(data, Expt + variable ~ function.)` should work. If you edit your question with the new information, I will try to update my answer. — , Jun 17 '11 at 22:29

score 0 · Answer 2 · answered Nov 27 '13 at 11:56

To avoid the warning message, you could subset the data frame by another variable, e.g. a categorical variable with three levels a, b and c. If, in your current data, category a has 70 cases, b has 80 and c has 90, then the cast function doesn't know how to aggregate them.

Hope this helps.
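
A rough sketch of how one might check for this in base R. The data frame `molten` and the column `rep` are hypothetical names made up for illustration. The idea is that the fallback to length shows up when some combination of the formula variables has more than one value, so counting rows per combination shows where the extras are:

```r
# Hypothetical molten data: Case 1 / Variable 1 was measured twice
molten <- data.frame(Case     = c(1, 1, 1, 2),
                     Variable = c(1, 1, 2, 1),
                     Value    = c(2.3, 2.4, 2.1, 4.3))

# Count rows per Case/Variable combination
counts <- aggregate(Value ~ Case + Variable, data = molten, FUN = length)

# Combinations with more than one value are the ones that need aggregating
dups <- counts[counts$Value > 1, ]
print(dups)

# One possible fix: number the repeats within each combination, so that
# Case + rep identifies a row uniquely and could go on the left of the formula
molten$rep <- ave(molten$Value, molten$Case, molten$Variable, FUN = seq_along)
print(molten)
```

If `dups` is empty, every cell has at most one value and no aggregation function should be needed.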

score 0 · Answer 3 · answered Jun 17 '11 at 20:59

Here is one solution. It does not use the package or function you mention, but it could be of use. Suppose your data frame is called df:

# Use the same sorted variable order for the column labels and the column index
vars <- sort(unique(df$Variable))
M <- matrix(NA,
            nrow = length(unique(df$Case)),
            ncol = length(vars)+1,
            dimnames = list(NULL,c('Case',paste('Variable',vars))))
irow <- match(df$Case,unique(df$Case))
icol <- match(df$Variable,vars) + 1   # column 1 is reserved for Case
ientry <- irow + (icol-1)*nrow(M)     # column-major linear index into M
M[ientry] <- df$Value
M[,1] <- unique(df$Case)
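
Another base R possibility, offered as a sketch rather than a replacement for the code above: tapply leaves NA in cells that have no data. This assumes df holds the example data from the question and that each Case/Variable pair occurs at most once (the helper just takes the first value in each cell):

```r
df <- data.frame(Case     = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                 Variable = c(1, 2, 3, 1, 2, 1, 2, 3, 4),
                 Value    = c(2.3, 2.1, 1.3, 4.3, 2.5, 1.8, 1.9, 2.3, 2.2))

# Rows are cases, columns are variables; empty cells stay NA
wide <- with(df, tapply(Value,
                        list(Case = Case, Variable = Variable),
                        FUN = function(v) v[1]))
print(wide)
```

The result is a matrix whose row and column names are the Case and Variable values, so the Case ids live in the row names rather than in a first column.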
