2

I am facing a reshaping problem with a dataframe. It has many more rows and columns. Simplified, it structure looks like this:

rownames    year    x1  x2  x3
a           2000    2   6   11
b           2000    0   4   2
c           2000    0   3   5
a           2010    2   6   11
b           2010    0   0   0
c           2020    4   1   8
a           2020    10  1   7
b           2020    8   4   10
c           2020    22  1   16

I would like to come out with a dataframe that has one single row for the variable "year", copy the x1, x2, x3 values in subsequent columns, and rename the columns with a combination between the rowname and the x-variable. It should look like this:

year  a_x1  a_x2  a_x3  b_x1  b_x2  b_x3  c_x1  c_x2  c_x3
2000  2     6     11    0     4     2     0     3     5
2010  2     6     11    0     0     0     4     1     8
2020  10    1     7     8     4     10    22    1     16

I thought to use subsequent cbind() functions, but since I have to do it for thousands of rows and hundreds columns, I hope there is a more direct way with the reshape package (with which I am not so familiar yet)

Thanks in advance!

refroll
  • 131
  • 1
  • 9
  • please use `dput(sample_data)` so users can apply their suggestions to the same data you are looking at. – crogg01 Feb 12 '14 at 13:39
  • How do you want to deal with duplicated values? Or should I assume this was a mistake based on your sample output? – A5C1D2H2I1M1N2O1R2T1 Feb 12 '14 at 13:40
  • @HansRoggeman in this case it is 100% reproducible: `dt <- read.table( text="...", , header = TRUE )` and just replace the `...` with the copied text. – Beasterfield Feb 12 '14 at 13:45

2 Answers2

4

First, I hope that rownames is a data.frame column and not the data.frame's rownames. Otherwise you'll encounter problems due to the non-uniqueness of the values.

I think your main problem is, that your data.frame is not entirely molten:

library(reshape2)

dt <- melt( dt, id.vars=c("year", "rownames") )
head(dt)
   year rownames variable value
1  2000        a       x1     2
2  2000        b       x1     0
3  2000        c       x1     0
4  2010        a       x1     2
...

dcast( dt, year ~ rownames + variable )
  year a_x1 a_x2 a_x3 b_x1 b_x2 b_x3 c_x1 c_x2 c_x3
1 2000    2    6   11    0    4    2    0    3    5
2 2010    2    6   11    0    0    0    4    1    8
3 2020   10    1    7    8    4   10   22    1   16

EDIT:

As @spdickson points out, there is also an error in your data avoiding a simple aggregation. Combinations of year, rowname have to be unique of course. Otherwise you need an aggregation function which determines the resulting values of non-unique combinations. So we assume that row 6 in your data should read c 2010 4 1 8.

Beasterfield
  • 7,023
  • 2
  • 38
  • 47
  • Beat me by 15 seconds... +1 – A5C1D2H2I1M1N2O1R2T1 Feb 12 '14 at 13:44
  • An error in his example dataset made it so that `dcast()` aggregated your results using length. If you change the first instance of `2020` to `2010`, that should fix it so that you get the results he wanted. – Sam Dickson Feb 12 '14 at 13:51
  • Thanks, it works! I mean `rownames` as the data.frame's rownames. However, I extracted the rownames vector and inserted as a column. Combination of year-rownames, although I have hundreds of `years`, are luckily always the same in my case. – refroll Feb 12 '14 at 15:10
3

You can try using reshape() from base R without having to melt your dataframe further:

df1 <- read.table(text="rownames    year    x1  x2  x3
a           2000    2   6   11
b           2000    0   4   2
c           2000    0   3   5
a           2010    2   6   11
b           2010    0   0   0
c           2010    4   1   8
a           2020    10  1   7
b           2020    8   4   10
c           2020    22  1   16",header=T,as.is=T)

reshape(df1,direction="wide",idvar="year",timevar="rownames")
#   year x1.a x2.a x3.a x1.b x2.b x3.b x1.c x2.c x3.c
# 1 2000    2    6   11    0    4    2    0    3    5
# 4 2010    2    6   11    0    0    0    4    1    8
# 7 2020   10    1    7    8    4   10   22    1   16
Sam Dickson
  • 5,082
  • 1
  • 27
  • 45