3

I am trying to reshape a dataframe:

Currently it looks like this:

ID   | Gender |A1 | A2 | A3 | B1 | B2 | B3
ID_1 | m      | 3 | 3  | 3  | 2  | 3  | 2 
ID_2 | f      | 1 | 1  | 1  | 4  | 4  | 4

I want to have something like:

 ID   | Gender | A1 | A2 | A3
 ID_1 | m      | 3  | 3  |  3   <- this would be columns A1 - A3 for ID 1
 ID_1 | m      | 2  | 2  |  2   <- this would be columns B1 - B3 for ID 1
 ID_2 | f      | 1  | 1  |  1   <- this would be columns A1 - A3 for ID 2
 ID_2 | f      | 4  | 4  |  4   <- this would be columns B1 - B3 for ID 2

(A1 and B1 / A2 and B2 are the same variables (with regard to the content), so for example: A1 and B1 would be both variables for the result of Test 1 and A2 and B2 both contain the result of Test 2. So in order to evaluate it I need all the result of Test1 in one column and all of Test 2 in another column. I tried to solve this with "melt", but it only melts down the dataframe one by one, not as chunks. (since I need to keep the first 2 columns the way they are and only rearrange the last 4 columns, but as chunks of three) Any other ideas? Thanks!

Elisa
  • 215
  • 1
  • 3
  • 11
  • Tell `melt` that the first two columns are id variables, i.e. `id.vars=...` – Andrie Nov 04 '11 at 15:38
  • The first part works, it is no problem to keep the first columns, however I don´t know how to tell R that it should rearrange all the Test1´s (A1, B1) / Test2´s (A2, B2) etc. in one columns. So that I end up with three instead of six columns (example two)... – Elisa Nov 04 '11 at 15:45
  • @Elisa, this looks like a duplicate of a question you created a few hours back. If it is, please close the other one down. – Ramnath Nov 04 '11 at 16:15
  • Yeah, some of worked hard on answers for that one... (BTW, this one's much better, you'll get very simple answers in only a couple of lines of code)... you also might want to add names to the columns in your question. – John Nov 04 '11 at 16:24
  • Thanks for all the ideas. I´ll try them and report back to you if it worked. The original dataframe has a lot more columns and very confusing column-names. So I might have to rename them before I reshape the data – Elisa Nov 04 '11 at 17:09

4 Answers4

5

One liner using reshape from base R.

reshape(dat, varying = 3:8, idvar = 1:2, direction = 'long', drop=FALSE, 
   timevar = 'Test')

           ID Gender Test Test1 Test2 Test3
ID_1.m.A ID_1      m    A    A1    A2    A3
ID_2.f.A ID_2      f    A    A1    A2    A3
ID_1.m.B ID_1      m    B    B1    B2    B3
ID_2.f.B ID_2      f    B    B1    B2    B3
Ramnath
  • 54,439
  • 16
  • 125
  • 152
  • +1 -- Hard to beat this solution. Anybody know how (within the one-liner) to drop those ugly `row.names`? – Josh O'Brien Nov 04 '11 at 16:15
  • you can drop the `row.name` by adding an argument `new.row.names = 1:4`. if you don't want to hardcode 4, you can easily figure out how to compute it based on the data. – Ramnath Nov 04 '11 at 16:26
  • That calculation's what I couldn't/can't figure out. Oh well -- even as a two-liner, this is pretty nice. – Josh O'Brien Nov 04 '11 at 16:31
  • I haven't used reshape in awhile but I'm wondering how this doesn't just reshape it with a single column y-variable? I can't seem to get it to actually work at all (fails at guessing varying). – John Nov 04 '11 at 16:35
  • If you _really_ want a one liner that also drops the row names: `\`row.names<-\`(reshape(dat, varying=3:8, idvar=1:2, direction='long', drop=FALSE, timevar='Test'), NULL)`. File this under, "you can do it, but you shouldn't" – Brian Diggs Nov 04 '11 at 16:42
  • @John. the trick to get `reshape` to work is to understand how it generates the `splits`. if you can post the data you are working with, i might be able to help with how to set up `reshape` – Ramnath Nov 04 '11 at 17:37
  • I got the problem, you have to have special names for the columns of data. That would be a good thing to put in the answer. – John Nov 04 '11 at 18:19
  • @John, I presume he is using `dat` as I defined it in my answer (which does have the "special" column names). – Brian Diggs Nov 04 '11 at 23:14
  • Yeah, but that doesn't help Elisa who has data of her own unlikely to have been named that way (and which also is much bigger than this toy example). It also doesn't help a subsequent reader of the question who gets stuck at the point of trying this one liner on data they might have similar to the question but without the names. – John Nov 05 '11 at 07:45
  • John is right, I am still struggeling with the column names. The problem is: I hava about 240 columns (like in the example above they are repeated every 10 columns, 24 times) and I need to keep the column names of the first set of 10 columns so that I can still interpret the values.... – Elisa Nov 05 '11 at 11:55
  • @Elisa. If you can paste the names of your first 20 columns, I can tweak my answer to suit your case. Without that, it is not possible to guess. – Ramnath Nov 05 '11 at 12:01
  • ID | BSN [there is 50 columns of fixdata that just stays the way it is in between here] here starts the part that should be reshaped: Fach1 | Fach1_P_Fach | Fach1_Punkte_1_KHJ | Fach1_Punkte_1_KHJ_eingebracht | Fach1_Punkte_2_KHJ | Fach1_Punkte_2_KHJ_eingebracht Fach1_Punkte_3_KHJ | Fach1_Punkte_3_KHJ_eingebracht | Fach1_Punkte_4_KHJ | Fach1_Punkte_4_KHJ_eingebracht .. it´s german. But the most important part is: "Fach1_...", which will change to "Fach2.." in the second set and so on. – Elisa Nov 05 '11 at 12:10
  • For a second line that drops row names (which is the same as Brian's suggestion, but IMHO more readable) do: `rownames (result) <- NULL` – cbeleites unhappy with SX Nov 05 '11 at 18:49
  • 1
    Elisa, if all columns that should be reshaped start with Fach, then `grep` could help you to get the correct columns for the `varying` argument: `varying = grep ("^Fach", colnames (origdata))`. And `idvar` are just the other columns (something like `1 : (min (grep ("^Fach", colnames (origdata)) - 1)` and you could try `timevar = "Fach"` [I don't have R here to test the expressions, though - but at least that should give you a hint where to look for the solution] – cbeleites unhappy with SX Nov 05 '11 at 19:00
2

As @Andrie said, the first step is melting the data with your given columns (ID and gender). Your problem, as you say, is identifying what columns then "go together". Here is one approach, originally encoding that information in column names, and then pulling it out from there.

First some dummy data

dat <- data.frame(ID=c("ID_1", "ID_2"), Gender=c("m","f"), 
  Test1.A = "A1", Test2.A = "A2", Test3.A = "A3",
  Test1.B = "B1", Test2.B = "B2", Test3.B = "B3", stringsAsFactors=FALSE)

Note that I've named the columns with a name that systematically indicates which test and which group it is part of.

> dat
    ID Gender Test1.A Test2.A Test3.A Test1.B Test2.B Test3.B
1 ID_1      m      A1      A2      A3      B1      B2      B3
2 ID_2      f      A1      A2      A3      B1      B2      B3

Using the reshape2 package

library("reshape2")

Melt the data, and then take the variable column which has two pieces of information in it (test and group), and split those two bits of info into two separate columns.

dat.m <- melt(dat, id.vars=c("ID", "Gender"))
dat.m <- cbind(dat.m, colsplit(dat.m$variable, "\\.", names=c("Test", "Group")))

Now it is easy to cast since the test and the group are separate.

dcast(dat.m, ID+Gender+Group~Test)

Which gives

> dcast(dat.m, ID+Gender+Group~Test)
    ID Gender Group Test1 Test2 Test3
1 ID_1      m     A    A1    A2    A3
2 ID_1      m     B    B1    B2    B3
3 ID_2      f     A    A1    A2    A3
4 ID_2      f     B    B1    B2    B3
Brian Diggs
  • 57,757
  • 13
  • 166
  • 188
1

I like Brian's answer better but here's a way to do it with the base package. Pretty ugly though in my opinion.

Your dataframe:

DF
        id    sex v1 v2 v3 v4 v5 v6
    1 ID_1   male A1 A2 A3 B1 B2 B3
    2 ID_2 female A1 A2 A3 B1 B2 B3

Code

DFa<-subset(DF, select=c(1:5))
DFb<-subset(DF, select=c(1:2, 6:8))
colnames(DFb)<-colnames(DFa)
DF<-as.data.frame(rbind(DFa,DFb))
rownames(DF)<-1:nrow(DF)
DF[order(DF$id),] 
Tyler Rinker
  • 108,132
  • 65
  • 322
  • 519
  • John I disagree. I see the output is identical to the OP's expected outcome. There are only 4 rows not 6 so I don't understand your critcism. – Tyler Rinker Nov 04 '11 at 22:13
  • The only differences between my response and others is I don't give a separate column for A and B as it is inherent in the response (this may be a problem though if the A1,A2,B1... are actually numeric) and I reordered the rows to match the desired outcome. It's a simple case of splitting columns and restacking. – Tyler Rinker Nov 04 '11 at 22:20
  • When I first copied and pasted your code it made 6 rows for some reason. Now it doesn't. sorry. – John Nov 05 '11 at 07:50
0

How about:

> dat <- data.frame(id=c("id1","id2"),gender=c("m","f"),a.1=1:2,a.2=1:2,a.3=1:2,b.1=3:4,b.2=3:4,b.3=3:4)
> dat1 <- dat[,-(3:5)]
> dat2 <- dat[,-(6:8)]
> names(dat1)[3:5] <- c("v1","v2","v3")
> names(dat2)[3:5] <- c("v1","v2","v3")
> 
> dat1$test <- "b"
> dat2$test <- "a"
> result <- rbind(dat1,dat2)
> dat
   id gender a.1 a.2 a.3 b.1 b.2 b.3
1 id1      m   1   1   1   3   3   3
2 id2      f   2   2   2   4   4   4
> result
   id gender v1 v2 v3 test
1 id1      m  3  3  3    b
2 id2      f  4  4  4    b
3 id1      m  1  1  1    a
4 id2      f  2  2  2    a
Ian Fellows
  • 17,228
  • 10
  • 49
  • 63