stacking columns into 1 column in R

Question

I have a data frame that looks like:

ID Time U1 U2 U3 U4 ...
1  20    1  2 3  5 .. 
2  20    2  5 9  4 ..
3  20    2  5 6  4 ..
.
.

And I would need to keep it like: 

ID Time  U
1  20    1
1  20    2
1  20    3
1  20    5
2  20    2
2  20    5
2  20    9
2  20    4
3  20    2
3  20    5
3  20    6
3  20    4

I tried with:

X <- read.table("mydata.txt", header=TRUE, sep=",")
X_D <- as.data.frame(X)
X_new <- stack(X_D, select = -c(ID, Time))

But I haven't managed to get the data into that form. Honestly, I have little experience with stacking/transposing, so any help is greatly appreciated!

This is commonly called conversion from wide format to long format. However, the way you've defined it, you lose information about which column a piece of data came from. Incidentally, the package `reshape2` covers this sort of transformation. — ndoogan, Apr 09 '13 at 19:54
Additionally, you're more likely to get the answer you really want if you provide a really easy way for answerers to get example data into R to work with. Providing what's in the file isn't simple, but providing `dput()` output of the original example dataframe object IS! — ndoogan, Apr 09 '13 at 20:00
@Phil - can you enumerate it? I'm looking for the tidyr equivalent of the JMP "stack columns" platform. — EngrStudent, Oct 14 '17 at 17:23

score 16 · Answer 1 · answered Apr 10 '13 at 05:41

Here's the stack approach:

dat2a <- data.frame(dat[1:2], stack(dat[3:ncol(dat)]))
dat2a
#    ID Time values ind
# 1   1   20      1  U1
# 2   2   20      2  U1
# 3   3   20      2  U1
# 4   1   20      2  U2
# 5   2   20      5  U2
# 6   3   20      5  U2
# 7   1   20      3  U3
# 8   2   20      9  U3
# 9   3   20      6  U3
# 10  1   20      5  U4
# 11  2   20      4  U4
# 12  3   20      4  U4

This is very similar to melt from "reshape2":

library(reshape2)
dat2b <- melt(dat, id.vars=1:2)
dat2b
#    ID Time variable value
# 1   1   20       U1     1
# 2   2   20       U1     2
# 3   3   20       U1     2
# 4   1   20       U2     2
# 5   2   20       U2     5
# 6   3   20       U2     5
# 7   1   20       U3     3
# 8   2   20       U3     9
# 9   3   20       U3     6
# 10  1   20       U4     5
# 11  2   20       U4     4
# 12  3   20       U4     4

And, very similar to @TylerRinker's answer, but not dropping the "times", is to just use sep = "" to help R guess time and variable names.

dat3 <- reshape(dat, direction = "long", idvar=1:2, 
                varying=3:ncol(dat), sep = "", timevar="Measure")
dat3
#        ID Time Measure U
# 1.20.1  1   20       1 1
# 2.20.1  2   20       1 2
# 3.20.1  3   20       1 2
# 1.20.2  1   20       2 2
# 2.20.2  2   20       2 5
# 3.20.2  3   20       2 5
# 1.20.3  1   20       3 3
# 2.20.3  2   20       3 9
# 3.20.3  3   20       3 6
# 1.20.4  1   20       4 5
# 2.20.4  2   20       4 4
# 3.20.4  3   20       4 4

In all three of those, you end up with four columns, not three, like you describe in your desired output. However, as @ndoogan points out, by doing so, you're loosing information about your data. If you're fine with that, you can always drop that column from the resulting data.frame quite easily (for example, dat2a <- dat2a[-4].

Tyler Rinker · Answer 2 · 2013-04-09T20:21:17.523

2

With base reshape:

dat <- read.table(text="ID Time U1 U2 U3 U4
1  20    1  2 3  5
2  20    2  5 9  4
3  20    2  5 6  4", header=TRUE)


colnames(dat) <- gsub("([a-zA-Z]*)([0-9])", "\\1.\\2", colnames(dat))
reshape(dat, varying=3:ncol(dat), v.names="U", direction ="long", timevar = "Time", 
    idvar = "ID")

edited Apr 09 '13 at 20:21

answered Apr 09 '13 at 20:08

Tyler Rinker

108,132
65
322
519

(+1) It could be worth specifying what "base reshape" means in the context of another reshape answer that depends on reshape2 package. – ndoogan Apr 09 '13 at 20:10
This should do it for anyone wondering: `?reshape` – Tyler Rinker Apr 09 '13 at 20:15
+1 for matching the output exactly, but I do also worry about loosing information this way. – A5C1D2H2I1M1N2O1R2T1 Apr 10 '13 at 05:41
@AnandaMahto Agreed. In this case I didn't see it as a problem so I dropped it to match the user's wants. – Tyler Rinker Apr 10 '13 at 06:01

score 1 · Answer 3 · answered Jun 18 '17 at 20:14

1

You can also use melt():

library(reshape2)

new_data <- melt(old_data, id.vars=c("ID","Time"),
    value.name = "U")

Then remove the 'variable' column:

new_data$variable <- NULL

answered Jun 18 '17 at 20:14

Jala015

73
2
9

Oops, this will actually order the U rows by rows in the old_data – Jala015 Jun 18 '17 at 20:18

score 0 · Answer 4 · answered Apr 09 '13 at 19:49

0

Try this:

do.call(rbind, lapply(1:4, function(i)structure(dat[,c("ID", "Time", paste0("U",i))], names=c("ID", "Time", "U"))))

Where dat is your data.frame...

answered Apr 09 '13 at 19:49

Ferdinand.kraft

12,579
10
47
69

Thanks! It gives me an error though: 'undefined columns selected' here '[.data.frame'(dat, ,[,c("ID", "Time", – user2263330 Apr 09 '13 at 20:03
Hi @user2263330, it works for me with `dat <- data.frame(ID=1:3, Time=20, U1=1:3, U2=4:6, U3=7:9, U4=10:12)`. What are the names of your data.frame? – Ferdinand.kraft Apr 09 '13 at 20:08

stacking columns into 1 column in R

4 Answers4

Linked

Related