12

I have a data frame that looks like:

ID Time U1 U2 U3 U4 ...
1  20    1  2 3  5 .. 
2  20    2  5 9  4 ..
3  20    2  5 6  4 ..
.
.

And I would need to keep it like: 

ID Time  U
1  20    1
1  20    2
1  20    3
1  20    5
2  20    2
2  20    5
2  20    9
2  20    4
3  20    2
3  20    5
3  20    6
3  20    4

I tried with:

X <- read.table("mydata.txt", header=TRUE, sep=",")
X_D <- as.data.frame(X)
X_new <- stack(X_D, select = -c(ID, Time))

But I haven't managed to get the data into that form. Honestly, I have little experience with stacking/transposing, so any help is greatly appreciated!

Infinite Recursion
  • 6,511
  • 28
  • 39
  • 51
user2263330
  • 121
  • 1
  • 1
  • 3
  • This is commonly called conversion from wide format to long format. However, the way you've defined it, you lose information about which column a piece of data came from. Incidentally, the package `reshape2` covers this sort of transformation. – ndoogan Apr 09 '13 at 19:54
  • 1
    Additionally, you're more likely to get the answer you really want if you provide a really easy way for answerers to get example data into R to work with. Providing what's in the file isn't simple, but providing `dput()` output of the original example dataframe object IS! – ndoogan Apr 09 '13 at 20:00
  • 1
    There's also a solution with `tidyr::gather()` – Phil Oct 19 '15 at 13:27
  • @Phil - can you enumerate it? I'm looking for the tidyr equivalent of the JMP "stack columns" platform. – EngrStudent Oct 14 '17 at 17:23

4 Answers4

16

Here's the stack approach:

dat2a <- data.frame(dat[1:2], stack(dat[3:ncol(dat)]))
dat2a
#    ID Time values ind
# 1   1   20      1  U1
# 2   2   20      2  U1
# 3   3   20      2  U1
# 4   1   20      2  U2
# 5   2   20      5  U2
# 6   3   20      5  U2
# 7   1   20      3  U3
# 8   2   20      9  U3
# 9   3   20      6  U3
# 10  1   20      5  U4
# 11  2   20      4  U4
# 12  3   20      4  U4

This is very similar to melt from "reshape2":

library(reshape2)
dat2b <- melt(dat, id.vars=1:2)
dat2b
#    ID Time variable value
# 1   1   20       U1     1
# 2   2   20       U1     2
# 3   3   20       U1     2
# 4   1   20       U2     2
# 5   2   20       U2     5
# 6   3   20       U2     5
# 7   1   20       U3     3
# 8   2   20       U3     9
# 9   3   20       U3     6
# 10  1   20       U4     5
# 11  2   20       U4     4
# 12  3   20       U4     4

And, very similar to @TylerRinker's answer, but not dropping the "times", is to just use sep = "" to help R guess time and variable names.

dat3 <- reshape(dat, direction = "long", idvar=1:2, 
                varying=3:ncol(dat), sep = "", timevar="Measure")
dat3
#        ID Time Measure U
# 1.20.1  1   20       1 1
# 2.20.1  2   20       1 2
# 3.20.1  3   20       1 2
# 1.20.2  1   20       2 2
# 2.20.2  2   20       2 5
# 3.20.2  3   20       2 5
# 1.20.3  1   20       3 3
# 2.20.3  2   20       3 9
# 3.20.3  3   20       3 6
# 1.20.4  1   20       4 5
# 2.20.4  2   20       4 4
# 3.20.4  3   20       4 4

In all three of those, you end up with four columns, not three, like you describe in your desired output. However, as @ndoogan points out, by doing so, you're loosing information about your data. If you're fine with that, you can always drop that column from the resulting data.frame quite easily (for example, dat2a <- dat2a[-4].

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
2

With base reshape:

dat <- read.table(text="ID Time U1 U2 U3 U4
1  20    1  2 3  5
2  20    2  5 9  4
3  20    2  5 6  4", header=TRUE)


colnames(dat) <- gsub("([a-zA-Z]*)([0-9])", "\\1.\\2", colnames(dat))
reshape(dat, varying=3:ncol(dat), v.names="U", direction ="long", timevar = "Time", 
    idvar = "ID")
Tyler Rinker
  • 108,132
  • 65
  • 322
  • 519
1

You can also use melt():

library(reshape2)

new_data <- melt(old_data, id.vars=c("ID","Time"),
    value.name = "U")

Then remove the 'variable' column:

new_data$variable <- NULL
Jala015
  • 73
  • 2
  • 9
0

Try this:

do.call(rbind, lapply(1:4, function(i)structure(dat[,c("ID", "Time", paste0("U",i))], names=c("ID", "Time", "U"))))

Where dat is your data.frame...

Ferdinand.kraft
  • 12,579
  • 10
  • 47
  • 69
  • Thanks! It gives me an error though: 'undefined columns selected' here '[.data.frame'(dat, ,[,c("ID", "Time", – user2263330 Apr 09 '13 at 20:03
  • Hi @user2263330, it works for me with `dat <- data.frame(ID=1:3, Time=20, U1=1:3, U2=4:6, U3=7:9, U4=10:12)`. What are the names of your data.frame? – Ferdinand.kraft Apr 09 '13 at 20:08