You can use my R package onetree, which is available on my GitHub account, yikeshu0611.
install.packages("devtools") # only needed if devtools is not installed yet
library(devtools)
install_github("yikeshu0611/onetree") # install the onetree package from GitHub
1. Step by step
First, I will show how to convert wide data to long step by step.
library(onetree)
long1=reshape_toLong(data=df1,
id= "ID",
j="newcolumn",
value.var.prefix=c("M1a","M2a","M1r","M2r"))
In this command, j is the name of the new column, which holds whatever follows each prefix in the original column names (here 2hB and 3hB).
This gives the result long1 below.
long1
ID Group newcolumn M1a M2a M1r M2r
1 1 2hB 0.2 0.3 200 300
1 1 3hB 0.4 0.6 400 600
2 1 2hB 0.3 0.4 300 400
2 1 3hB 0.6 0.6 600 600
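For readers who want to cross-check this first step without installing onetree, here is a minimal base R sketch. It swaps in `reshape()` from the stats package and is not how `reshape_toLong` is implemented. `df1` is not shown in this answer, so the values below are inferred from the printed outputs, and the column order of `df1` is an assumption.

```r
# df1 reconstructed from the printed outputs (assumed layout)
df1 <- data.frame(
  ID = c(1, 2), Group = c(1, 1),
  M1a2hB = c(0.2, 0.3), M1a3hB = c(0.4, 0.6),
  M2a2hB = c(0.3, 0.4), M2a3hB = c(0.6, 0.6),
  M1r2hB = c(200, 300), M1r3hB = c(400, 600),
  M2r2hB = c(300, 400), M2r3hB = c(600, 600)
)

prefixes <- c("M1a", "M2a", "M1r", "M2r")
times <- c("2hB", "3hB")

# One list element per prefix: the wide columns that collapse into it
long1_base <- reshape(
  df1,
  direction = "long",
  idvar = "ID",
  varying = lapply(prefixes, function(p) paste0(p, times)),
  v.names = prefixes,
  timevar = "newcolumn",
  times = times
)

# reshape() stacks time by time; sort by ID to match the layout above
long1_base <- long1_base[order(long1_base$ID, long1_base$newcolumn), ]
rownames(long1_base) <- NULL
long1_base
```

The suffix after each prefix ends up in `newcolumn`, which is the same role j plays in `reshape_toLong`.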
Looking at long1, the columns M1a, M2a, ... and M1r, M2r, ... show that the data is still wide, so we can convert it to long again. This time we use M1 and M2 as prefixes, and the suffixes a and r go into a new column, testway. The command is below.
long2=reshape_toLong(data = long1,
id = c("ID","newcolumn"),
j = "testway",
value.var.prefix = c("M1","M2"))
long2
ID newcolumn Group testway M1 M2
1 1 2hB 1 a 0.2 0.3
2 1 2hB 1 r 200.0 300.0
3 1 3hB 1 a 0.4 0.6
4 1 3hB 1 r 400.0 600.0
5 2 2hB 1 a 0.3 0.4
6 2 2hB 1 r 300.0 400.0
7 2 3hB 1 a 0.6 0.6
8 2 3hB 1 r 600.0 600.0
Here we use two variables, ID and newcolumn, as the id. In long data the id must identify each row uniquely; ID alone repeats across the rows of long1, so using only ID would cause a mismatch. Alternatively, you can create a new id, for example idnew.
long1$idnew = 1:nrow(long1)
reshape_toLong(data = long1,
id = "idnew",
j = "testway",
value.var.prefix = c("M1","M2"))
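Before each reshape it can help to confirm that the chosen id columns really are unique. The sketch below uses base R's `anyDuplicated()`. The helper name `is_unique_key` and the small `long1_keys` data frame are hypothetical and only mirror the ID and newcolumn columns of long1.

```r
# Only the key columns of long1 are needed for this check
long1_keys <- data.frame(
  ID = c(1, 1, 2, 2),
  newcolumn = c("2hB", "3hB", "2hB", "3hB")
)

# TRUE when no combination of the given columns occurs twice
is_unique_key <- function(data, cols) {
  anyDuplicated(data[, cols, drop = FALSE]) == 0
}

is_unique_key(long1_keys, "ID")                  # FALSE: each ID appears twice
is_unique_key(long1_keys, c("ID", "newcolumn"))  # TRUE

# A row counter is always unique
long1_keys$idnew <- seq_len(nrow(long1_keys))
is_unique_key(long1_keys, "idnew")               # TRUE
```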
Let's keep going. In long2 there may be columns M1, M2, ..., so long2 is still wide and we can convert it to long once more, with M as the prefix and 1, 2, 3, ... as the values of the new column. This time the id should be ID, newcolumn and testway together, or a new id created in long2, so that the id is unique.
long3=reshape_toLong(data = long2,
id = c("ID","newcolumn","testway"),
j = "testnumber",
value.var.prefix = "M")
long3
ID newcolumn testway Group testnumber M
1 1 2hB a 1 1 0.2
2 1 2hB a 1 2 0.3
3 1 2hB r 1 1 200.0
4 1 2hB r 1 2 300.0
5 1 3hB a 1 1 0.4
6 1 3hB a 1 2 0.6
7 1 3hB r 1 1 400.0
8 1 3hB r 1 2 600.0
9 2 2hB a 1 1 0.3
10 2 2hB a 1 2 0.4
11 2 2hB r 1 1 300.0
12 2 2hB r 1 2 400.0
13 2 3hB a 1 1 0.6
14 2 3hB a 1 2 0.6
15 2 3hB r 1 1 600.0
16 2 3hB r 1 2 600.0
Now long3 is fully long data.
The prefix is very important. We used the following prefixes:
- first: M1a, M2a, M1r, M2r
- second: M1, M2
- third: M
We changed the id three times to keep it unique:
- first: ID
- second: ID, newcolumn
- third: ID, newcolumn, testway
j is the name of the new column:
- first: newcolumn
- second: testway
- third: testnumber
2. A little bit faster
Suppose each measure has 4 outcomes: a2, a3, r2 and r3, where a is absolute, r is relative, 2 is time 2 and 3 is time 3. Then 1100 columns hold 275 measures (1100/4). So we have M1a2hB, M2a2hB, M3a2hB, ..., M275a2hB and M1a3hB, M2a3hB, M3a3hB, ..., M275a3hB, and the r columns follow the same pattern. If we typed the prefixes out by hand as above, value.var.prefix would become very long.
However, the paste0 function gives a faster way to construct the prefixes.
ma2=paste0("M",1:275,"a")
ma3=paste0("M",1:275,"a")
mr2=paste0("M",1:275,"r")
mr3=paste0("M",1:275,"r")
m=c(ma2,ma3,mr2,mr3)
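To make the `paste0` construction concrete, here is a small base R sketch of what it produces. The names `n_measures`, `ma`, `mr` and `wide_names` are illustrative only. Section 1 passed each prefix once, which suggests that listing a prefix once per time point is not required; that is an inference from the example, not a statement about onetree's internals.

```r
n_measures <- 275

# paste0 is vectorised over 1:275, giving one prefix per measure
ma <- paste0("M", seq_len(n_measures), "a")
mr <- paste0("M", seq_len(n_measures), "r")
prefix <- c(ma, mr)

head(prefix, 3)   # "M1a" "M2a" "M3a"
length(prefix)    # 550

# Every prefix combined with every time suffix gives the wide column names
wide_names <- as.vector(outer(prefix, c("2hB", "3hB"), paste0))
length(wide_names)  # 1100
```

Comparing `wide_names` with `names()` of the real data via `setdiff()` is a cheap way to catch a mistyped prefix before reshaping.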
In df1 we only have 2 measures, so we can use the command below.
ma2=paste0("M",1:2,"a")
ma3=paste0("M",1:2,"a")
mr2=paste0("M",1:2,"r")
mr3=paste0("M",1:2,"r")
prefix=c(ma2,ma3,mr2,mr3)
reshape_toLong(data = df1,
id = "ID",
j = "newcolumn",
value.var.prefix = prefix)
ID Group newcolumn M1a M2a M1r M2r
1 1 1 2hB 0.2 0.3 200 300
2 1 1 3hB 0.4 0.6 400 600
3 2 1 2hB 0.3 0.4 300 400
4 2 1 3hB 0.6 0.6 600 600
We can also use M1, M2, ... as prefixes, so that a2hB, a3hB, r2hB and r3hB go into the new column. Then we split the new column into separate columns by taking substrings.
m1=paste0("M",1:2)
m2=paste0("M",1:2)
prefix=c(m1,m2)
long4=reshape_toLong(data = df1,
id = "ID",
j = "newcolumn",
value.var.prefix = prefix)
long4
ID Group newcolumn M1 M2
1 1 1 a2hB 0.2 0.3
2 1 1 a3hB 0.4 0.6
3 1 1 r2hB 200.0 300.0
4 1 1 r3hB 400.0 600.0
5 2 1 a2hB 0.3 0.4
6 2 1 a3hB 0.6 0.6
7 2 1 r2hB 300.0 400.0
8 2 1 r3hB 600.0 600.0
long4$testway=Left(long4$newcolumn,1)
long4$time=Right(long4$newcolumn,3)
long4
ID Group newcolumn M1 M2 testway time
1 1 1 a2hB 0.2 0.3 a 2hB
2 1 1 a3hB 0.4 0.6 a 3hB
3 1 1 r2hB 200.0 300.0 r 2hB
4 1 1 r3hB 400.0 600.0 r 3hB
5 2 1 a2hB 0.3 0.4 a 2hB
6 2 1 a3hB 0.6 0.6 a 3hB
7 2 1 r2hB 300.0 400.0 r 2hB
8 2 1 r3hB 600.0 600.0 r 3hB
Finally, we can use only M as the prefix to get fully long data.
long5=reshape_toLong(data = df1,
id = "ID",
j = "newcolumn",
value.var.prefix = "M")
long5
ID Group newcolumn M
1 1 1 1a2hB 0.2
2 1 1 1a3hB 0.4
3 1 1 2a2hB 0.3
4 1 1 2a3hB 0.6
5 1 1 1r2hB 200.0
6 1 1 1r3hB 400.0
7 1 1 2r2hB 300.0
8 1 1 2r3hB 600.0
9 2 1 1a2hB 0.3
10 2 1 1a3hB 0.6
11 2 1 2a2hB 0.4
12 2 1 2a3hB 0.6
13 2 1 1r2hB 300.0
14 2 1 1r3hB 600.0
15 2 1 2r2hB 400.0
16 2 1 2r3hB 600.0
Then we can use the Left, Mid and Right functions in the onetree package to take substrings from the left, the middle and the right, and so build new columns.
long5$testnumber=Left(long5$newcolumn,1)
long5$testway=Mid(long5$newcolumn,2,1)
long5$time=Right(long5$newcolumn,3)
long5
ID Group newcolumn M testnumber testway time
1 1 1 1a2hB 0.2 1 a 2hB
2 1 1 1a3hB 0.4 1 a 3hB
3 1 1 2a2hB 0.3 2 a 2hB
4 1 1 2a3hB 0.6 2 a 3hB
5 1 1 1r2hB 200.0 1 r 2hB
6 1 1 1r3hB 400.0 1 r 3hB
7 1 1 2r2hB 300.0 2 r 2hB
8 1 1 2r3hB 600.0 2 r 3hB
9 2 1 1a2hB 0.3 1 a 2hB
10 2 1 1a3hB 0.6 1 a 3hB
11 2 1 2a2hB 0.4 2 a 2hB
12 2 1 2a3hB 0.6 2 a 3hB
13 2 1 1r2hB 300.0 1 r 2hB
14 2 1 1r3hB 600.0 1 r 3hB
15 2 1 2r2hB 400.0 2 r 2hB
16 2 1 2r3hB 600.0 2 r 3hB
Here we used different prefixes to get different data:
- first: prefixes built with the paste0 function
- second: M1, M2, M3, ..., still built with paste0 but simpler
- third: only M
- in all three cases id and j stayed the same
3. Conclusion
In the reshape_toLong function:
- data: the data that you want to transform
- id: the unique id variable, which can be one variable or several
- j: the name of the new variable that holds the stacked time or sequence number
- value.var.prefix: the prefix (or prefixes) of the value variables
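As a sanity check on the fully long result, the same stacking can be sketched in base R with `stack()`. This is an alternative technique, not the onetree implementation. `df1` is again reconstructed from the printed outputs, so its exact layout is an assumption.

```r
# df1 reconstructed from the printed outputs (assumed layout)
df1 <- data.frame(
  ID = c(1, 2), Group = c(1, 1),
  M1a2hB = c(0.2, 0.3), M1a3hB = c(0.4, 0.6),
  M2a2hB = c(0.3, 0.4), M2a3hB = c(0.6, 0.6),
  M1r2hB = c(200, 300), M1r3hB = c(400, 600),
  M2r2hB = c(300, 400), M2r3hB = c(600, 600)
)

# All value columns share the prefix "M"
value_cols <- grep("^M", names(df1), value = TRUE)

# stack() concatenates the columns and records each source column in `ind`
stacked <- stack(df1[value_cols])

long_base <- data.frame(
  ID = rep(df1$ID, times = length(value_cols)),
  Group = rep(df1$Group, times = length(value_cols)),
  newcolumn = sub("^M", "", as.character(stacked$ind)),  # drop the prefix
  M = stacked$values
)
long_base <- long_base[order(long_base$ID), ]
nrow(long_base)  # 2 IDs x 8 value columns = 16 rows
```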