Df from long to wide format with multiple id and time variables

Question

I am stuck trying to convert a df from long format to wide format. My data looks like this for one id (example 1102; in the complete df there are many different ids:

>df_long[1:30,]
id      roi             mean       sd     timepoint
1  1102 HarvardOxford1 0.234243 0.1212333    A
2  1102 HarvardOxford2 0.111111 0.1212333    A
3  1102 HarvardOxford3 0.234243 0.1212333    A
4  1102 HarvardOxford4 0.111111 0.4657222    A
5  1102 Lh__F1_head 0.234243 0.1212333        A
6  1102 Lh__F2_body 0.534543 0.4657222        A
7  1102 Lh__F3_tail 0.234243 0.1212333        A
8  1102 Rh__F1_head 0.111111 0.4657222        A
9  1102 Rh__F2body 0.234243 0.1212333         A
10 1102 Rh__F3tail 0.111111 0.4657222         A
11 1102 HarvardOxford1 0.111111 0.1212333    B
12 1102 HarvardOxford2 0.234243 0.4657222    B
13 1102 HarvardOxford3 0.111111 0.4657222    B
14 1102 HarvardOxford4 0.111111 0.1212333    B
15 1102 Lh__F1_head 0.534543 0.4657222        B
16 1102 Lh__F2_body 0.234243 0.4657222        B
17 1102 Lh__F3_tail 0.234243 0.1212333        B
18 1102 Rh__F1_head 0.534543 0.4657222        B
19 1102 Rh__F2body 0.534543 0.4657222         B
20 1102 Rh__F3tail 0.234243 0.1212333         B
21 1102 HarvardOxford1 0.534543 0.4657222    C
22 1102 HarvardOxford2 0.234243 0.4657222    C
23 1102 HarvardOxford3 0.387666 0.4657222    C
24 1102 HarvardOxford4 0.234243 0.4657222    C
25 1102 Lh__F1_head 0.387666 0.4657222        C
26 1102 Lh__F2_body 0.387666 0.4657222        C
27 1102 Lh__F3_tail 0.387666 0.1212333        C
28 1102 Rh__F1_head 0.387666 0.4657222        C
29 1102 Rh__F2body 0.234243 0.1212333         C
30 1102 Rh__F3tail 0.234243 0.1212333         C

I want it in the wide format so that for each id there is only one row. So something like:

id      roi.A          mean.A  sd.A       roi.A       mean.A.   sd.A ... roi.C 
1 1102 Harvard.Oxford1 0.2342 0.12123 HarvardOxford2 0.1111 0.12123 ... Harvard

Any ideas?

In your example you end up with multiple columns with the same name. Are you sure this is what you want? — rw2, Jan 24 '22 at 13:44

score 0 · Answer 1 · answered Jan 24 '22 at 14:20

Enjoy, however it doesn't make sense to me. May I ask what's the purpose to have a very looong table with similar column names?

Lines <- 
"id roi mean sd timepoint
1102 HarvardOxford1 0.234243 0.1212333 A
1102 HarvardOxford2 0.111111 0.1212333 A
1102 HarvardOxford3 0.234243 0.1212333 A
1102 HarvardOxford4 0.111111 0.4657222 A
1102 Lh__F1_head 0.234243 0.1212333 A
1102 Lh__F2_body 0.534543 0.4657222 A
1102 Lh__F3_tail 0.234243 0.1212333 A
1102 Rh__F1_head 0.111111 0.4657222 A
1102 Rh__F2body 0.234243 0.1212333 A
1102 Rh__F3tail 0.111111 0.4657222 A
1102 HarvardOxford1 0.111111 0.1212333 B
1102 HarvardOxford2 0.234243 0.4657222 B
1102 HarvardOxford3 0.111111 0.4657222 B
"
a<- read.csv(textConnection(Lines), sep=" ", as.is = TRUE, skip = 0)

df <-data.frame()
df<-cbind(id = a[1,1])
for(m in 1:nrow(a)) {
  for (n in 2:4) {
    b<- as.data.frame(a[m,n])
    b<- setNames(b, paste0(names(a[n]),".",a[m,"timepoint"]))
    df<-cbind(df, b)
  }
}

df
#>     id          roi.A   mean.A      sd.A          roi.A   mean.A      sd.A
#> 1 1102 HarvardOxford1 0.234243 0.1212333 HarvardOxford2 0.111111 0.1212333
#>            roi.A   mean.A      sd.A          roi.A   mean.A      sd.A
#> 1 HarvardOxford3 0.234243 0.1212333 HarvardOxford4 0.111111 0.4657222
#>         roi.A   mean.A      sd.A       roi.A   mean.A      sd.A       roi.A
#> 1 Lh__F1_head 0.234243 0.1212333 Lh__F2_body 0.534543 0.4657222 Lh__F3_tail
#>     mean.A      sd.A       roi.A   mean.A      sd.A      roi.A   mean.A
#> 1 0.234243 0.1212333 Rh__F1_head 0.111111 0.4657222 Rh__F2body 0.234243
#>        sd.A      roi.A   mean.A      sd.A          roi.B   mean.B      sd.B
#> 1 0.1212333 Rh__F3tail 0.111111 0.4657222 HarvardOxford1 0.111111 0.1212333
#>            roi.B   mean.B      sd.B          roi.B   mean.B      sd.B
#> 1 HarvardOxford2 0.234243 0.4657222 HarvardOxford3 0.111111 0.4657222

^{Created on 2022-01-24 by the reprex package (v2.0.1)}

Df from long to wide format with multiple id and time variables

1 Answers1