
After transposing data I'd like to re-assign the attributes that get dropped. The same approach could also apply to copying attributes from one data frame to another, or to restoring them after mutates and similar operations that drop them.

 library(reshape2)

 df <- data.frame(id = c(1,2,3,4,5), 
                  time = c(11, 22,33,44,55),
                  c  = c(1,2,3,5,5),
                  d = c(4,2,5,4,NA))

attr(df$id,"label")<- "label"
attr(df$time,"label")<- "label2"
attr(df$c,"label")<- "something here"
attr(df$d,"label")<- "count of something"
str(df)

 'data.frame':   5 obs. of  4 variables:
 $ id  : num  1 2 3 4 5
  ..- attr(*, "label")= chr "label"
 $ time: num  11 22 33 44 55
  ..- attr(*, "label")= chr "label2"
 $ c   : num  1 2 3 5 5
  ..- attr(*, "label")= chr "something here"
 $ d   : num  4 2 5 4 NA
  ..- attr(*, "label")= chr "count of something"

Cast to wide

dfwide<- recast(df,id~variable +time, 
            id.var = c("id","time"))

This gives the usual warning about lost attributes:

   Warning message:
     attributes are not identical across measure variables; they will be dropped 

 str(dfwide)
'data.frame':   5 obs. of  11 variables:
 $ id  : num  1 2 3 4 5
 $ c_11: num  1 NA NA NA NA
 $ c_22: num  NA 2 NA NA NA
 $ c_33: num  NA NA 3 NA NA
 $ c_44: num  NA NA NA 5 NA
 $ c_55: num  NA NA NA NA 5
 $ d_11: num  4 NA NA NA NA
 $ d_22: num  NA 2 NA NA NA
 $ d_33: num  NA NA 5 NA NA
 $ d_44: num  NA NA NA 4 NA
 $ d_55: num  NA NA NA NA NA

Using mostattributes one can copy attributes between data frames, but with many column names I can't figure out how to map them efficiently other than one by one.

 mostattributes(dfwide$c_11)<-attributes(df$c)
 mostattributes(dfwide$c_22)<-attributes(df$c)
 > str(dfwide)
 'data.frame':  5 obs. of  11 variables:
  $ id  : num  1 2 3 4 5
  $ c_11: num  1 NA NA NA NA
  ..- attr(*, "label")= chr "something here"
  $ c_22: num  NA 2 NA NA NA
  ..- attr(*, "label")= chr "something here"
  $ c_33: num  NA NA 3 NA NA

I tried to automate it but failed (all c columns should get the same label, and all d columns should get the same label):

library(dplyr)   # %>%, slice(), pull()
library(tibble)  # enframe()

#extract arguments
dlist<-enframe(names(df))%>%
   slice(-1,-2)%>%
   pull(., value)
 dlist

 dlistw<-enframe(names(dfwide))%>%
  slice(-1)%>%
  pull(., value)
 dlistw

#function
mostatt<- function(var1, var2) {
  mostattributes(dfwide[[var1]])<<-attributes(df[[var2]])
}

mapply(mostatt,dlistw,dlist)
str(dfwide)

'data.frame':   5 obs. of  11 variables:
 $ id  : num  1 2 3 4 5
 $ c_11: num  1 NA NA NA NA
  ..- attr(*, "label")= chr "something here"
 $ c_22: num  NA 2 NA NA NA
  ..- attr(*, "label")= chr "count of something"
 $ c_33: num  NA NA 3 NA NA
  ..- attr(*, "label")= chr "something here"
 $ c_44: num  NA NA NA 5 NA
  ..- attr(*, "label")= chr "count of something"
 $ c_55: num  NA NA NA NA 5
  ..- attr(*, "label")= chr "something here"
 $ d_11: num  4 NA NA NA NA
  ..- attr(*, "label")= chr "count of something"
 $ d_22: num  NA 2 NA NA NA
  ..- attr(*, "label")= chr "something here"
 $ d_33: num  NA NA 5 NA NA
  ..- attr(*, "label")= chr "count of something"
 $ d_44: num  NA NA NA 4 NA
  ..- attr(*, "label")= chr "something here"
 $ d_55: num  NA NA NA NA NA
  ..- attr(*, "label")= chr "count of something"

I think tidyselect's starts_with might be worth a try, but I'm not sure how to incorporate it. Any suggestions would be appreciated. Thank you!

23stacks1254

1 Answer


This is an option:

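for (i in setdiff(colnames(df), "id")) {
  # every wide column whose name contains the source column name
  for (x in colnames(dfwide)[grepl(i, colnames(dfwide))])
    mostattributes(dfwide[[x]]) <- attributes(df[[i]])
}
# "d" also matched "id" above, so restore the id attributes last
mostattributes(dfwide$id) <- attributes(df$id)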

Because the letter d also occurs in id, grepl("d", ...) matches the id column too, so I need to rewrite id at the end. If you rename d to e, it is even simpler:

df <- data.frame(id = c(1,2,3,4,5), 
                 time = c(11, 22,33,44,55),
                 c  = c(1,2,3,5,5),
                 e = c(4,2,5,4,NA))


attr(df$id,"label")<- "label"
attr(df$time,"label")<- "label2"
attr(df$c,"label")<- "something here"
attr(df$e,"label")<- "count of something"
str(df)

dfwide<- recast(df,id~variable +time, 
                id.var = c("id","time"))

for(i in (colnames(df))){
  for(x in colnames(dfwide)[(grepl(i, colnames(dfwide)))])
    mostattributes(dfwide[[x]]) <- attributes(df[[i]])
}
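
A variant that avoids pattern matching altogether, sketched under the assumption that every wide column is named <variable>_<time> as in the recast() output above: strip the suffix to recover the source column, then pair the two equal-length vectors. The mapply() attempt in the question alternated the labels because mapply() recycles the shorter argument (two source names against ten wide names). The toy data, the hand-built dfwide stand-in, and the names wide_cols and src_cols are illustrative, not part of the original post.

```r
# Toy data mirroring the question (fewer rows, for brevity)
df <- data.frame(id = c(1, 2, 3), time = c(11, 22, 33),
                 c = c(1, 2, 3), d = c(4, 2, NA))

# Hand-built stand-in for the recast() result: columns named <variable>_<time>
dfwide <- data.frame(id = df$id)
for (v in c("c", "d")) {
  for (t in df$time) {
    dfwide[[paste(v, t, sep = "_")]] <- ifelse(df$time == t, df[[v]], NA)
  }
}

# Labels are set after building dfwide, so dfwide starts without them
attr(df$id, "label") <- "label"
attr(df$c, "label") <- "something here"
attr(df$d, "label") <- "count of something"

wide_cols <- setdiff(names(dfwide), "id")
# Source column for each wide column: drop the trailing "_<time>" suffix
# (assumes the time part itself contains no underscore)
src_cols <- sub("_[^_]*$", "", wide_cols)

# Equal-length vectors, so mapply() pairs them one-to-one with no recycling
dfwide[wide_cols] <- mapply(
  function(w, s) {
    col <- dfwide[[w]]
    mostattributes(col) <- attributes(df[[s]])
    col
  },
  wide_cols, src_cols, SIMPLIFY = FALSE
)
mostattributes(dfwide$id) <- attributes(df$id)

str(dfwide)
```

Since the source name is looked up exactly rather than searched for, a short name such as d cannot accidentally hit id, and no repair step is needed afterwards.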
LocoGris
    That's a good option. Thank you for sharing. I didn't realize one could use grep/grepl like that. I wonder if there is a way to add a regex caret ^ to the statement to anchor the pattern at the start of the name. I'll play around with it. But as it is, this solution works for me, because the colnames in my data frames are much longer and grepl will match on those strings instead of a single character. Thank you again! – 23stacks1254 Mar 26 '19 at 17:41
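
On the caret question in the comment: the pattern given to grep()/grepl() is an ordinary regular expression, so it can be built with paste0() and anchored at the start. A minimal sketch, assuming the wide columns are named <variable>_<time>; the toy data and the hand-built dfwide are illustrative stand-ins for the recast() output, not part of the original post.

```r
# Toy data mirroring the question (fewer rows, for brevity)
df <- data.frame(id = c(1, 2, 3), time = c(11, 22, 33),
                 c = c(1, 2, 3), d = c(4, 2, NA))

# Hand-built stand-in for the recast() result: columns named <variable>_<time>
dfwide <- data.frame(id = df$id)
for (v in c("c", "d")) {
  for (t in df$time) {
    dfwide[[paste(v, t, sep = "_")]] <- ifelse(df$time == t, df[[v]], NA)
  }
}

attr(df$id, "label") <- "label"
attr(df$c, "label") <- "something here"
attr(df$d, "label") <- "count of something"

for (i in c("c", "d")) {
  # "^" pins the match to the start of the name; the trailing "_" keeps
  # a short name like "c" from also matching a longer one like "cost_11"
  pattern <- paste0("^", i, "_")
  for (x in grep(pattern, names(dfwide), value = TRUE)) {
    mostattributes(dfwide[[x]]) <- attributes(df[[i]])
  }
}
# "^d_" does not match "id", so id only needs its own attributes copied
mostattributes(dfwide$id) <- attributes(df$id)

str(dfwide)
```

If the column names may contain regex metacharacters such as a dot, base R's startsWith(names(dfwide), paste0(i, "_")) does the same prefix test with literal matching.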