
I'm transforming my data from long to wide. Part of the data are dates. My problem is that I would like different column names: they are currently formed like variable_1-1, and I want 1-1_variable.

df:

    SN specimen_isolate_no isolaat materiaal_lokatie alarmniveau afnamedatum
 1:  2                 1-1  STAPEP Bloedkweek  Bloed           0  2017-04-30
 2:  3                 1-1  KLEBOX        Bloedkweek           0  2018-12-30
 3:  3                 2-1  KLEBOX        Bloedkweek           0  2018-12-31

I tried dcast from data.table:

setDT(df) 
df.wide <- dcast(df, SN ~ specimen_isolate_no, value.var = c("materiaal_lokatie","afnamedatum", "isolaat", "alarmniveau" ))

Which gives me the following result:

colnames:
[1] "SN"                    "materiaal_lokatie_1-1" "materiaal_lokatie_2-1"
[4] "afnamedatum_1-1"       "afnamedatum_2-1"       "isolaat_1-1"
[7] "isolaat_2-1"           "alarmniveau_1-1"       "alarmniveau_2-1"

This result is OK, but I'd rather have the colnames formed like specimen_isolate_no_variable, e.g. 1-1_alarmniveau.

In order to achieve this, I tried

molten <- melt(df, id.vars = c("SN", "specimen_isolate_no"))
dfmolten <- dcast(molten, SN ~ specimen_isolate_no + variable)

#and 

 df %>% 
     gather(key, value, -SN, -specimen_isolate_no) %>%  
     unite(new.col, c(specimen_isolate_no,key )) %>%   
     spread(new.col, value) 

But both options mess up my dates and I don't know how to fix that.

 #colnames:
 [1] "SN"                    "1-1_isolaat"           "1-1_materiaal_lokatie" "1-1_alarmniveau"       "1-1_afnamedatum"       "2-1_isolaat"           "2-1_materiaal_lokatie" "2-1_alarmniveau"      "2-1_afnamedatum"   

dfmolten$`1-1_afnamedatum`
[1] "17286" "17895"

So my question: does anyone know how to change the way the colnames are formed when using dcast?

AvdH
    I don't think you want to change the forming using `dcast` directly (though you haven't stated exactly what form you'd like the names to take -- perhaps `sep` argument is useful? though I gather not)... You can try either (1) overwriting `specimen_isolate_no` before casting or (2) using `setnames` & some `grep`/`gsub` magic to clean up afterwards – MichaelChirico May 08 '19 at 16:15
    @MichaelChirico: I'm not sure I understand. I would like 1-1_variable instead of variable_1-1. – AvdH May 08 '19 at 16:33
    There's no functionality for it right now, unfortunately. An earlier discussion is here: https://github.com/Rdatatable/data.table/issues/1951 @MichaelChirico – Frank May 08 '19 at 16:50
    I don't think it is currently possible to do this directly with dcast. You could use a function like setnames() to change column names after using dcast(). You should drop a comment on the data.table GitHub, maybe they will add this feature in a future update. – Gainz May 08 '19 at 17:22
    Ok, thank you all! :) – AvdH May 08 '19 at 18:15

1 Answer


As Frank mentioned, there's an outstanding feature request for this... side note: please add reactions to FRs you'd like, we use this to some extent to steer development time:

https://github.com/Rdatatable/data.table/issues/3189

In the meantime, you can just use setnames and some regexing to do this:

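# every column except the id column SN
old = grep('SN', names(df.wide), value = TRUE, invert = TRUE, fixed = TRUE)
# move only the trailing "_<specimen_isolate_no>" to the front; splitting on every '_'
# and reversing would scramble names like materiaal_lokatie that contain '_' themselves
new = sub('^(.*)_([^_]+)$', '\\2_\\1', old)
setnames(df.wide, old, new)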
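
For reference, here is a minimal end-to-end sketch of the rename approach on data shaped like the table in the question. The toy values are retyped from that table and are only illustrative, and the `afnamedatum` column is assumed to be of class Date. Because a variable name such as `materiaal_lokatie` contains an underscore itself, the sketch moves only the last `_`-separated piece (the `specimen_isolate_no` part) to the front. The last two lines are a separate, hedged aside: the strings "17286" and "17895" from the melt route look like day counts since 1970-01-01, which is what a Date turns into when it is coerced to character alongside the other columns, so they can be converted back by hand.

```r
library(data.table)

# toy data retyped from the question (illustrative only)
df <- data.table(
  SN = c(2, 3, 3),
  specimen_isolate_no = c("1-1", "1-1", "2-1"),
  isolaat = c("STAPEP", "KLEBOX", "KLEBOX"),
  materiaal_lokatie = c("Bloedkweek  Bloed", "Bloedkweek", "Bloedkweek"),
  alarmniveau = c(0, 0, 0),
  afnamedatum = as.Date(c("2017-04-30", "2018-12-30", "2018-12-31"))
)

# cast each value column separately, so no column is coerced to another type
df.wide <- dcast(df, SN ~ specimen_isolate_no,
                 value.var = c("materiaal_lokatie", "afnamedatum",
                               "isolaat", "alarmniveau"))

# rename "<variable>_<specimen>" to "<specimen>_<variable>"
old <- setdiff(names(df.wide), "SN")
new <- sub("^(.*)_([^_]+)$", "\\2_\\1", old)
setnames(df.wide, old, new)

names(df.wide)
class(df.wide[["1-1_afnamedatum"]])

# aside: turning the day counts from the melt route back into dates
molten_dates <- c("17286", "17895")
recovered <- as.Date(as.numeric(molten_dates), origin = "1970-01-01")
```

Casting with several `value.var` columns and renaming afterwards avoids the melt step entirely, so the manual date conversion at the end should only be needed if the melt or gather route is kept.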
MichaelChirico