Restructuing and formatting data frame columns

Question

dfin <- 

ID   SEQ   GRP   C1   C2   C3   T1   T2   T3
1     1     1    0     5    8   0     1   2
1     2     1    5     10   15  5     6   7
2     1     2    20    25   30  0     1   2

C1 is the concentration (CONC) at T1 (TIME) and so on. This is what I want as an output:

dfout <- 

ID   SEQ   GRP  CONC  TIME
1     1     1    0     0
1     1     1    5     1
1     1     1    8     2
1     2     1    5     5
1     2     1    10    6
1     2     1    15    7
2     1     2    20    0
2     1     2    25    1
2     1     2    30    2

The dfin has much more columns for Cx and Tx where x is the number of concentration readings.

Psidom · Accepted Answer · 2017-08-03T01:03:22.690

3

You can do this with data.table::melt, with its capability of melting the table into multiple columns based on the columns pattern:

library(data.table)
melt(
    setDT(df), 
    id.vars=c("ID", "SEQ", "GRP"), 
    # columns starts with C and T should be melted into two separate columns
    measure.vars=patterns("^C", "^T"),     
    value.name=c('CONC', 'TIME')
)[order(ID, SEQ)][, variable := NULL][]

#   ID SEQ GRP CONC TIME
#1:  1   1   1    0    0
#2:  1   1   1    5    1
#3:  1   1   1    8    2
#4:  1   2   1    5    5
#5:  1   2   1   10    6
#6:  1   2   1   15    7
#7:  2   1   2   20    0
#8:  2   1   2   25    1
#9:  2   1   2   30    2

Or if the value column names follow the pattern [CT][0-9], you can use reshape from base R by specifying the sep="" which will split the value columns name by the letter/digit separation due to this default setting (from ?reshape):

split = if (sep == "") {
            list(regexp = "[A-Za-z][0-9]", include = TRUE)
        } else {
            list(regexp = sep, include = FALSE, fixed = TRUE)}

reshape(df, varying=-(1:3), idvar=c("ID", "SEQ", "GRP"), 
        dir="long", sep="", v.names=c("CONC", "TIME"))

#   ID SEQ GRP time CONC TIME
#1:  1   1   1    1    0    5
#2:  1   2   1    1    5   10
#3:  2   1   2    1   20   25
#4:  1   1   1    2    8    0
#5:  1   2   1    2   15    5
#6:  2   1   2    2   30    0
#7:  1   1   1    3    1    2
#8:  1   2   1    3    6    7
#9:  2   1   2    3    1    2

edited Aug 03 '17 at 01:03

answered Aug 03 '17 at 00:52

Psidom

209,562
33
339
356

Thank you. I applied data.table method to my whole dataframe but got this warning `Warning message: In melt.data.table(setDT(data1), id.vars = c("SUB", "SEQ", "PER", : 'measure.vars' [TRT, T1, T2, T3, ...] are not all of the same type. By order of hierarchy, the molten data value column will be of type 'double'. All measure variables not of type 'double' will be coerced to. Check DETAILS in ?melt.data.table for more on coercion.` is this a critical warning here? – daragh Aug 03 '17 at 01:10
You need to check the data types of all the columns of your data frame by `sapply(df, class)`. And make sure they are what you expected. The warning means some of the columns are not of numeric type, and if coerced to numeric type, you may get unexpected NAs. – Psidom Aug 03 '17 at 01:15
Thanks for that. I found where the problem is. I had another column in the data frame called `TRT` (starts with T) and is an interger. The `TRT` column is one of the `id.vars` in the data frame. Is there a way to avoid other than renaming this column? – daragh Aug 03 '17 at 01:33
If you don't care about losing track of which C or which T the value comes from, i.e, the `x` value after C or T, then you can change the pattern to `^C\\d` and `^T\\d`, assuming `Cx` or `Tx`, x is always a number. – Psidom Aug 03 '17 at 01:38
I do care about where the value is coming from because C1 should be associated with T1 and so on. So may be renaming that column better? in my case, yes, x is always a number. – daragh Aug 03 '17 at 01:43
That's what I would do. Rename it before reshaping and change it back later. – Psidom Aug 03 '17 at 01:46
1

I ran the code with what you have suggested `^C\\d` and `^T\\d` and gave similar output the one posted in the answer. So, it should be fine then. – daragh Aug 03 '17 at 01:47

Restructuing and formatting data frame columns

1 Answers1