2

I have data of this wide format that I want to convert to long format

    Cond    Construct   Line    Plant   Tube_shoot  weight_shoot    Tube_root   weight_root
1   Standard            NA      NA      2           199.95          -           -
2   Cd0     IIF         43.1    1       3           51.87           4           10.39
3   Cd0     IIF         43.1    2       5           81.80           6           15.05
4   Cd0     IIF         43.1    3       7           101.56          8           16.70

What I basically want is to keep Tube_shoot and weight_shoot together, i.e. treat these two columns as one for melt. But because I can only use

id.vars=c("Cond","Construct","Line","Plant")

the result is not quite what I want.

I have two (ugly) solutions so far:

  1. I melt twice, first by measure.vars=c("Tube_shoot", "Tube_root" ), then by weights, and later remove half of the rows where the result is simply wrong. This is not feasible for me because I have different lengths of data, and I'd always have to check if I'm taking the right rows out.

  2. I paste "tube" with "weight" into a new column, take out the others, melt them, and later take them apart again.

  3. Copy them one by one in excel. But with hundreds of lines I'd rather learn how to do it in R.

I'm sure that there is a better way.

What I want in the end:

    Cond    Construct   Line    Plant   Tube        weight
1   Standard            NA      NA      2           199.95
2   Cd0     IIF         43.1    1       3           51.87
3   Cd0     IIF         43.1    2       5           81.80
4   Cd0     IIF         43.1    3       7           101.56
2   Cd0     IIF         43.1    1       4           10.39
3   Cd0     IIF         43.1    2       6           15.05
4   Cd0     IIF         43.1    3       8           16.70
riselin
  • 23
  • 1
  • 4

3 Answers3

2

You could try

 res <- reshape(df1, idvar=c('Cond', 'Construct', 'Line', 'Plant'),
              varying=5:8, direction='long', sep="_")

 res1 <-  res[res$weight!='-', -5]
 row.names(res1) <- NULL

 res1
 #      Cond Construct Line Plant Tube weight_shoot
 #1 Standard             NA    NA    2       199.95
 #2      Cd0       IIF 43.1     1    3        51.87
 #3      Cd0       IIF 43.1     2    5         81.8
 #4      Cd0       IIF 43.1     3    7       101.56
 #5      Cd0       IIF 43.1     1    4        10.39
 #6      Cd0       IIF 43.1     2    6        15.05
 #7      Cd0       IIF 43.1     3    8        16.70

data

 df1 <- structure(list(Cond = c("Standard", "Cd0", "Cd0", "Cd0"), 
  Construct = c("", "IIF", "IIF", "IIF"), Line = c(NA, 43.1, 43.1, 43.1),
  Plant = c(NA, 1L, 2L, 3L), Tube_shoot = c(2L, 3L, 5L, 7L), weight_shoot = 
  c(199.95,51.87, 81.8, 101.56), Tube_root = c("-", "4", "6", "8"), 
  weight_root = c("-", "10.39", "15.05", "16.70")), .Names = c("Cond",
  "Construct", "Line", "Plant", "Tube_shoot", "weight_shoot", "Tube_root",
  "weight_root"), class = "data.frame", row.names = c("1", "2", "3", "4"))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thank you, this is absolutely perfect. I did not try reshape as I was following this link: http://www.cookbook-r.com/Manipulating_data/Converting_data_between_wide_and_long_format/ – riselin Nov 25 '14 at 11:47
  • @Raphael Iselin Glad to know that it works. `reshape` is a `base R` function which may be bit tricky to use sometimes, but it works better than `dcast` for multiple columns. For example check this link http://stackoverflow.com/questions/27118314/casting-multiple-columns-from-one-factor-variable/27118450#27118450 – akrun Nov 25 '14 at 12:05
  • @akrun, `dcast.data.table` will be able to work with such data from version 1.9.8 on. – A5C1D2H2I1M1N2O1R2T1 Nov 25 '14 at 12:36
1

You may want to consider merged.stack from my "splitstackshape" package, with which you can do something like:

library(splitstackshape)
merged.stack(as.data.table(df1, keep.rownames = TRUE), 
             var.stubs = c("Tube", "weight"), sep = "_")
#    rn     Cond Construct Line Plant .time_1 Tube weight
# 1:  1 Standard             NA    NA    root    -      -
# 2:  1 Standard             NA    NA   shoot    2 199.95
# 3:  2      Cd0       IIF 43.1     1    root    4  10.39
# 4:  2      Cd0       IIF 43.1     1   shoot    3  51.87
# 5:  3      Cd0       IIF 43.1     2    root    6  15.05
# 6:  3      Cd0       IIF 43.1     2   shoot    5   81.8
# 7:  4      Cd0       IIF 43.1     3    root    8  16.70
# 8:  4      Cd0       IIF 43.1     3   shoot    7 101.56

Of course, you can also add a [Tube != "-" | weight != "-"] to the end to remove the rows where "Tube" or "weight" have "-"... but note that doing so doesn't magically convert those columns to numeric :-)

A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • I wonder if it would be easier if `merged.stack` would convert data frames to `data.table` by default, unless it can work with data frames too? – David Arenburg Nov 25 '14 at 11:50
  • @DavidArenburg, it does automatically convert to `data.table`s. I did the `as.data.table` because I wanted the rownames for ordering if required later. – A5C1D2H2I1M1N2O1R2T1 Nov 25 '14 at 11:50
  • Oh, ok then, nice solution – David Arenburg Nov 25 '14 at 11:52
  • This solution does not work for me on my original data, and even with the df1 provided by akrun it does not. I had to add id.vars=c('Cond', 'Construct', 'Line', 'Plant') for this. – riselin Nov 25 '14 at 12:12
  • @riselin, what version of "splitstackshape" are you using? In version 1.4 and above, it tries to guess the "`id.vars`". – A5C1D2H2I1M1N2O1R2T1 Nov 25 '14 at 12:25
  • just checked, it's version 1.2 (edit: I'm using R studio on a mac, and that's what I downloaded with install.packages("splitstackshape") half an hour ago. Also, the auto-update button claims all are up to date) – riselin Nov 25 '14 at 12:38
  • @riselin, you may have to install it from source. I don't have a Mac, so I can't confirm. – A5C1D2H2I1M1N2O1R2T1 Nov 25 '14 at 14:12
1

Another option, using dplyr and tidyr:

library(dplyr)
libarary(tidyr)

gather(df1, x, Tube, c(Tube_shoot, Tube_root)) %>% 
   mutate(weight = ifelse(grepl("*root$", x), weight_root, weight_shoot)) %>%
   select(-c(weight_shoot, weight_root, x))

#      Cond Construct Line Plant Tube weight
#1 Standard             NA    NA    2 199.95
#2      Cd0       IIF 43.1     1    3  51.87
#3      Cd0       IIF 43.1     2    5   81.8
#4      Cd0       IIF 43.1     3    7 101.56
#5 Standard             NA    NA    -      -
#6      Cd0       IIF 43.1     1    4  10.39
#7      Cd0       IIF 43.1     2    6  15.05
#8      Cd0       IIF 43.1     3    8  16.70
talat
  • 68,970
  • 21
  • 126
  • 157