-3

I am dealing with a dataset which is in a long format as shown below.

  id      Name                 FluidTotal
  60718   Rocuronium           11.069175
  07860   Rocuronium           5.967500
  90389   Rocuronium           200.000000
  01536   Rocuronium           78.311333 
  01536   Rocuronium           51.415600 
  28047   Rocuronium           72.944444
  28047   Acid                 1
  28047   Acid                 1
  28047   Acid                 1
  28047   Acid                 1
  92323   Acid                 1
  92323   Void                 100
  01536   Void                 25
  60718   Void                 70
  60718   Void                 40  

What I am trying to do is transform this to a wide format like this below

   Id      Rocuronium   Acid    Void
   60718   11.069175    -       110 
   07860   5.967500     -       -                                         
   90389   200.000000   -       -
   01536   129.72693    -       25
   28047   72.944444    4       -
   92323   -            1       100 

The first row is 60718 11.069175 - 110 , the cell value for Void is 110 because the last two rows for id 60718 in the long format are 70 and 40 so when they are converted to wide format these two values are added.

Similarly the value of Rocuronium for id 01536 in wide format is 129.72693 because rows 4 and 5 in the long format for this id 01536 are 78.311333 and 51.415600

I'd appreciate any assistance on accomplishing this transformation, thanks in advance..

Jill Sellum
  • 319
  • 2
  • 4
  • 14
  • 1
    Essentially the same as here - http://stackoverflow.com/questions/35424064/converting-long-to-wide-format/35424289 - just replace `min` with `sum` instead - `reshape(aggregate(FluidTotal ~ ., data=dat, FUN=sum), idvar="id", timevar="Name", direction="wide")` – thelatemail Apr 08 '16 at 01:16
  • Or, you can use `tidyr` package's `spread` function. – Gopala Apr 08 '16 at 01:22
  • `reshape(aggregate( ....))` seems a prescription for a one time success and later failure for a new R user. Most experienced R users still have trouble with `reshape`. – IRTFM Apr 08 '16 at 02:47
  • @42- I really don't understand the widespread troubles with `reshape`. Anyway, `tapply` is a neat solution too, but it also loses the `id` column to a `row.name`, which will probably cause grief for a beginner too. – thelatemail Apr 08 '16 at 03:03
  • Perhaps you are the unique person who can read the `reshape` help page and figure out which parameters to use for the wide to long transformation and which parameters to use for the long to wide. I usually give up after my 4th or 5th failure. – IRTFM Apr 08 '16 at 03:06

2 Answers2

5

The base function tapply function lets you do operations within cross-classified categories:

with( dat, tapply( FluidTotal, list(id,Name), sum))

      Acid Rocuronium Void
1536    NA  129.72693   25
7860    NA    5.96750   NA
28047    4   72.94444   NA
60718   NA   11.06917  110
90389   NA  200.00000   NA
92323    1         NA  100

In describing transformations, the act of collapsing individual rows is different than just going from long to wide. It should probably be noted that this retrunes an R-matrix object and if it needed to be in the form of a dataframe you could use data.frame to coerce it.

IRTFM
  • 258,963
  • 21
  • 364
  • 487
  • Did not know if you saw it. Did not want to silently delete without acknowledging. I will delete as it is not relevant to the topic. Thank you. – Gopala Apr 08 '16 at 02:16
  • @42- , latemail, and Gopala, that is so neat. I will test other solutions as well – Jill Sellum Apr 08 '16 at 03:41
2

We can use xtabs (another base R function). By default, xtabs aggregate the sum.

xtabs(FluidTotal~., df1)
#         Name
#id          Acid Rocuronium      Void
# 1536    0.00000  129.72693  25.00000
# 7860    0.00000    5.96750   0.00000
# 28047   4.00000   72.94444   0.00000
# 60718   0.00000   11.06917 110.00000
# 90389   0.00000  200.00000   0.00000
# 92323   1.00000    0.00000 100.00000
akrun
  • 874,273
  • 37
  • 540
  • 662