0

I have a data frame like that:

    HLA_Status    variable      value
1     PP            CCL24       9.645
2     PP            CCL24       56.32
3     PP            CCL24       7.268
4     PC            CCL24       5.698
5     PC            CCL24       89.457
6     PC            CCL24       78.23
7     PP            SPP1        23.12
8     PP            SPP1        36.32
9     PP            SPP1        17.268
10    PC            SPP1        2.698
11    PC            SPP1        9.457
12    PC            SPP1        8.23

I want to reshape my data frame with reshape2::dcast() and get this:

   HLA_Status        CCL24        SPP1
1      PP            9.645       23.12
2      PP            56.32       36.32
3      PP            7.268       17.268
13     PC            5.698       2.698
14     PC            89.457      9.457
15     PC            78.230      8.23

But I didn't manage to do this.

I tried this:

dcast(mydt, HLA_Status ~ variable, value.var = "value")

But it didn't work.

And I see on the documentation of reshape2 that if we have multiple values per cell we need to tell dcast how to aggregate the data.

I think my problem is to don't know what to give to fun.aggregate.

How can I get the wanted data frame by using reshape2 or others packages?

Adam Bellaïche
  • 427
  • 3
  • 16

4 Answers4

2

We can use spread from tidyr

library(dplyr)
library(tidyr)

df %>%
  group_by(HLA_Status, variable) %>%
  mutate(row = row_number()) %>%
  spread(variable, value) %>%
  ungroup() %>%
  select(-row)

# A tibble: 6 x 3
#  HLA_Status CCL24  SPP1
#  <fct>     <dbl> <dbl>
#1   PC       5.70  2.70
#2   PC       89.5  9.46
#3   PC       78.2  8.23
#4   PP       9.64  23.1 
#5   PP       56.3  36.3 
#6   PP       7.27  17.3 

data

df <- structure(list(HLA_Status = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 1L), .Label = c("PC", "PP"), class = "factor"), 
variable = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 
2L, 2L, 2L), .Label = c("CCL24", "SPP1"), class = "factor"), 
value = c(9.645, 56.32, 7.268, 5.698, 89.457, 78.23, 23.12, 
36.32, 17.268, 2.698, 9.457, 8.23)), class = "data.frame", row.names = 
c(NA, -12L))
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
2

This can be done with dcast (here from data.table) though you need a row identifier.

library(data.table)
dcast(dt, HLA_Status + rowid(HLA_Status, variable) ~ variable)
#   HLA_Status HLA_Status_1  CCL24   SPP1
#1:         PC            1  5.698  2.698
#2:         PC            2 89.457  9.457
#3:         PC            3 78.230  8.230
#4:         PP            1  9.645 23.120
#5:         PP            2 56.320 36.320
#6:         PP            3  7.268 17.268

data

dt <- fread("    HLA_Status    variable      value
     PP            CCL24       9.645
     PP            CCL24       56.32
     PP            CCL24       7.268
     PC            CCL24       5.698
     PC            CCL24       89.457
     PC            CCL24       78.23
     PP            SPP1        23.12
     PP            SPP1        36.32
     PP            SPP1        17.268
     PC            SPP1        2.698
     PC            SPP1        9.457
     PC            SPP1        8.23")
markus
  • 25,843
  • 5
  • 39
  • 58
1

If reshape2::dcast is actually needed, there's a solution with ave (to get the identifiers, see @markus' answer):

reshape2::dcast(d, HLA_Status + ave(rep(1, nrow(d)), d[1:2], FUN=seq) ~ variable)
#   HLA_Status ave(rep(1, nrow(d)), d[1:2], FUN = seq)  CCL24   SPP1
# 1         PC                                       1  5.698  2.698
# 2         PC                                       2 89.457  9.457
# 3         PC                                       3 78.230  8.230
# 4         PP                                       1  9.645 23.120
# 5         PP                                       2 56.320 36.320
# 6         PP                                       3  7.268 17.268

Data

d <- structure(list(HLA_Status = c("PP", "PP", "PP", "PC", "PC", "PC", 
"PP", "PP", "PP", "PC", "PC", "PC"), variable = c("CCL24", "CCL24", 
"CCL24", "CCL24", "CCL24", "CCL24", "SPP1", "SPP1", "SPP1", "SPP1", 
"SPP1", "SPP1"), value = c(9.645, 56.32, 7.268, 5.698, 89.457, 
78.23, 23.12, 36.32, 17.268, 2.698, 9.457, 8.23)), row.names = c(NA, 
-12L), class = "data.frame")
jay.sf
  • 60,139
  • 8
  • 53
  • 110
0

I highly recommend to switch to tidyr instead of using reshape2. However, if you really would like to use dcast this is the way to go

library(dplyr)
library(reshape2)
df <- structure(list(HLA_Status = c("PP", "PP", "PP", "PC", "PC", "PC", 
"PP", "PP", "PP", "PC", "PC", "PC"), variable = c("CCL24", "CCL24", 
"CCL24", "CCL24", "CCL24", "CCL24", "SPP1", "SPP1", "SPP1", "SPP1", 
"SPP1", "SPP1"), value = c(9.645, 56.32, 7.268, 5.698, 89.457, 
78.23, 23.12, 36.32, 17.268, 2.698, 9.457, 8.23)), row.names = c(NA, 
-12L), class = "data.frame")


df %>% 
  group_by(variable, HLA_Status) %>%
  mutate(id = row_number()) %>% 
  dcast(HLA_Status+id ~ variable, value.var = "value") %>%
  select(-id)

  HLA_Status  CCL24   SPP1
1         PC  5.698  2.698
2         PC 89.457  9.457
3         PC 78.230  8.230
4         PP  9.645 23.120
5         PP 56.320 36.320
6         PP  7.268 17.268
MrNetherlands
  • 920
  • 7
  • 14