0

I have a data frame in long format, that I'd like to transform into wide format. The data frame has several repeated identifiers that I'd like to treat as unique instances, and represent them as individual rows in the wide data frame.

My question is similar to this one:

Forcing unique values before casting (pivoting) in R

But in the above question the unique entries end up as individual columns. For my question, I would like to have the data put into individual rows. For example:

ID1<-c("A","A","A","A","A","B","B","B","B","B","C","C","C","C","C")

ID2<-c("R","R","R","L","L","R","R","L","L","R","R","L","L","R","R")

Sp<-c("Bird","Cat","Bird","Bird","Dog","Dog","Dog","Cat","Cat","Bird","Cat","Dog","Bird","Bird","Cat")

Count<-c(1,2,2,1,2,1,2,3,2,1,2,3,2,1,5)

DF<-data.frame(ID1,ID2,Sp,Count)

After I cast my data into wide format, I'd like the output data to look like this:

ID1    ID2    Bird  Cat  Dog
A      R       1     2    0
A      R       2     0    0 # 2 Birds in the A/ R combination so need second row (don't want to add them together)
A      L       1     0    2
B      R       1     0    1
B      R       0     0    2
B      L       0     3    0
B      L       0     2    0
C      R       1     2    0
C      R       0     5    0
C      L       2     0    3

If there are no repeats in the unique ID1/ ID2 combination, the cast wouild work as normal. But if there is a repeat, a second (or third or fourth) row would be created.

Community
  • 1
  • 1
Vinterwoo
  • 3,843
  • 6
  • 36
  • 55

1 Answers1

1

You can create an auxiliary ID column per group of ID1, ID2 and Sp, and then reshape with ID1, ID2 and AUXID as id columns:

library(dplyr)
DF = DF %>% group_by(ID1, ID2, Sp) %>% mutate(AUXID = row_number()) %>% as.data.frame()
reshape(DF, idvar = c("ID1", "ID2", "AUXID"), timevar = "Sp", dir = "wide")

#    ID1 ID2 AUXID Count.Bird Count.Cat Count.Dog
# 1    A   R     1          1         2        NA
# 3    A   R     2          2        NA        NA
# 4    A   L     1          1        NA         2
# 6    B   R     1          1        NA         1
# 7    B   R     2         NA        NA         2
# 8    B   L     1         NA         3        NA
# 9    B   L     2         NA         2        NA
# 11   C   R     1          1         2        NA
# 12   C   L     1          2        NA         3
# 15   C   R     2         NA         5        NA

You can drop the AUXID column and fill NA afterwards.

Here is a data.table version with dcast() which provides a fill parameter to fill NA values:

library(data.table)
(dcast(setDT(DF)[, AUXID := 1:.N, .(ID1, ID2, Sp)], 
      ID1 + ID2 + AUXID ~ Sp, value.var = "Count", fill = 0)
      [, AUXID := NULL][])
#    ID1 ID2 Bird Cat Dog
# 1:   A   L    1   0   2
# 2:   A   R    1   2   0
# 3:   A   R    2   0   0
# 4:   B   L    0   3   0
# 5:   B   L    0   2   0
# 6:   B   R    1   0   1
# 7:   B   R    0   0   2
# 8:   C   L    2   0   3
# 9:   C   R    1   2   0
#10:   C   R    0   5   0
Psidom
  • 209,562
  • 33
  • 339
  • 356
  • 2
    For your `dplyr` method, instead of base `reshape` you can add this to the end of the dplyr chain (after loading `reshape2`): `dcast(ID1 + ID2 + AUXID ~ Sp, value.var="Count", fill=0) %>% select(-AUXID)` – eipi10 Oct 28 '16 at 04:26