
I am trying to renumber groups of people.

In the data, 'FamID' indicates a family and 'PtID' indicates an individual patient who belongs to that family. The 'Twin' column indicates whether the patients are identical twins (coded as 1), non-identical twins (coded as 2) or not twins (coded as 0).

  FamID    PtID    Twin    
  F1       F11     1
  F1       F12     1
  F2       F21     2
  F2       F22     2
  F3       F31     1
  F3       F32     1
  F4       F41     2 
  F5       F51     1  
  F5       F52     1 
  F5       F53     0
  F6       F61     1
  F6       F62     1
  F7       F71     2
  F7       F72     2

So for example, 'FamID' F1 has two family members, PtID F11 and F12, who are identical twins (Twin = 1).

I want to create a column (NewCol) that has a coding based on the Twin column and the FamID column.

The first family of identical twins (Twin = 1) should get a 1 in the new column, the next family of identical twins should get 3, and each following family of identical twins should get the next odd number (5, 7, and so on).

Non-identical twins (Twin = 2) should be numbered the same way with even numbers: the first family of non-identical twins gets 2, the next gets 4, and so on.

Non-twins (Twin = 0) should stay 0.
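Put as a formula: if a family is the k-th distinct family (in row order) among the rows with Twin = 1, its code is 2k - 1; if it is the k-th distinct family among the rows with Twin = 2, its code is 2k. A minimal base-R sketch of that rule follows. The helper name `twin_code` is hypothetical, and the sketch assumes the rows are already in the order in which the families should be numbered.

```r
# Hypothetical helper (not from the question): odd codes for identical twins,
# even codes for non-identical twins, 0 for everyone else.
twin_code <- function(fam, twin) {
  out <- integer(length(fam))  # non-twins (Twin == 0) stay 0
  for (type in c(1, 2)) {
    rows <- twin == type
    # k = 1, 2, 3, ... for the k-th distinct family among these rows (row order)
    k <- match(fam[rows], unique(fam[rows]))
    out[rows] <- if (type == 1) 2L * k - 1L else 2L * k
  }
  out
}

FamID <- c(rep("F1", 2), rep("F2", 2), rep("F3", 2), "F4", rep("F5", 3), rep("F6", 2), rep("F7", 2))
Twin <- c(1, 1, 2, 2, 1, 1, 2, 1, 1, 0, 1, 1, 2, 2)
twin_code(FamID, Twin)
# [1] 1 1 2 2 3 3 4 5 5 0 7 7 6 6
```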

Desired output:

  FamID   PtID     Twin     NewCol
  F1       F11     1        1
  F1       F12     1        1
  F2       F21     2        2
  F2       F22     2        2
  F3       F31     1        3
  F3       F32     1        3
  F4       F41     2        4
  F5       F51     1        5  
  F5       F52     1        5 
  F5       F53     0        0
  F6       F61     1        7
  F6       F62     1        7 
  F7       F71     2        6
  F7       F72     2        6

Data

 FamID <- c(rep("F1", 2), rep("F2", 2), rep("F3", 2), "F4", rep("F5", 3), rep("F6", 2), rep("F7", 2)) 
 PtID <- c("F11", "F12", "F21", "F22", "F31", "F32", "F41", "F51", "F52", "F53", "F61", "F62", "F71", "F72")
 Twin <- c(1, 1, 2, 2, 1, 1, 2, 1, 1, 0, 1, 1, 2, 2)
 sample <- data.frame(FamID, PtID, Twin)
Edited by Henrik; asked by Sheila.
  • Can there be more than one pair of twins in the same family? How would you deal with them? – flodel May 16 '13 at 23:22
  • What matters more is the Twin coding. Let's say there are two sets of twins in the same family, one identical and the other non-identical. The identical pair would be coded as 1s and the non-identical pair as 2s. – Sheila May 16 '13 at 23:35
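Numbering each Twin code separately, as both answers below do, handles this case: a family with one identical pair and one non-identical pair ends up with one odd code and one even code. A quick base-R check on a made-up family F8 (hypothetical data, not from the question):

```r
# Hypothetical data: F8 has an identical pair (1) and a non-identical pair (2)
fam  <- c("F1", "F1", "F8", "F8", "F8", "F8")
twin <- c(1, 1, 1, 1, 2, 2)

new_col <- integer(length(fam))  # non-twins would stay 0
id  <- twin == 1
nid <- twin == 2
# k-th family among identical-twin rows -> 2k - 1; among non-identical rows -> 2k
new_col[id]  <- 2L * match(fam[id],  unique(fam[id]))  - 1L
new_col[nid] <- 2L * match(fam[nid], unique(fam[nid]))
new_col
# [1] 1 1 3 3 2 2
```

The same logic cannot separate two pairs of the same type within one family: both pairs would share a single code, because nothing in the columns shown records which pair each person belongs to.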

2 Answers


Here's a solution using the data.table package:

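 library(data.table)
 dt <- data.table(sample)

 dt[Twin == 0, NewCol := 0L]                            # non-twins stay 0
 dt[Twin == 1, NewCol := .GRP * 2L - 1L, by = FamID]    # identical twins: 1, 3, 5, ...
 dt[Twin == 2, NewCol := .GRP * 2L, by = FamID]         # non-identical twins: 2, 4, 6, ...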

The result is

#      FamID PtID Twin NewCol
#  1:    F1  F11    1      1
#  2:    F1  F12    1      1
#  3:    F2  F21    2      2
#  4:    F2  F22    2      2
#  5:    F3  F31    1      3
#  6:    F3  F32    1      3
#  7:    F4  F41    2      4
#  8:    F5  F51    1      5
#  9:    F5  F52    1      5
# 10:    F5  F53    0      0
# 11:    F6  F61    1      7
# 12:    F6  F62    1      7
# 13:    F7  F71    2      6
# 14:    F7  F72    2      6

Data.tables have several benefits (intuitive syntax, efficiency in many operations), and because a data.table also inherits from data.frame, it can be passed to most functions that expect a data.frame. If you need a plain data.frame, you can convert back with

df <- as.data.frame(dt)
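How `.GRP` numbers the groups matters here: with `by =` (rather than `keyby =`), groups are counted in order of first appearance among the rows selected by `i`, so the codes follow the row order of the table. A small illustration, assuming the data.table package is installed and using a made-up two-family table:

```r
library(data.table)

# Made-up table: family F3 appears before F1
dt <- data.table(FamID = c("F3", "F3", "F1", "F1"), Twin = 1)

# .GRP counts groups in order of first appearance, so F3 is group 1
dt[Twin == 1, NewCol := .GRP * 2L - 1L, by = FamID]
dt$NewCol
# [1] 1 1 3 3
```

If the rows are not already in the order you want the families numbered, sort them first (for example with `setorder(dt, FamID)`), keeping in mind that character IDs such as "F10" sort before "F2".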
Edited by Henrik; answered by Frank.
  • It's not that they work the same for all operations, but rather can be used by most functions that require data.frames. They in fact work awfully differently. – Ricardo Saporta May 16 '13 at 23:20
  • +1 very nice. No need for the `i` in `dt[Twin==0,NewCol:=0L]` you can just use `dt[ ,NewCol:=0L]` – Ricardo Saporta May 16 '13 at 23:21
  • Thanks! Yeah, I just spelled out the Twin==0 case so that it's very clear. Hrm, I want to evangelize for data.tables but maybe I should just say "You can convert it back..." in case I say something inaccurate. – Frank May 16 '13 at 23:27
  • Sure, if you want to, but honestly, I don't think such verbiage is necessary. Mostly because once someone gets the hang of `data.table`, there are only very rare occasions when they would actually *need* to use `data.frame`, and chances are by then they will already know how to convert. – Ricardo Saporta May 16 '13 at 23:34
  • Both solutions were great! Thanks for your help guys! – Sheila May 16 '13 at 23:37

Using factors and data.table

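library(data.table)
DT.Sample <- data.table(sample)

DT.Sample[ , NewCol := 0]   # default: non-twins get 0

# factor() numbers the families 1, 2, 3, ... in sorted FamID order within each subset
DT.Sample[Twin==1 , NewCol:= 2*as.numeric(factor(FamID))-1]   # odd codes
DT.Sample[Twin==2 , NewCol:= 2*as.numeric(factor(FamID))]     # even codes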

    FamID PtID Twin NewCol
 1:    F1  F11    1      1
 2:    F1  F12    1      1
 3:    F2  F21    2      2
 4:    F2  F22    2      2
 5:    F3  F31    1      3
 6:    F3  F32    1      3
 7:    F4  F41    2      4
 8:    F5  F51    1      5
 9:    F5  F52    1      5
10:    F5  F53    0      0
11:    F6  F61    1      7
12:    F6  F62    1      7
13:    F7  F71    2      6
14:    F7  F72    2      6
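One caveat with `factor()`: by default its levels are the sorted unique values, so the families are numbered in string-sort order rather than row order. That gives the desired output for F1 to F7, but an ID such as "F10" sorts before "F2". A small base-R illustration with made-up IDs; passing `levels = unique(...)` numbers the families in order of first appearance instead:

```r
fam <- c("F2", "F2", "F10", "F10")  # made-up family IDs

# Default levels are sorted as strings, and "F10" sorts before "F2"
codes_sorted <- as.integer(factor(fam))
codes_sorted
# [1] 2 2 1 1

# Levels in order of first appearance number the families by row order
codes_appearance <- as.integer(factor(fam, levels = unique(fam)))
codes_appearance
# [1] 1 1 2 2
```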
Answered by Ricardo Saporta.