I'm pretty new to R and I'm trying to create some new variables. Basically my dataset has individuals with a variable for mother ID (i.e. if two individuals have the same mother the value of this variable will be the same).

Keeping it simple to begin with, let's say I want to create a dummy variable that equals 1 if an individual has a sibling in the data. I tried using:

    dummy <- as.numeric(duplicated(Identifiers_age$MPUBID) = TRUE)

but the vector I get only = 1 for one of the siblings. What should I be doing?

Thanks

Milhouse
  • Are you looking for a sum, or do you wish to group? Without a dummy dataset and an expected output it's hard to tell what you're after (the number of brothers/sisters, or just whether there's at least one). – Tensibai Jul 22 '16 at 12:56
  • [edit] your question to clarify it, do **NOT** post code in comments – Tensibai Jul 22 '16 at 13:03
  • Sorry, I should have been clearer, I'm just looking for a binary variable that = 1 if the individual has at least one sibling. – Milhouse Jul 22 '16 at 13:05
  • So go and accept @lmo's answer, it is what you're looking for. – Tensibai Jul 22 '16 at 13:07

2 Answers

If your goal is to return a vector of 0s and 1s that is 1 whenever the observational unit has a sibling, then you want to include a second duplicated call with fromLast=TRUE.

The first duplicated call returns TRUE for every sibling after the first one within an MPUBID, and the second call, which scans from the end of the vector, picks up the first sibling.

    hasSiblings <- as.integer(duplicated(Identifiers_age$MPUBID) |
                              duplicated(Identifiers_age$MPUBID, fromLast=TRUE))

The | is the vector logical operator OR. Note that duplicated returns a logical vector, so you don't have to include the =TRUE after it as you did in your question.
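To see why both passes are needed, here is a minimal sketch on a toy data frame. The MPUBID values and the names first_pass and second_pass are made up for illustration; only duplicated() and its fromLast argument come from the answer above.

```r
# Toy data: six individuals, MPUBID values invented for illustration
Identifiers_age <- data.frame(
  id     = 1:6,
  MPUBID = c(101, 102, 101, 103, 102, 101)
)

# Forward scan: TRUE for every repeat of an MPUBID after its first occurrence
first_pass <- duplicated(Identifiers_age$MPUBID)

# Backward scan: TRUE for every occurrence that is repeated later in the vector
second_pass <- duplicated(Identifiers_age$MPUBID, fromLast = TRUE)

# An individual has a sibling if either scan flags the row
Identifiers_age$hasSiblings <- as.integer(first_pass | second_pass)

print(Identifiers_age)
```

Only the individual with MPUBID 103 ends up with 0, since that value appears once. One caveat: duplicated() treats repeated NA values as duplicates of each other, so rows with a missing MPUBID would be flagged as siblings unless you handle them separately.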

lmo

A dplyr answer:

    library(dplyr)

    Identifiers_age %>%
      group_by(MPUBID) %>%
      mutate(hasSiblings = as.integer(n() > 1))
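
The same group-size idea can be sketched in base R with ave(), which may be handy if dplyr isn't installed. This is an illustrative alternative rather than part of the dplyr approach; the data frame toy, its values, and the name family_size are assumptions made up for the example.

```r
# Toy data: MPUBID values invented for illustration
toy <- data.frame(MPUBID = c(101, 102, 101, 103, 102, 101))

# For each row, count how many rows share its MPUBID
family_size <- ave(seq_along(toy$MPUBID), toy$MPUBID, FUN = length)

# More than one row per mother means the individual has at least one sibling
toy$hasSiblings <- as.integer(family_size > 1)

print(toy)
```

Like n() within a group, family_size holds the number of individuals per mother, so it also answers the "how many siblings" variant of the question (family_size - 1).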
Axeman