I'm trying to filter a data.frame with family information. It looks like this:

 +--------+-------+---------+
 |  name  |  dad  |   mom   |
 +--------+-------+---------+
 | john   | bert  | ernie   |
 | quincy | adam  | eve     |
 | anna   | david | goliath |
 | daniel | bert  | ernie   |
 | sandra | adam  | linda   |
 +--------+-------+---------+

Now I want to know whether everyone who has the same dad also has the same mom. I've been at this for an hour, trying different approaches, but I keep getting stuck. I'd also like an idiomatic R approach rather than a long sequence of functions or for-loops that technically does what I want without teaching me anything new.

My expected output:

 +--------+------+-------+
 |  name  | dad  |  mom  |
 +--------+------+-------+
 | quincy | adam | eve   |
 | sandra | adam | linda |
 +--------+------+-------+

Essentially I want to have a data.frame with dads and moms who have kids from multiple partners.

So far my approach has been:

  1. split the df by the father column
  2. from the resulting list of dfs, remove all dfs with only one row (this is where I already get stuck; I can't make it work)
  3. remove all dfs where length(unique(df$mom)) == 1
  4. the resulting list should give me all siblings who share a dad but have different moms.

My code up to now:

 fraternals <- split(kinship, kinship$father)
 fraternals <- fraternals[-which(lapply(fraternals, function(x) if(nrow(x) == 1) { output TRUE }))]

but that doesn't run because R says I cannot use TRUE in that way.
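
The split-based steps above can be sketched in base R as follows. This is a minimal sketch, assuming the columns are named `name`, `dad`, and `mom` as in the table (the snippet above uses `kinship$father`, which does not match that table). Two things seem to trip up the original attempt: `output` is not an R keyword (a function returns the value of its last expression), and `lapply` returns a list, whereas `which` needs a logical vector. `Filter` avoids both issues, and it also avoids the `-which(...)` pitfall where an empty match drops every element.

```r
# Sample data reconstructed from the table in the question
kinship <- data.frame(
  name = c("john", "quincy", "anna", "daniel", "sandra"),
  dad  = c("bert", "adam", "david", "bert", "adam"),
  mom  = c("ernie", "eve", "goliath", "ernie", "linda"),
  stringsAsFactors = FALSE
)

# Step 1: one data.frame per dad
by_dad <- split(kinship, kinship$dad)

# Steps 2 and 3: keep only groups with more than one distinct mom
# (a one-row group has exactly one mom, so it drops out automatically)
multi_mom <- Filter(function(x) length(unique(x$mom)) > 1, by_dad)

# Step 4: bind the surviving groups back into a single data.frame
fraternals <- do.call(rbind, multi_mom)
rownames(fraternals) <- NULL
fraternals
```

Because a group with a single row can never have more than one distinct mom, the separate "remove one-row dfs" step becomes unnecessary.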

JadaLovelace

2 Answers

One dplyr possibility could be:

library(dplyr)

df %>%
 group_by(dad) %>%
 filter(n_distinct(mom) != 1)

  name   dad   mom  
  <chr>  <chr> <chr>
1 quincy adam  eve  
2 sandra adam  linda

If you don't want to filter but want to see this information:

df %>%
 group_by(dad) %>%
 mutate(cond = n_distinct(mom) != 1)

  name   dad   mom     cond 
  <chr>  <chr> <chr>   <lgl>
1 john   bert  ernie   FALSE
2 quincy adam  eve     TRUE 
3 anna   david goliath FALSE
4 daniel bert  ernie   FALSE
5 sandra adam  linda   TRUE 
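
For anyone who wants to reproduce this, here is a self-contained sketch. The `df` construction is an assumption rebuilt from the table in the question, and the names `half_siblings` and `moms_per_dad` are only illustrative. It also adds a per-dad summary, which may be handy for checking the counts before filtering.

```r
library(dplyr)

# Sample data reconstructed from the table in the question (assumed layout)
df <- data.frame(
  name = c("john", "quincy", "anna", "daniel", "sandra"),
  dad  = c("bert", "adam", "david", "bert", "adam"),
  mom  = c("ernie", "eve", "goliath", "ernie", "linda"),
  stringsAsFactors = FALSE
)

# Keep the rows whose dad has children with more than one distinct mom
half_siblings <- df %>%
  group_by(dad) %>%
  filter(n_distinct(mom) > 1) %>%
  ungroup()

# One row per dad with the number of distinct moms
moms_per_dad <- df %>%
  group_by(dad) %>%
  summarise(n_moms = n_distinct(mom))

half_siblings
moms_per_dad
```

The `ungroup()` at the end is optional; it just hands back a plain tibble so later steps are not silently computed per dad.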
tmfmnk

Here is an option using data.table:

library(data.table)
setDT(df)[, .SD[uniqueN(mom) != 1], .(dad)]
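
A self-contained variant of the same idea, as a sketch: the `dt` construction is an assumption rebuilt from the table in the question, and `result` is just an illustrative name. It uses the `if (...) .SD` form, a common alternative in which groups that fail the condition return `NULL` and are dropped. Note that `setDT(df)` converts `df` by reference, so building a `data.table` directly leaves any original data.frame untouched.

```r
library(data.table)

# Sample data reconstructed from the table in the question (assumed layout)
dt <- data.table(
  name = c("john", "quincy", "anna", "daniel", "sandra"),
  dad  = c("bert", "adam", "david", "bert", "adam"),
  mom  = c("ernie", "eve", "goliath", "ernie", "linda")
)

# Return a group's rows only when that dad has more than one distinct mom;
# groups where the condition is FALSE yield NULL and are left out
result <- dt[, if (uniqueN(mom) > 1) .SD, by = dad]
result
```

The grouping column `dad` comes first in the result, followed by the remaining columns.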
akrun