filtering rows in data.frame where column values are inconsistent

Question

I'm trying to filter a data.frame with family information. It looks like this:

 +--------+-------+---------+
 |  name  |  dad  |   mom   |
 +--------+-------+---------+
 | john   | bert  | ernie   |
 | quincy | adam  | eve     |
 | anna   | david | goliath |
 | daniel | bert  | ernie   |
 | sandra | adam  | linda   |
 +--------+-------+---------+

Now I want to know whether everyone who has the same dad also has the same mom. I've been at this for an hour now, trying different approaches, but I keep getting stuck. Also, I'd like an idiomatic R approach rather than a long sequence of functions or for-loops that technically does what I want without teaching me anything new.

My expected output:

 +--------+------+-------+
 |  name  | dad  |  mom  |
 +--------+------+-------+
 | quincy | adam | eve   |
 | sandra | adam | linda |
 +--------+------+-------+

Essentially I want to have a data.frame with dads and moms who have kids from multiple partners.

So far my approach has been:

Split the df by the father column.
From the resulting list of dfs, remove all dfs with only one row (here I already get stuck and can't make it work).
Remove all dfs where length(unique(df$mom)) == 1.
The resulting list should give me all siblings with different parents.

My code up to now:

 fraternals <- split(kinship, kinship$father)
 fraternals <- fraternals[-which(lapply(fraternals, function(x) if(nrow(x) == 1) { output TRUE }))]

But that doesn't run, because R says I cannot use TRUE in that way.
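
For reference, the split-based steps described above can be sketched in base R as follows. This is only a sketch under some assumptions: the data frame is rebuilt here from the table in the question, the father column is taken to be `dad` as in that table (the snippet above uses `kinship$father`), and the names `keep` and `result` are made up for illustration. Two points about the failing line: an R function returns the value of its last expression, so no `output` keyword is needed, and `which()` expects a logical vector, whereas `lapply()` returns a list.

```r
# Sample data rebuilt from the table in the question (an assumption about the real data).
kinship <- data.frame(
  name = c("john", "quincy", "anna", "daniel", "sandra"),
  dad  = c("bert", "adam", "david", "bert", "adam"),
  mom  = c("ernie", "eve", "goliath", "ernie", "linda"),
  stringsAsFactors = FALSE
)

# Step 1: split by father; the column is assumed to be `dad`, as in the table.
fraternals <- split(kinship, kinship$dad)

# Steps 2 and 3 in one test: a group with more than one distinct mom
# necessarily has more than one row. vapply() returns a plain logical vector.
keep <- vapply(fraternals, function(x) length(unique(x$mom)) > 1, logical(1))

# Index with the logical vector directly; -which(...) would drop every
# group if no group matched the condition.
fraternals <- fraternals[keep]

# Bind the remaining groups back into a single data.frame.
result <- do.call(rbind, fraternals)
rownames(result) <- NULL
result
```

Indexing with the logical vector `keep` avoids the `-which()` pitfall, and the single `length(unique(...)) > 1` test makes the separate "only one row" step unnecessary.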

Possible duplicate of https://stackoverflow.com/questions/31649049/select-groups-with-more-than-one-distinct-value — Ronak Shah, Oct 09 '19 at 09:53

score 3 · Accepted Answer · answered Oct 09 '19 at 09:45

One dplyr possibility could be:

library(dplyr)
# Keep only the dads whose children have more than one distinct mom
df %>%
  group_by(dad) %>%
  filter(n_distinct(mom) != 1)

  name   dad   mom  
  <chr>  <chr> <chr>
1 quincy adam  eve  
2 sandra adam  linda

If you don't want to filter but want to see this information:

# Flag each row with the group-wise result instead of dropping rows
df %>%
  group_by(dad) %>%
  mutate(cond = n_distinct(mom) != 1)

  name   dad   mom     cond 
  <chr>  <chr> <chr>   <lgl>
1 john   bert  ernie   FALSE
2 quincy adam  eve     TRUE 
3 anna   david goliath FALSE
4 daniel bert  ernie   FALSE
5 sandra adam  linda   TRUE
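
The same group-wise test can also be written without dplyr. Here is a minimal base R sketch using `ave()`, assuming the data frame is named `df` and has the columns shown in the question. The helper name `n_moms` is made up for illustration. `ave()` is applied to row indices rather than to `mom` itself, because with a character input the counts would be coerced to character.

```r
# Sample data rebuilt from the table in the question (an assumption about the real data).
df <- data.frame(
  name = c("john", "quincy", "anna", "daniel", "sandra"),
  dad  = c("bert", "adam", "david", "bert", "adam"),
  mom  = c("ernie", "eve", "goliath", "ernie", "linda"),
  stringsAsFactors = FALSE
)

# For every row, count the distinct moms among all rows sharing that row's dad.
# ave() recycles the single per-group count across the rows of the group.
n_moms <- ave(
  seq_along(df$mom), df$dad,
  FUN = function(i) length(unique(df$mom[i]))
)

# Equivalent of filter(n_distinct(mom) != 1)
filtered <- df[n_moms > 1, ]

# Equivalent of mutate(cond = n_distinct(mom) != 1)
flagged <- transform(df, cond = n_moms > 1)
```

Unlike the split-based route, this keeps the original row order and row names, so `filtered` contains rows 2 and 5 of `df`.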

score 2 · Answer 2 · answered Oct 09 '19 at 16:44

Here is an option using data.table

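library(data.table)
# setDT() converts df to a data.table by reference; within each dad group,
# .SD[TRUE] keeps every row of the group and .SD[FALSE] keeps none
setDT(df)[, .SD[uniqueN(mom) != 1], .(dad)]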

akrun
