
I have a rather large tree-like structure (a dendrogram or web; think pedigree), and I want to create a list of the leaves/nodes that have only a single connection.

In genealogy I believe something similar is called "Spitzenahnen" (German), but I believe that term is specific to 'no known parents', not necessarily 'no descendants'. So basically I am looking for dead ends anywhere in the structure, not just at the top or bottom.

I saw the posts on creating an edge list from a matrix and on accessing the attributes of a dendrogram in R, but I am not sure how to apply them to get the specific results I am after.

I have thousands of nodes with multiple starting and end points. I want to create a list of nodes/leaves that are attached to the tree by only one connected node. If a node has two or more connections (some have up to two dozen), I do not want to see it in my list.

In this marked-up graphic from "Drawing pedigree diagrams with R and graphviz" by Jing Hua Zhao, I only want to see the highlighted nodes. Note that some of the applicable nodes may be buried deep within the web and not necessarily on the 'edge'.


CRSouser
    can you provide any data for us to work with? a `dput` of the set and any code you have tried so far would be useful – mlegge Mar 26 '15 at 17:51

1 Answer


It looks like you're using this data:

pre <- read.table(text="pid id father mother sex affected
10081 1 2 3 2 1
10081 2 0 0 1 2
10081 3 0 0 2 1
10081 4 2 3 2 1
10081 5 2 3 2 2
10081 6 2 3 1 2
10081 7 2 3 2 2
10081 8 0 0 1 2
10081 9 8 4 1 2
10081 10 0 0 2 2
10081 11 2 10 2 2
10081 12 2 10 2 1
10081 13 0 0 1 2
10081 14 13 11 1 2
10081 15 0 0 1 2
10081 16 15 12 2 2",header=T)

If you're looking at graph-like data, you might consider using the igraph library. Here's one way to create a similar plot.

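library(igraph)

# One pseudo-node ("unit") per father/mother pair, named like "2.3"
unit<-as.character(interaction(pre$father, pre$mother))
# Edge list: parent -> unit, then unit -> child
el<-rbind(
    data.frame(person=as.character(c(pre$father, pre$mother)), unit=unit, stringsAsFactors=F),
    data.frame(person=unit, unit=pre$id, stringsAsFactors=F)
)
# Drop edges that involve unknown parents (coded as 0)
el<-subset(el, person!="0" & person !="0.0" & unit!="0" & unit!="0.0")
# Build the graph; simplify() merges the repeated parent -> unit edges
gg<-simplify(graph.data.frame(el, vertices=rbind(
    data.frame(id=pre$id, type="person", affected=pre$affected==1, sex=pre$sex),
    data.frame(id=unique(unit), type="family", affected=FALSE, sex=0))))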

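# Styling: small grey dots for family units, larger labelled nodes for people
V(gg)$color <- "grey"
V(gg)[type=="person" & !affected]$color <- "deepskyblue"
V(gg)$label <- ""
V(gg)[type=="person"]$label <- V(gg)[type=="person"]$name
V(gg)$size <-2
V(gg)[type=="person"]$size <- 15
# Circles by default, squares where sex == 1
V(gg)$shape<-"circle"
V(gg)[sex==1]$shape<-"square"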

Plotting it with plot(gg) draws the pedigree as a network (the exact arrangement varies from run to run, because the default layout algorithm is stochastic).

It's a bit messy to reshape the data, but the idea is that I create a pseudo-node for each union that results in a child. Each parent then gets an edge pointing into the union node, and the union node gets an edge pointing out to each child.
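To make the reshaping idea concrete, here is a minimal base R sketch on just three rows of the data (persons 2, 3 and 4). The names `ped`, `kids`, `union_id` and `edges` are made up for this illustration, and it assumes that unknown parents are coded as 0 and that both parents are either known or unknown together.

```r
# Three rows from the pedigree above: 2 and 3 are founders, 4 is their child
ped <- data.frame(id = c(2, 3, 4), father = c(0, 0, 2), mother = c(0, 0, 3))

# Only rows with known parents contribute edges (assumption: 0 = unknown)
kids <- ped[ped$father != 0 & ped$mother != 0, ]

# One pseudo-node per father/mother pair, e.g. "2.3"
union_id <- paste(kids$father, kids$mother, sep = ".")

# parent -> union and union -> child; unique() drops repeats from siblings
edges <- unique(rbind(
  data.frame(from = as.character(kids$father), to = union_id, stringsAsFactors = FALSE),
  data.frame(from = as.character(kids$mother), to = union_id, stringsAsFactors = FALSE),
  data.frame(from = union_id, to = as.character(kids$id), stringsAsFactors = FALSE)
))
print(edges)
```

For these three rows the result has three edges: 2 and 3 each point into the union node "2.3", and "2.3" points out to 4.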

The nodes you describe all have exactly one connection, which in graph terms means they have degree 1. We can change the label color of these nodes to red to get

V(gg)$label.color<-"black"
V(gg)[degree(gg)==1 & type=="person"]$label.color<-"red"

plot(gg)


or you can just get the names with

V(gg)[degree(gg)==1 & type=="person"]$name
# [1] "1"  "3"  "5"  "6"  "7"  "8"  "9"  "10" "13" "14" "15" "16"
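If you only need the list and not the plot, the same degree can probably be counted without building a graph at all. The following is a rough base R sketch, not part of the igraph approach above: a person's degree is the number of distinct unions they are a parent in, plus one if their own parents are known. The names `parent_links`, `n_unions`, `deg` and `dead_ends` are made up here, and it again assumes 0 means "unknown" and that both parents are known or unknown together.

```r
# Same pedigree as above, reduced to the columns needed for counting
pre <- read.table(text = "id father mother
1 2 3
2 0 0
3 0 0
4 2 3
5 2 3
6 2 3
7 2 3
8 0 0
9 8 4
10 0 0
11 2 10
12 2 10
13 0 0
14 13 11
15 0 0
16 15 12", header = TRUE)

has_parents <- pre$father != 0 & pre$mother != 0
kids <- pre[has_parents, ]
union_id <- paste(kids$father, kids$mother, sep = ".")

# Distinct (parent, union) pairs, so several siblings count as one link
parent_links <- unique(data.frame(
  person = c(kids$father, kids$mother),
  union = c(union_id, union_id)
))

# Unions per person, in the same order as the rows of pre
n_unions <- table(factor(parent_links$person, levels = pre$id))

# Degree = unions as a parent + link to own parents' union (if known)
deg <- as.integer(n_unions) + as.integer(has_parents)
dead_ends <- pre$id[deg == 1]
print(dead_ends)
```

On this data it returns the same twelve ids as the igraph version. For thousands of nodes this avoids constructing the graph, at the cost of hard-coding the pedigree-specific definition of a connection.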
MrFlick