I am working on a network analysis problem related to family/household composition. I have multiple edge tables, each containing id1, id2 and a relationship code that states the type of relationship between the two ids. These tables are large, upwards of 7 million rows each. I also have a node table which contains the same ids and various attributes.
What I want to achieve is a summary matrix that counts networks by their number of adults and number of children, something like this:
                     Children
                1   2   3   4   total
              ------------------------
            1 | 1   0   1   0     2
              |
    Adults  2 | 3   5   4   1    13
              |
            3 | 1   2   0   0     3
              |
        total | 5   7   5   1    18
Essentially I want to be able to identify and count distinct networks in my data.
My data is in the form:
    ID1    ID2    Relationship_Code
    X1     X2     Married
    X1     X3     Parent/Child
    X1     X4     Parent/Child
    X5     X6     Married
    X5     X7     Parent/Child
    X6     X5     Married
    .      .      .
    .      .      .
    .      .      .
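To make "distinct networks" concrete: one possible reading is that each network is a connected component of the edge table, ignoring edge direction and relationship type. Below is a minimal sketch of that idea in Python using only the standard library and a union-find structure. The function name `find_households` and the tuple layout are assumptions made for illustration, and the sketch has not been tried at the 7-million-row scale.

```python
from collections import Counter

def find_households(edges):
    """Group ids into connected components with a union-find structure.

    `edges` is an iterable of (id1, id2, relationship_code) tuples.
    Returns a dict mapping each id to a representative id of its component.
    """
    parent = {}

    def find(x):
        # Register unseen ids as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for id1, id2, _code in edges:
        root1, root2 = find(id1), find(id2)
        if root1 != root2:
            parent[root2] = root1  # merge the two components

    return {node: find(node) for node in parent}


# Sample rows copied from the edge table above.
edges = [
    ("X1", "X2", "Married"),
    ("X1", "X3", "Parent/Child"),
    ("X1", "X4", "Parent/Child"),
    ("X5", "X6", "Married"),
    ("X5", "X7", "Parent/Child"),
    ("X6", "X5", "Married"),
]

household_of = find_households(edges)
sizes = Counter(household_of.values())
print(len(sizes))               # number of distinct networks
print(sorted(sizes.values()))   # members per network
```

The duplicated pair (X5, X6) and (X6, X5) is harmless here, because merging two ids that already share a root does nothing.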
I also have a node table which contains date of birth and other variables from which adult/child status can be identified.
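As an illustration of the adult/child step, the following Python sketch derives adult status from date of birth and then counts networks by (adults, children). Everything specific in it is an assumption: the age threshold of 18, the reference date, the household labels `H1`/`H2`, and the dates of birth are all made up for the example, and it presumes some mapping from each id to a network label is already available.

```python
from collections import Counter
from datetime import date

ADULT_AGE = 18                      # assumption: adults are 18 or older
REFERENCE_DATE = date(2020, 1, 1)   # assumption: date at which age is measured

def is_adult(dob, on=REFERENCE_DATE):
    """True if the person born on `dob` is at least ADULT_AGE on date `on`."""
    # Subtract one if the birthday has not yet occurred in the reference year.
    age = on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))
    return age >= ADULT_AGE

def cross_tabulate(household_of, dob_of):
    """Count households by (number of adults, number of children)."""
    adults, children = Counter(), Counter()
    for person, household in household_of.items():
        if is_adult(dob_of[person]):
            adults[household] += 1
        else:
            children[household] += 1
    households = set(household_of.values())
    return Counter((adults[h], children[h]) for h in households)

# Hypothetical inputs: household labels and dates of birth are made up.
household_of = {"X1": "H1", "X2": "H1", "X3": "H1", "X4": "H1",
                "X5": "H2", "X6": "H2", "X7": "H2"}
dob_of = {
    "X1": date(1980, 5, 1), "X2": date(1982, 3, 9),
    "X3": date(2010, 7, 4), "X4": date(2012, 1, 20),
    "X5": date(1975, 11, 2), "X6": date(1978, 6, 15),
    "X7": date(2008, 9, 30),
}

table = cross_tabulate(household_of, dob_of)
for (n_adults, n_children), n_households in sorted(table.items()):
    print(n_adults, n_children, n_households)
```

Each key of `table` corresponds to one cell of the desired matrix (row = adults, column = children); the row and column totals would follow by summing over the keys.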
Any tips/hints on how to extract this summary information from the graph data frame would be very helpful and much appreciated.
Thanks