
I am working on a network analysis problem related to family/household composition. I have multiple edge tables, each containing id1, id2 and a relationship code stating the type of relationship between the two IDs. These tables are large, upwards of 7 million rows each. I also have a node table which contains the same IDs and various attributes.

What I want to achieve is a matrix of summary statistics, similar to this:

                      Children

             1  2  3  4   total 
            --------------------
          1 | 1  0  1  0    2
            |
 Adults   2 | 3  5  4  1    13  
            |
          3 | 1  2  0  0    3
            |
      total | 5  7  5  1    18 

Essentially I want to be able to identify and count distinct networks in my data.

My data is in the form:

             ID1  ID2   Relationship_Code

              X1   X2    Married 
              X1   X3    Parent/Child
              X1   X4    Parent/Child 
              X5   X6    Married
              X5   X7    Parent/Child 
              X6   X5    Married
               .    .     .
               .    .     .
               .    .     . 

I also have a node table which contains date of birth and other variables from which adult/child status can be identified.

Any tips/hints on how to extract this summary information from the graph data frame would be very helpful and much appreciated.

Thanks

Roland
williamg15
  • Show what your input data looks like (small example) and what your end result should look like. – Andre Elrico Oct 16 '18 at 11:44
  • I can't publish the data itself but it is in the form: – williamg15 Oct 16 '18 at 11:54
  • If you can't publish the data, INVENT data that has similar form. – Andre Elrico Oct 16 '18 at 11:56
  • Apologies, I have edited the question with an example of the form. – williamg15 Oct 16 '18 at 12:02
  • Is your sample data correct? I would think that if you had the first three relations, you would also have two additional relations that say that X2 has a Parent/Child relation with X3 and X4. – G5W Oct 16 '18 at 12:48
  • Yes you are correct, I just wanted to show the form the data is in. There would be additional relationships within the table where X2-X3 and X2-X4 are parent/child relationships. Also, we would have another 'duplicate' relationship between X2-X1 (married) – williamg15 Oct 16 '18 at 13:04
  • Do you have single people in households? How are they represented? – G5W Oct 16 '18 at 13:09
  • In the case of the edge table there are no single person households. They could be obtained by joining the edge tables with the node table. – williamg15 Oct 16 '18 at 13:19
  • Your example output includes households with 3 adults. What do the relationships look like there? – G5W Oct 16 '18 at 13:32
  • @G5W Using the date of birth variable in the node table I would be able to determine the age of the person, and thus deduce whether a certain person is an adult or a child. – williamg15 Oct 16 '18 at 13:54

1 Answer


Getting all the way to the final table that you want requires access to the node table, which you are not showing us, but I can get you pretty far along.

I think that the key to getting your result is identifying the households. You can do this in igraph using components(): each connected component of the graph is a household. I will illustrate with a slightly more elaborate version of your example.

Data:

Census = read.table(text="ID1  ID2   Relationship_Code
              X1   X2    Married 
              X2   X1    Married 
              X1   X3    Parent/Child
              X1   X4    Parent/Child 
              X2   X3    Parent/Child
              X2   X4    Parent/Child 
              X5   X6    Married
              X5   X7    Parent/Child 
              X6   X7    Parent/Child 
              X6   X5    Married
              X8   X9    Married
              X9   X8    Married",
    header=TRUE, stringsAsFactors=FALSE)

Now turn it into a graph, find the components and check by plotting.

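library(igraph)
# Use the two ID columns as an edge list; the relationship code is not needed here
EL = as.matrix(Census[,1:2])
# Direction is irrelevant for households, so build an undirected graph
Pop = graph_from_edgelist(EL, directed=FALSE)
# Each connected component is one household
Households = components(Pop)
# One colour per household
plot(Pop, vertex.color=rainbow(Households$no, alpha=0.5)[Households$membership])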

[Figure: plot of the household network, with vertices coloured by household]
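
The comments mention that single-person households only appear in the node table. One possible way to cover them, sketched here under assumptions, is to pass the node table to graph_from_data_frame() so that people without any edges become their own components. The tables Edges and Nodes, the column names ID and DOB, the reference date and the age-18 cutoff are all invented for illustration and are not from the question.

```r
library(igraph)

# Hypothetical edge table in the form shown in the question.
Edges <- data.frame(
  ID1 = c("X1", "X2", "X1", "X2", "X5", "X6"),
  ID2 = c("X2", "X1", "X3", "X3", "X6", "X5"),
  Relationship_Code = c("Married", "Married", "Parent/Child",
                        "Parent/Child", "Married", "Married"),
  stringsAsFactors = FALSE
)

# Hypothetical node table; X10 lives alone and so appears in no edge.
Nodes <- data.frame(
  ID  = c("X1", "X2", "X3", "X5", "X6", "X10"),
  DOB = as.Date(c("1980-03-01", "1982-07-15", "2010-01-20",
                  "1975-11-02", "1979-05-30", "1990-09-09")),
  stringsAsFactors = FALSE
)

# Assumed rule: aged 18 or over on the reference date counts as an adult.
RefDate  <- as.Date("2018-10-16")
AgeYears <- as.numeric(difftime(RefDate, Nodes$DOB, units = "days")) / 365.25
Nodes$AdultChild <- ifelse(AgeYears >= 18, "A", "C")

# The first column of 'vertices' is taken as the vertex name; the remaining
# columns (DOB, AdultChild) become vertex attributes.
Pop2 <- graph_from_data_frame(Edges, directed = FALSE, vertices = Nodes)

# Connected components; the isolated vertex X10 forms a household of size one.
Households2 <- components(Pop2)
Households2$no     # number of households
Households2$csize  # household sizes
```

With the vertex attribute AdultChild in place, the per-household counting can proceed on Pop2 in the same way as for a manually labelled graph.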

You said that you could label the nodes as to whether they represent adults or children. I will assume that we have such a labeling. From that, it is easy to count the number of adults by household and children by household and to make a table of household decomposition by adults and children.

# Labels follow the vertex order X1, ..., X9 (A = adult, C = child)
V(Pop)$AdultChild = c('A', 'A', 'C', 'C', 'A', 'A', 'C', 'A', 'A')
AdultsByHousehold = aggregate(V(Pop)$AdultChild, list(Households$membership), 
    function(p) sum(p=='A'))
AdultsByHousehold
  Group.1 x
1       1 2
2       2 2
3       3 2

ChildrenByHousehold = aggregate(V(Pop)$AdultChild, list(Households$membership), 
    function(p) sum(p=='C'))
ChildrenByHousehold
  Group.1 x
1       1 2
2       2 1
3       3 0

table(AdultsByHousehold$x, ChildrenByHousehold$x)
    0 1 2
  2 1 1 1

In my bogus example, all households have two adults.
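
To get closer to the layout in the question, with totals and with empty categories kept, one option is to fix the factor levels before tabulating and then call addmargins(). This is only a sketch: the two vectors below are stand-ins for Households$membership and V(Pop)$AdultChild from the example above, and the level ranges 1:3 and 0:4 are assumptions chosen to resemble the desired table.

```r
# Stand-ins for Households$membership and V(Pop)$AdultChild in the example.
membership <- c(1, 1, 1, 1, 2, 2, 2, 3, 3)
AdultChild <- c("A", "A", "C", "C", "A", "A", "C", "A", "A")

# Count adults and children per household in one pass
# (rows are households, columns are A and C).
Counts <- table(Household = membership,
                Status = factor(AdultChild, levels = c("A", "C")))

# Fixed factor levels keep categories with no households in the table
# (assumed ranges: 1 to 3 adults, 0 to 4 children).
Adults   <- factor(Counts[, "A"], levels = 1:3)
Children <- factor(Counts[, "C"], levels = 0:4)

# Cross-tabulate households and add row and column totals.
Summary <- addmargins(table(Adults, Children))
Summary
```

The "Sum" row and column produced by addmargins() play the role of the "total" margins in the question's table.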

G5W