TLDR: I'd like to extract the edge types of every path between two vertices in igraph. Is there a relatively sane way to do this?


The clinic I work for recently undertook a rather large (1400-person) tuberculosis contact investigation in a high school. I have class schedules for all of the students and teachers (!) and have put them into a network (using igraph in R), with each student and each room-period combination as a vertex (e.g., the class in Room 123 in Period 1 is a vertex with a directed edge to the class that's in Room 123 for Period 2). I also know which rooms share ventilation systems - a plausible but unlikely mechanism for infection. The graph is directed out from the sole source case, so every path on the network has only two people in it - the source and a contact, separated by a variable number of room-period vertices. Conceptually, there are four kinds of paths:

  • personal-contact exposures (source -> contact only)
  • shared-class exposures (source -> room-period -> contact)
  • next-period exposures (source -> Room 123 Period 1 -> Room 123 Period 2 -> contact)
  • ventilation exposures (source -> Room 123 Period 1 -> Room 125 Period 1 -> contact)

Every edge has an attribute indicating whether it's a person-to-person exposure, same-room-different-period, or ventilation edge.

As an intermediate step toward modeling infection on this network, I'd like to just get a simple count of how many exposures of each type a student has had. For example, a student might have shared a class with the source, then later have been in a room the source had been in but a period later, and perhaps the next day been in a ventilation-adjacent room. That student's indicators would then be:

personal.contact: 0
shared.class:     1
next.period:      1
vent:             1

I'm not sure how best to get this kind of info, though - I see functions for getting shortest paths, which makes identifying personal contact links easy, but I think I need to evaluate all paths (which seems like a crazy thing to ask for on a typical social network, but isn't so mad when only the source and the room-periods have out-edges). If I could get to the point where each source-to-contact path were represented by an ordered vector of edge types, I think I could subset them to my criteria easily. I just don't know how to get there. If igraph isn't the right framework for this and I just need to write some big horrible loops over the students' schedules, so be it! But I'd appreciate some guidance before I dive down that hole.


Here's a sample graph of a contact with each of the three indirect paths:

# Strings ain't factors
options(stringsAsFactors = FALSE)  
library(igraph)

# Create a sample case
edgelist <- data.frame(out.id = c("source", "source", 
                                  "source", "Rm 123 Period 1", 
                                  "Rm 125 Period 2", "Rm 125 Period 3", 
                                  "Rm 127 Period 4", "Rm 129 Period 4"),
                       in.id = c("Rm 123 Period 1", "Rm 125 Period 2", 
                                 "Rm 127 Period 4", "contact", 
                                 "Rm 125 Period 3", "contact", 
                                 "Rm 129 Period 4", "contact"),
                       edge.type = c("Source in class", "Source in class",
                                     "Source in class", "Student in class",
                                     "Class-to-class", 
                                     "Student in class", "Vent link",
                                     "Student in class"
                                     )
)

# graph_from_data_frame() is the current name for graph.data.frame()
samp.graph <- graph_from_data_frame(edgelist, directed = TRUE)

# Label the vertices with meaningful names
V(samp.graph)$label <- V(samp.graph)$name

# layout_with_fr is the current name for layout.fruchterman.reingold
plot(samp.graph, layout = layout_with_fr)
Matt Parker
  • Could this be modelled in a simpler way? Say, by having only students as vertices, with edges between students that represent whether they were in the same class, in the next class, in the class after the next class, or in a class sharing a ventilation system. You could then summarize the types of edges incident on each vertex to produce the table you require. You could also assign weights to each type of edge, e.g. with same classroom as 1 (most likely), and ventilation as 10 (least likely). Then find `shortest.paths()` between a source and infection vertex. – digitalmaps May 23 '12 at 11:55
  • @PaulG It certainly could, though part of the point of this question is to avoid going that route. The trick is getting those edge attributes in the first place; I can only think of two ways to do that. One would be a network approach (which is what this question is about!) – Matt Parker May 23 '12 at 15:50
  • A second would be to iterate over each person's classes, asking "Is this class shared with the source? Is this class in a class after the source? Is this class vent-adjacent?" and generating edges as I go along (or I might skip the graph altogether and just summarize the number of classes of each type). That's really not so bad - I had just hoped to be able to use the graph I'd so carefully pieced together! – Matt Parker May 23 '12 at 15:50
  • Would it not be equally computationally expensive to build the simpler graph with students only as vertices, as the one you have shown here? Your network has two distinct types of vertices; it's really the student vertices that are of interest and the "contact locus" vertices are a separate graph that you are using to derive the type of edge between students. – digitalmaps May 23 '12 at 17:57
  • Computational expense isn't really an issue, and I agree that your proposed students-only network makes sense. But... basically, I've already made the person-class graph and it does have the info I need; I just can't seem to find the right way to extract that info. Making the student-student graph means reworking the graph construction code - which I'm doing now, since answers don't seem forthcoming, but which I had hoped to avoid for its *cognitive* expense. – Matt Parker May 23 '12 at 18:34
  • @PaulG Hope I'm actually addressing what you're saying - kind of inundated here at work and I don't feel completely confident that I'm processing your comments correctly! – Matt Parker May 23 '12 at 18:44
  • Others may disagree (they have not chimed in), but I think the students-only vertex model is much cleaner, and if you can stand that cognitive expense it will bear much more intelligent fruit! – digitalmaps May 23 '12 at 19:38

1 Answer

I'm not entirely sure that I understand your graph model, but if the question is:

I have two vertices and I wish to extract every path between them,
then extract the edge attributes of those edges.

then perhaps this might work.

Go with a depth-first search. Igraph contains one but it's easy enough to roll your own, and this will give you more flexibility as to what information you want to get. I assume you have no cycles in your graph - otherwise you'll get an infinite number of paths. I don't know much Python (though I do use igraph in R), so here's some pseudocode.

list <- empty

# call as allSimplePaths(u, v, (u))
allSimplePaths(u, v, thisPath)
  if (u == v)
    list <- list + thisPath               # reached the target: record this path
    return
  for (n in neighborhood(u))              # out-neighbours of u
    if (n in thisPath)                    # already on this path: skip
      next
    thisPath <- thisPath + n
    allSimplePaths(n, v, thisPath)
    thisPath <- thisPath - thisPath.end   # backtrack

Basically it says "from each vertex, try all possible paths of expansion to get to the end." It's a simple matter to add a second accumulator, thisPathEdges, and pass it through the function alongside the vertices, appending an edge each time a vertex is appended. Of course this would run better were it not recursive. Be careful, as this algorithm might blow your stack with enough nodes.
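For anyone who wants to try this in R, here is a minimal sketch of the same recursion on the sample graph from the question, carrying edge types along the path instead of vertices. The helper name `path_edge_types` and its arguments are made up for illustration; `incident()` and `ends()` are standard igraph functions. It assumes the `edge.type` attribute from the question's edge list and is not tuned for large graphs.

```r
library(igraph)

# Sample edge list from the question
edgelist <- data.frame(
  out.id = c("source", "source", "source", "Rm 123 Period 1",
             "Rm 125 Period 2", "Rm 125 Period 3",
             "Rm 127 Period 4", "Rm 129 Period 4"),
  in.id = c("Rm 123 Period 1", "Rm 125 Period 2", "Rm 127 Period 4",
            "contact", "Rm 125 Period 3", "contact",
            "Rm 129 Period 4", "contact"),
  edge.type = c("Source in class", "Source in class", "Source in class",
                "Student in class", "Class-to-class", "Student in class",
                "Vent link", "Student in class"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edgelist, directed = TRUE)

# Hypothetical helper: u and v are numeric vertex ids. Returns a list with
# one ordered character vector of edge types per simple path from u to v.
path_edge_types <- function(g, u, v, visited = u, types = character(0)) {
  if (u == v) return(list(types))            # reached the target
  found <- list()
  for (e in as.integer(incident(g, u, mode = "out"))) {
    n <- ends(g, e, names = FALSE)[1, 2]     # vertex the edge points to
    if (n %in% visited) next                 # keep the path simple
    found <- c(found,
               path_edge_types(g, n, v, c(visited, n),
                               c(types, E(g)$edge.type[e])))
  }
  found
}

src <- match("source", V(g)$name)
dst <- match("contact", V(g)$name)
paths <- path_edge_types(g, src, dst)
print(paths)
```

Each element of `paths` is the ordered vector of edge types the question asks for, so subsetting by criteria becomes an ordinary `sapply()` over that list.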

You still might want to go with @PaulG's model, and just have multiple edges between nodes of students. You could do cool things like run a breadth-first search to see how the disease spread or find a minimum spanning tree to get a time estimate, or find a min-cut to quarantine an ongoing infection or something.
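The two models are not mutually exclusive: the person-class graph can be collapsed into the students-only one by turning each source-to-contact path into a single typed edge. The sketch below does this with igraph's built-in `all_simple_paths()` and `E(g, path = ...)`. The classification rule in `classify_path` (a hypothetical helper) is my assumption about how the edge types in the sample data map onto the four exposure kinds in the question, so adjust it to the real attribute values.

```r
library(igraph)

# Sample edge list from the question
edgelist <- data.frame(
  out.id = c("source", "source", "source", "Rm 123 Period 1",
             "Rm 125 Period 2", "Rm 125 Period 3",
             "Rm 127 Period 4", "Rm 129 Period 4"),
  in.id = c("Rm 123 Period 1", "Rm 125 Period 2", "Rm 127 Period 4",
            "contact", "Rm 125 Period 3", "contact",
            "Rm 129 Period 4", "contact"),
  edge.type = c("Source in class", "Source in class", "Source in class",
                "Student in class", "Class-to-class", "Student in class",
                "Vent link", "Student in class"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edgelist, directed = TRUE)

# Every simple path from source to contact, as vertex sequences
vpaths <- all_simple_paths(g, from = "source", to = "contact", mode = "out")

# Edge types along each path, in path order
types <- lapply(vpaths, function(p) E(g, path = p)$edge.type)

# Assumed rule: one edge = personal contact; otherwise the path is labelled
# by the kind of room-to-room link it contains, if any
classify_path <- function(x) {
  if (length(x) == 1) "personal.contact"
  else if ("Vent link" %in% x) "vent"
  else if ("Class-to-class" %in% x) "next.period"
  else "shared.class"
}
kinds <- factor(vapply(types, classify_path, character(1)),
                levels = c("personal.contact", "shared.class",
                           "next.period", "vent"))
counts <- table(kinds)
print(counts)

# Students-only multigraph: one edge per exposure, typed by its kind
student_edges <- data.frame(from = "source", to = "contact",
                            exposure = as.character(kinds))
students <- graph_from_data_frame(student_edges, directed = TRUE)
```

On the sample graph the counts come out as 0, 1, 1, 1, matching the example indicators in the question. With many contacts, the same steps would run once per contact vertex and the resulting rows would be bound together before building `students`.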

jclancy
  • Thanks! I did end up using Paul's suggestion, but I'd like to try yours out - just might take me a while to backtrack in version control to the right spot and set this up. Thanks for following up on this old question - I'll report back how it goes. – Matt Parker Jul 31 '12 at 15:15