TLDR: I'd like to extract the edge types of every path between two vertices in igraph. Is there a relatively sane way to do this?
The clinic I work for recently undertook a rather large (1400-person) tuberculosis contact investigation in a high school. I have class schedules for all of the students and teachers (!) and have put them into a network (using igraph in R), with each student and each room-period combination as a vertex (e.g., the class in Room 123 in Period 1 is a vertex with a directed edge to the class that's in Room 123 for Period 2). I also know which rooms share ventilation systems - a plausible but unlikely mechanism for infection. The graph is directed out from the sole source case, so every path on the network has only two people in it - the source and a contact, separated by a variable number of room-period vertices. Conceptually, there are four kinds of paths:
- personal-contact exposures (source -> contact only)
- shared-class exposures (source -> room-period -> contact)
- next-period exposures (source -> Room 123 Period 1 -> Room 123 Period 2 -> contact)
- ventilation exposures (source -> Room 123 Period 1 -> Room 125 Period 1 -> contact)
Every edge has an attribute indicating whether it's a person-to-person exposure, same-room-different-period, or ventilation edge.
As an intermediate step toward modeling infection on this network, I'd like to just get a simple count of how many exposures of each type a student has had. For example, a student might have shared a class with the source, then later have been in a room the source had been in but a period later, and perhaps the next day been in a ventilation-adjacent room. That student's indicators would then be:
personal.contact: 0
shared.class: 1
next.period: 1
vent: 1
I'm not sure how best to get this kind of info, though - I see functions for getting shortest paths, which makes identifying personal contact links easy, but I think I need to evaluate all paths (which seems like a crazy thing to ask for on a typical social network, but isn't so mad when only the source and the room-periods have out-edges). If I could get to the point where each source-to-contact path were represented by an ordered vector of edge types, I think I could subset them to my criteria easily. I just don't know how to get there. If igraph isn't the right framework for this and I just need to write some big horrible loops over the students' schedules, so be it! But I'd appreciate some guidance before I dive down that hole.
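To make the target concrete, here is a minimal base-R sketch of the tallying step, assuming each path is already available as an ordered character vector of edge types. The helper `classify_path()` and the `example_paths` list are hypothetical names introduced only for illustration, and the rules (a single edge means personal contact, any "Vent link" edge means a ventilation exposure, any "Class-to-class" edge means a next-period exposure) are an assumption based on the edge-type labels in the sample graph further down.

```r
# Hypothetical helper: classify one path from its ordered edge types.
# Assumes the labels "Vent link" and "Class-to-class" match the edge.type
# values in the sample graph, and that a one-edge path is a direct contact.
classify_path <- function(types) {
  if (length(types) == 1) {
    "personal.contact"
  } else if ("Vent link" %in% types) {
    "vent"
  } else if ("Class-to-class" %in% types) {
    "next.period"
  } else {
    "shared.class"
  }
}

# Illustrative input: one edge-type vector per source-to-contact path
example_paths <- list(
  c("Source in class", "Student in class"),
  c("Source in class", "Class-to-class", "Student in class"),
  c("Source in class", "Vent link", "Student in class")
)

categories <- c("personal.contact", "shared.class", "next.period", "vent")
labels <- vapply(example_paths, classify_path, character(1))

# Using a factor with fixed levels keeps zero counts in the table
indicators <- table(factor(labels, levels = categories))
print(indicators)
```

For these three illustrative paths the table gives 0, 1, 1, 1 in the order of `categories`, which matches the indicators listed above. The hard part, getting the edge-type vectors out of the graph in the first place, is not covered by this sketch.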
Here's a sample graph of a contact with each of the three indirect paths:
# Strings ain't factors
options(stringsAsFactors = FALSE)
library(igraph)
# Create a sample case
edgelist <- data.frame(out.id = c("source", "source",
                                  "source", "Rm 123 Period 1",
                                  "Rm 125 Period 2", "Rm 125 Period 3",
                                  "Rm 127 Period 4", "Rm 129 Period 4"),
                       in.id = c("Rm 123 Period 1", "Rm 125 Period 2",
                                 "Rm 127 Period 4", "contact",
                                 "Rm 125 Period 3", "contact",
                                 "Rm 129 Period 4", "contact"),
                       edge.type = c("Source in class", "Source in class",
                                     "Source in class", "Student in class",
                                     "Class-to-class",
                                     "Student in class", "Vent link",
                                     "Student in class")
                       )
# graph_from_data_frame() is the current name for the older graph.data.frame()
samp.graph <- graph_from_data_frame(edgelist, directed = TRUE)
# Label the vertices with meaningful names
V(samp.graph)$label <- V(samp.graph)$name
# layout_with_fr is the current name for layout.fruchterman.reingold
plot(samp.graph, layout = layout_with_fr)
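
One possible way to get from this graph to an ordered vector of edge types per path is sketched below. It relies on igraph's `all_simple_paths()` to enumerate the vertex sequences and on `E(graph, path = ...)` to recover the edges along each one. The sample edge list is rebuilt inside the block so it runs on its own; the names `edges`, `g`, `paths`, and `path_edge_types` are illustrative. This is a sketch for the small sample, not a tested approach for the full 1400-person network.

```r
library(igraph)

# Rebuild the sample edge list so this block is self-contained
edges <- data.frame(
  from = c("source", "source", "source", "Rm 123 Period 1",
           "Rm 125 Period 2", "Rm 125 Period 3",
           "Rm 127 Period 4", "Rm 129 Period 4"),
  to = c("Rm 123 Period 1", "Rm 125 Period 2", "Rm 127 Period 4", "contact",
         "Rm 125 Period 3", "contact", "Rm 129 Period 4", "contact"),
  edge.type = c("Source in class", "Source in class", "Source in class",
                "Student in class", "Class-to-class", "Student in class",
                "Vent link", "Student in class"),
  stringsAsFactors = FALSE
)
g <- graph_from_data_frame(edges, directed = TRUE)

# Enumerate every simple path from the source to the contact,
# following edge direction; each element is a vertex sequence
paths <- all_simple_paths(g, from = "source", to = "contact", mode = "out")

# For each vertex sequence, select the edges along it and read edge.type,
# giving one ordered character vector of edge types per path
path_edge_types <- lapply(paths, function(p) E(g, path = p)$edge.type)

print(path_edge_types)
```

Enumerating all simple paths can be expensive in general, but here only the source and the room-period vertices have out-edges, so the paths stay short. If the installed igraph version supports the `cutoff` argument of `all_simple_paths()`, limiting the path length to three edges may help keep the search bounded. Note also that if two vertices were joined by more than one edge (say, both a class-to-class and a vent link), selecting edges by vertex path would pick up only one of them, so that case would need separate handling.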