
I'm using Boost Graph and want to find a way to identify sequences (subgraphs) within a graph which follow a specific pattern.

I think of my subgraph as a pattern or template. The actual graph contains nodes with parts of strings and parsed data; for example, a parsed date string in a graph may be: ...->(9)->("/")->(4)->("/")->(2017)->.... My template subgraph would find all instances of such a date in the graph, so it would match nodes like this: (1<=d<=31)->(.|/)->(1<=m<=12)->(same symbol as in 2nd node)->(1900<=y<=2100).

The function `vf2_subgraph_iso` takes predicates as arguments to determine equality of edges and vertices, but I am unsure whether there is a way to "hack" those predicates so that they match nodes against a simple pattern that goes beyond mere equality.

As the equality predicates aren't given any kind of state or context, I'm having a hard time figuring out how to maintain such state internally. Is this possible? Or is there a better suited algorithm out there?
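One possible way to get per-node behavior out of a single comparison function is to keep the context in the pattern rather than in the predicate: store one check per pattern vertex and let the binary predicate dispatch on the pattern vertex it is handed. The sketch below is plain C++ and does not call Boost. `Token`, `Check`, `date_pattern`, `VertexMatches` and `separators_agree` are hypothetical names, and it assumes that vertices are identified by integer indices and that the predicate receives the pattern vertex first and the graph vertex second. A constraint that spans two vertices (the repeated separator) cannot be decided from one pair alone, so here it is checked separately once a complete mapping is known.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical vertex payload: each graph node holds one parsed token.
struct Token {
    std::string text;
};

// True if `s` consists of 1 to 4 decimal digits whose value lies in [lo, hi].
bool in_range(const std::string& s, int lo, int hi) {
    if (s.empty() || s.size() > 4) return false;
    int value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
        value = value * 10 + (c - '0');
    }
    return lo <= value && value <= hi;
}

// One check per pattern vertex, indexed by the pattern vertex's index.
using Check = std::function<bool(const Token&)>;

// Pattern: day -> separator -> month -> separator -> year.
std::vector<Check> date_pattern() {
    auto is_sep = [](const Token& t) { return t.text == "." || t.text == "/"; };
    return {
        [](const Token& t) { return in_range(t.text, 1, 31); },
        is_sep,
        [](const Token& t) { return in_range(t.text, 1, 12); },
        is_sep,
        [](const Token& t) { return in_range(t.text, 1900, 2100); },
    };
}

// A single binary predicate that looks up the check belonging to the
// pattern vertex and applies it to the graph vertex's token.
struct VertexMatches {
    const std::vector<Check>& checks;
    const std::vector<Token>& tokens;

    bool operator()(std::size_t pattern_vertex, std::size_t graph_vertex) const {
        assert(pattern_vertex < checks.size() && graph_vertex < tokens.size());
        return checks[pattern_vertex](tokens[graph_vertex]);
    }
};

// Cross-vertex constraint, checked on a complete mapping
// (mapping[i] is the graph vertex matched to pattern vertex i).
bool separators_agree(const std::vector<Token>& tokens,
                      const std::vector<std::size_t>& mapping) {
    return tokens[mapping[1]].text == tokens[mapping[3]].text;
}
```

With tokens `9 / 4 / 2017`, `VertexMatches{checks, tokens}(0, 0)` accepts `9` as a day, while `(2, 4)` rejects `2017` as a month. In a subgraph-isomorphism search, the per-mapping check would presumably belong in whatever callback receives each complete match.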

Felix Dombek
  • What do `(n)` and `(m)` denote? Because if you mean just vertices holding the numbers `n` and `m` then you're just looking for an edge. – sehe Apr 08 '17 at 22:04
  • Why not just enumerate edges? – sehe Apr 08 '17 at 22:15
  • So the test would be relative to the nodes inside one graph. What is the source of the numeric requirements going to be? You want them to copy from the "pattern graph" somehow? How (note I'm making you spec your requirements in a Socratic way) – sehe Apr 08 '17 at 23:32
  • I wanted to keep it simple in the question because I thought that the simpler problem is about the same, using a subgraph find algorithm to find nodes by having a subgraph template and filling it with matching values from the graph, where each node in the subgraph template can compare itself to a node from the graph and see if it matches. I now think this should be possible with `vf2_subgraph_iso` even if it always calls the same comparison function, by adding a map from subgraph node to an actual comparison function for that node. – Felix Dombek Apr 09 '17 at 00:18
  • Mmm. First off: that sounds like a really bad data structure choice for the source data (how did you get it in this weird form? Are you writing code to exfiltrate data from shredded documents?). Second, it seems that the algo choice would be a rather ineffective choice of brute-force angle. Lemme think about this a little more. (Also, am I right in thinking that the original test from the question stopped being interesting at all? You might wanna update the question) – sehe Apr 09 '17 at 01:36
  • @sehe I'm exploring other possible data structures, this one evolved from an earlier representation of parse trees of more or less unstructured data. `vf2_subgraph_iso` has the additional problem that it doesn't work for `adjacency_list` (even though that one is faster for enumerating edges(?)). Thanks for your input so far! (Pattern matching in graphs still seems like a very interesting topic with lots of recent scientific publications. Maybe worth a separate question.) – Felix Dombek Apr 11 '17 at 18:02
  • For what it's worth, I'm thinking of combining a BFS with an FSM that matches the pattern you're looking for (this is possible because your pattern is a simple sequence of nodes linked by edges). – Rerito Apr 12 '17 at 12:12
  • @Rerito Breadth-first? Why that? I was thinking depth-first. This is almost exactly what I wrote now - a DFS matching states in a linear FSM. – Felix Dombek Apr 12 '17 at 14:25
  • @FelixDombek It turns out with the proper DFS/BFS visitor, you can apply the FSM optimally. It seemed easier to do using the BFS in my opinion though: I was thinking of a visitor with a stack storing the state of the FSM at a given node. When that node is "fully explored" through BFS, the stack is popped. The head of the stack and the node _T_ being investigated allow us to determine the state for that node _T_ which can then be stacked and so on. – Rerito Apr 12 '17 at 15:35
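The traversal-plus-FSM idea from the last few comments could look roughly like the following. This is only a minimal sketch in plain C++ without Boost, not the code either commenter wrote: `TokenGraph`, `State`, `date_states`, `extend` and `find_matches` are hypothetical names, the graph is assumed to be a simple adjacency list with one string label per vertex, and the pattern is assumed to be a linear chain. It uses a recursive depth-first search in which the length of the current path is the FSM state, so the call stack takes the place of an explicit stack of states. Each state also sees the path matched so far, which is how the "same symbol as in 2nd node" condition gets its context.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical graph: labels[v] is the token of vertex v, out[v] its successors.
struct TokenGraph {
    std::vector<std::string> labels;
    std::vector<std::vector<std::size_t>> out;
};

using Path = std::vector<std::size_t>;

// One FSM state: decides whether `candidate` may extend the path matched so far.
using State = std::function<bool(const TokenGraph&, const Path&, std::size_t candidate)>;

// True if `s` consists of 1 to 4 decimal digits whose value lies in [lo, hi].
bool in_range(const std::string& s, int lo, int hi) {
    if (s.empty() || s.size() > 4) return false;
    int value = 0;
    for (char c : s) {
        if (c < '0' || c > '9') return false;
        value = value * 10 + (c - '0');
    }
    return lo <= value && value <= hi;
}

// Linear FSM for: day -> separator -> month -> same separator -> year.
std::vector<State> date_states() {
    auto number = [](int lo, int hi) -> State {
        return [lo, hi](const TokenGraph& g, const Path&, std::size_t v) {
            return in_range(g.labels[v], lo, hi);
        };
    };
    State first_sep = [](const TokenGraph& g, const Path&, std::size_t v) {
        return g.labels[v] == "." || g.labels[v] == "/";
    };
    State same_sep = [](const TokenGraph& g, const Path& p, std::size_t v) {
        return g.labels[v] == g.labels[p[1]];  // p[1] is the first separator
    };
    return {number(1, 31), first_sep, number(1, 12), same_sep, number(1900, 2100)};
}

// Depth-first extension: path.size() is the index of the current FSM state.
void extend(const TokenGraph& g, const std::vector<State>& states,
            std::size_t v, Path& path, std::vector<Path>& results) {
    if (!states[path.size()](g, path, v)) return;  // state rejects this vertex
    path.push_back(v);
    if (path.size() == states.size()) {
        results.push_back(path);                   // accepting state reached
    } else {
        for (std::size_t next : g.out[v]) extend(g, states, next, path, results);
    }
    path.pop_back();                               // backtrack
}

// Tries every vertex as a starting point and collects all matching paths.
std::vector<Path> find_matches(const TokenGraph& g, const std::vector<State>& states) {
    assert(g.labels.size() == g.out.size());
    std::vector<Path> results;
    if (states.empty()) return results;
    Path path;
    for (std::size_t v = 0; v < g.labels.size(); ++v) extend(g, states, v, path, results);
    return results;
}
```

Because the recursion depth is bounded by the number of states, cycles in the graph do not cause unbounded recursion in this sketch. Calling `find_matches(g, date_states())` returns each match as the list of vertex indices along the matched path.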

0 Answers