(Set of) List of sets (Cartesian product(s)) from graph corresponding to set of lists

Question

The set of lists (A):

{[a,b,d,f],
 [a,c,d,f],
 [a,b,e,f],
 [a,c,e,f]}

where a, b, c, d, e and f are items (not necessarily characters in a word), can be represented as a directed acyclic graph (DAG) B, in which all edges point from left to right:

  b-->d
 / \ / \
a   X   f
 \ / \ /
  c-->e

or as the Cartesian product of 4 sets of items (C, termed axes):

{a} * {b,c} * {d, e} * {f}

Guava has a nice method for generating a set of lists (A) from a list of sets (C).
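
For reference, here is a minimal plain-Java sketch of that step (C to A), so the example does not depend on Guava. The class `AxesProduct` and its helper `axis` are hypothetical names used only for illustration; Guava's `Sets.cartesianProduct` is presumably the library method meant above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class AxesProduct {
    // Expands a list of axes (C) into the set of lists (A), one axis at a time.
    static List<List<String>> cartesianProduct(List<Set<String>> axes) {
        List<List<String>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // the empty product contains one empty list
        for (Set<String> axis : axes) {
            List<List<String>> next = new ArrayList<>();
            for (List<String> prefix : result) {
                for (String item : axis) {
                    List<String> extended = new ArrayList<>(prefix);
                    extended.add(item);
                    next.add(extended);
                }
            }
            result = next;
        }
        return result;
    }

    // Hypothetical helper: builds an insertion-ordered axis from its items.
    static Set<String> axis(String... items) {
        return new LinkedHashSet<>(Arrays.asList(items));
    }
}
```

Calling `cartesianProduct` with the axes `{a}`, `{b,c}`, `{d,e}`, `{f}` yields the four lists of A.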

I am looking for an algorithm that accepts a graph like B and returns a list of axes like C (actually one or more such lists; see the example below), which can then be used with the method above to generate a set of lists like A.

However, it is not guaranteed that the set of lists will be a Cartesian product. For example:

{[a,b,d,f],
 -missing-
 [a,b,e,f],
 [a,c,e,f]}

corresponding to the DAG:

  b-->d
 / \   \
a   \   f
 \   \ /
  c-->e

cannot be expressed as one Cartesian product, but can be expressed as two:

{a}*{b}*{d,e}*{f}    and    {a}*{c}*{e}*{f}

corresponding to the graphs:

      d
     / \
a-->b   f            and     a-->c-->e-->f 
     \ /
      e
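
To make the distinction between the two examples concrete, here is a rough sketch of a test for whether a set of lists is exactly one Cartesian product. It collects the distinct items at each position as candidate axes and compares the product of the axis sizes with the number of lists. Because every list lies inside the product of its own candidate axes, equal counts mean the two sets coincide. `ProductCheck` and its method names are hypothetical, and this only detects the single-product case; it does not perform the split into several products.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class ProductCheck {
    // Collects the distinct items seen at each position (the candidate axes).
    static List<Set<String>> candidateAxes(Set<List<String>> lists) {
        List<Set<String>> axes = new ArrayList<>();
        for (List<String> list : lists) {
            for (int i = 0; i < list.size(); i++) {
                if (axes.size() <= i) axes.add(new LinkedHashSet<>());
                axes.get(i).add(list.get(i));
            }
        }
        return axes;
    }

    // True when the lists are exactly the product of their candidate axes.
    static boolean isSingleProduct(Set<List<String>> lists) {
        if (lists.isEmpty()) return false;
        int length = lists.iterator().next().size();
        for (List<String> list : lists) {
            if (list.size() != length) return false; // mixed lengths cannot share axes
        }
        long expected = 1;
        for (Set<String> axis : candidateAxes(lists)) {
            expected *= axis.size();
            if (expected > lists.size()) return false; // too few lists; also avoids overflow
        }
        return expected == lists.size();
    }
}
```

For the first example the candidate axes have sizes 1, 2, 2 and 1, giving 4 lists, so the check passes. For the second example the axes are the same but only 3 lists are present, so it fails.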

The lists should have some degree of relatedness (think: a random sample of a very large Cartesian product).

Note: lists of different lengths cannot share the same set of axes.

Is there an algorithm that does this and I just haven't Googled the right terms? If not, can we create it?

Complexity of the algorithm may be an issue, as the set could have 10^2 lists and each list could have 10^2 items, i.e. a fairly large graph. I can guarantee that the input graphs would have the minimal number of nodes possible to represent the set of lists, and that connected non-branching nodes (a->c->e->f) can be rolled up into single objects (acef).

PS. I don't think this is same as the Cartesian product of graphs, but there could be some overlap.

There are graph libraries (I am thinking of JGraphT here, but there are probably others). Have you looked through them to see whether they offer something close to this? – fge, Jun 17 '13 at 16:44
@fge I am familiar with JGraphT and it does not have such an algorithm. — Jon, Jun 17 '13 at 17:17
I would consider the lists as words and try to locate longest common suffixes. — G. Bach, Jun 17 '13 at 18:23
Is there some reason you don't want to generate the set of lists directly? — David Eisenstat, Jun 17 '13 at 18:30
Would maybe a BFS from a source of the DAG (I assume it's a DAG?) work, checking for branches every time a new boundary is reached? — G. Bach, Jun 17 '13 at 19:44

Zim-Zam O'Pootertoot · Answer 1 · 2013-06-18T03:32:44.787

1

If I understand your question correctly, you're after (A) and only want (C) as an intermediate step. In that case, generate the shortest paths through the graph using e.g. Dijkstra's algorithm; this will produce the set of lists (A). If you still need the Cartesian product at that point (i.e. it wasn't just an intermediate step towards (A)), it's much easier to generate it from (A) than from (B).
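
A minimal sketch of going from (B) to (A), assuming the graph is small enough to enumerate in full. Note that it uses a plain depth-first enumeration of every source-to-sink path rather than Dijkstra's algorithm, and that the adjacency-map representation and the name `PathEnumerator` are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;

final class PathEnumerator {
    // Lists every path from source to sink in a DAG given as adjacency lists.
    static List<List<String>> allPaths(Map<String, List<String>> dag, String source, String sink) {
        List<List<String>> paths = new ArrayList<>();
        walk(dag, source, sink, new ArrayDeque<>(), paths);
        return paths;
    }

    private static void walk(Map<String, List<String>> dag, String node, String sink,
                             Deque<String> current, List<List<String>> paths) {
        current.addLast(node);
        if (node.equals(sink)) {
            paths.add(new ArrayList<>(current)); // copy the finished path
        } else {
            for (String next : dag.getOrDefault(node, Collections.emptyList())) {
                walk(dag, next, sink, current, paths);
            }
        }
        current.removeLast(); // backtrack before trying the next branch
    }
}
```

The number of paths can grow exponentially with the depth of the graph, so this only makes sense when (A) is known to be small.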

edited Jun 18 '13 at 03:32

answered Jun 18 '13 at 03:14

Zim-Zam O'Pootertoot


There could be more than Integer.MAX_VALUE paths through the graph, so it is neither feasible nor possible to generate A completely. I want the axes so that I can make a balanced sample of the paths through the graph (I could probably random-walk it in that case, but it would be interesting to attempt an algorithm). – Jon Jun 18 '13 at 08:40
@Jon If you want uniform random samples, there's a linear-time algorithm to count the number of paths from each vertex and another linear-time algorithm to use these counts to pull a sample. It's not a complicated algorithm but I don't have time to write it up properly right now. – David Eisenstat Jun 19 '13 at 00:13
A path counting algorithm is described [here](http://stackoverflow.com/questions/5164719/number-of-paths-between-two-nodes-in-a-dag/5164820#5164820) – Zim-Zam O'Pootertoot Jun 19 '13 at 00:16
Thanks @DavidEisenstat and @Zim-ZamO'Pootertoot. I assimilated those ideas. However, I realised that to get B, I would need to know A, which means determining C (as @sgpc noted) is basically redundant, since C is the space-efficient means of expressing A or B. I realised that I really need to start with C, in fact with many possibly-intersecting Cartesian sets, and refine them in such a way that the remainder do not intersect. I think I have solved this, but it doesn't answer this question. If someone asks such a question (I'd love to see another problem that needs it!) I would be happy to answer. – Jon Jul 05 '13 at 00:29
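
The counting-and-sampling idea from the comments above could look roughly like the sketch below. It is not the commenters' own code: the class `PathSampler`, the adjacency-map representation, and the use of `BigInteger` (chosen because the path count may exceed `Integer.MAX_VALUE`) are all assumptions. Path counts are memoised per node, and each step picks a successor with probability proportional to the number of paths that continue through it, which makes every complete path equally likely.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class PathSampler {
    // Counts paths from node to sink; memoised so each node is solved once.
    static BigInteger countPaths(Map<String, List<String>> dag, String node, String sink,
                                 Map<String, BigInteger> memo) {
        if (node.equals(sink)) return BigInteger.ONE;
        BigInteger cached = memo.get(node);
        if (cached != null) return cached;
        BigInteger total = BigInteger.ZERO;
        for (String next : dag.getOrDefault(node, Collections.emptyList())) {
            total = total.add(countPaths(dag, next, sink, memo));
        }
        memo.put(node, total);
        return total;
    }

    // Draws one source-to-sink path uniformly at random.
    static List<String> samplePath(Map<String, List<String>> dag, String source, String sink,
                                   Random rng) {
        Map<String, BigInteger> memo = new HashMap<>();
        List<String> path = new ArrayList<>();
        String node = source;
        path.add(node);
        while (!node.equals(sink)) {
            BigInteger total = countPaths(dag, node, sink, memo);
            if (total.signum() == 0) {
                throw new IllegalArgumentException("sink unreachable from " + node);
            }
            BigInteger pick = uniformBelow(total, rng);
            for (String next : dag.get(node)) {
                BigInteger count = countPaths(dag, next, sink, memo);
                if (pick.compareTo(count) < 0) { node = next; break; }
                pick = pick.subtract(count); // skip the paths through this successor
            }
            path.add(node);
        }
        return path;
    }

    // Uniform integer in [0, bound) by rejection sampling.
    static BigInteger uniformBelow(BigInteger bound, Random rng) {
        BigInteger candidate;
        do {
            candidate = new BigInteger(bound.bitLength(), rng);
        } while (candidate.compareTo(bound) >= 0);
        return candidate;
    }
}
```

Counting visits each edge once, and each sample then costs time proportional to the out-degrees along the chosen path, so the full set A never has to be materialised.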
