20

I have been reading quite a bit about graphing libraries for Java and JavaScript lately, but I haven't found a good way to do what I want.

Essentially, I have a hierarchy of sets over a collection of elements (up to several thousand). These sets can be fully or partly overlapping, fully covering, or completely disjoint from one another. What I would like to do is display the following information:

  • The size of a set (in relation to the other sets)
  • A "heat" value (in color code) of a set calculated from the elements it covers
  • The full topology of the sets in a single graph (so that overlaps, intersections etc are displayed to the user)

Edit: Perhaps I should give an example of what I mean by sets, elements, and partially overlapping hierarchies. The following is an over-simplified version of the kind of sets I deal with (note that the numbers 1-11, the letters a-h, and X represent elements which are comparable to one another):

Set1 = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11}
Set2 = {1, 2, 3, 4, 5, 6}
Set3 = {1, 2, 3}
Set4 = {1, 4, 5, 6, 7}
Set5 = {a, b, c, d, e, f, g, h}
Set6 = {a, b, c, d, e}
Set7 = {a, b, c, 7}
Set8 = {2, 4, 7, 8, c, f}
Set9 = {X}
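
To make the "fully covering, partly overlapping, or disjoint" vocabulary concrete, here is a minimal sketch that classifies every pair of the example sets. It is written in Python for brevity (the logic ports directly to Java or JavaScript); the names `SETS` and `relation` are made up for this illustration, and the elements are stored as strings.

```python
from itertools import combinations

# Example sets copied from the question; every element is kept as a string.
SETS = {
    "Set1": set(map(str, range(1, 12))),          # 1..11
    "Set2": set(map(str, range(1, 7))),           # 1..6
    "Set3": {"1", "2", "3"},
    "Set4": {"1", "4", "5", "6", "7"},
    "Set5": set("abcdefgh"),
    "Set6": set("abcde"),
    "Set7": {"a", "b", "c", "7"},
    "Set8": {"2", "4", "7", "8", "c", "f"},
    "Set9": {"X"},
}


def relation(a, b):
    """Describe how set a relates to set b."""
    if a == b:
        return "equal"
    if a < b:
        return "subset"      # a is fully covered by b
    if a > b:
        return "superset"    # a fully covers b
    if a & b:
        return "overlap"     # partial overlap
    return "disjoint"


# One entry per unordered pair, keyed in the order the sets are listed above.
relations = {
    (x, y): relation(SETS[x], SETS[y]) for x, y in combinations(SETS, 2)
}
```

For instance, `relations[("Set1", "Set2")]` is `"superset"`, `relations[("Set2", "Set4")]` is `"overlap"`, and Set9 is disjoint from every other set.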

I am not sure how I would go about displaying this information in an intuitive way. I have seen Voronoi diagrams, which I really like visually; however, they have a different mathematical background, so I don't think I'll be able to portray my hierarchies properly with them. I would like to create these graphs at runtime (in the case of Java) or using JavaScript (in the case of HTML deployment); either is perfectly fine. One constraint, however, is that the graphs must be either created as, or exportable to, high-resolution vector graphics.

My questions in short:

  1. Is there a nice way to visualize the kind of data I have? If so, does it exist in a readily implemented form (i.e. a library)?
  2. If there is no easy solution to the problem, in other words if I need to reinvent the wheel in this case, how do I go about implementing such a graph myself? What is a good starting point? What should I pay extra attention to?

Thanks!

Edit: One potential idea I had was to lay out all the elements in the universal set as a hexagonal grid with the desired color overlay, and then draw the boundaries for the sets. There are, however, several problems with that idea, in particular the problem of assigning locations to the elements so that the sets are not split all over the graph. Any comments/suggestions?

ErikE
posdef
  • 1
    How many sets are we talking about? for small numbers, [Symmetric Venn diagrams](http://www.google.com/images?q=symmetric%20venn%20diagrams) cover all the possibilities, but not especially paying heed to hierarchy – AakashM Jul 02 '12 at 16:35
  • hundreds for sure, in many cases close to a thousand and sometimes even more... – posdef Jul 02 '12 at 16:51
  • Can you describe what the sets represent and how the visualization will help with analysis? – orangepips Jul 02 '12 at 17:15
  • Maybe you should look at Matlab. – Garrett Hall Jul 02 '12 at 17:15
  • @orangepips sets represent functional networks, and elements are the participants of these networks. Each participant has interesting values associated with it, some of which I would like to reflect up at the network level, using this graphical representation. – posdef Jul 03 '12 at 06:55
  • @GarrettHall I have used MATLAB before, so I am familiar with it. The problem with it is that it would introduce a very large, commercial dependency into my project. Besides, do you have any particular suggestion with regards to solving this problem using MATLAB? – posdef Jul 03 '12 at 06:58
  • How many elements in the total universe are likely? Will they typically exceed the number of sets? – orangepips Jul 03 '12 at 14:43
  • 1
    Chord diagram might be a useful tool: http://mbostock.github.com/d3/ex/chord.html. Order the elements in each set, represent each set as an arc on the circle's edge, set intersections would be represented by chords between arcs, and perhaps chord color serves as a heat map to indicate degree of intersection. In that design there could be more than one chord drawn between a combination of arcs. – orangepips Jul 03 '12 at 17:03
  • You could consider using Euler charts in something like venneuler, see: http://www.cs.uic.edu/~wilkinson/Publications/venneuler.pdf – Josh Jul 04 '12 at 16:42
  • Can you give us a real-world example of the type of data being visualized? I realize you're trying to keep it generic, but not knowing what you're working with seems to limit my ideas for visualization. Also, how would the "heat" value work--what will each color represent? – ErikE Jul 24 '12 at 21:59
  • Hmmm, just understood your question more clearly... Have you considered bubble charts anyway? See my just updated answer below... – Chibueze Opata Jul 24 '12 at 23:00
  • @ChibuezeOpata what do you mean??? you don't have an answer below... – posdef Jul 25 '12 at 18:03
  • @ErikE There aren't many examples of this kind of data visualization, thus my frustration in finding a decent way to attack the problem at hand. The underlying data is biological experimentation results. – posdef Jul 25 '12 at 18:05
  • @posdef: Sorry, actually deleted it due to some kinda unfriendly comment... I've undeleted it now... – Chibueze Opata Jul 25 '12 at 21:38

4 Answers

10

Yes, this is a fairly well-studied problem. What you are describing is called a hypergraph. Each element can be represented as a vertex in a graph, and the sets are the hyperedges. The problem then becomes that of visualizing hypergraphs.

[image]

Unfortunately there isn't a perfect, generalized solution to this since even the simplest graphs can have complex visualizations.

If your sets are relatively small (< 5 elements), you can use a regular graph drawing library like Graphviz. To do this, simply connect all pairs of vertices within each set, giving each set's edges their own color. This will yield a solution similar to this:

[image]
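
As a rough illustration of the "connect all pairs within each set" idea, the following Python sketch emits Graphviz DOT text in which every set becomes a clique of edges in its own color. The function name `sets_to_dot` and the color list are assumptions made for this example, not part of any library; elements are assumed to be strings.

```python
from itertools import combinations


def sets_to_dot(sets, colors):
    """Return DOT source where each set is drawn as a clique in one color.

    `sets` maps a set name to a set of string elements; `colors` supplies
    one Graphviz color name per set (extra sets are ignored if it is shorter).
    """
    lines = ["graph sets {"]
    # Declare every element once so that singleton sets still show up.
    for element in sorted(set().union(*sets.values())):
        lines.append(f'  "{element}";')
    # Connect all pairs of elements inside each set, colored per set.
    for members, color in zip(sets.values(), colors):
        for u, v in combinations(sorted(members), 2):
            lines.append(f'  "{u}" -- "{v}" [color="{color}"];')
    lines.append("}")
    return "\n".join(lines)


dot = sets_to_dot(
    {"A": {"1", "2", "3"}, "B": {"3", "4"}, "C": {"X"}},
    ["red", "blue", "black"],
)
```

Saving `dot` to a file such as `sets.dot` and running `dot -Tsvg sets.dot -o sets.svg` should produce a vector image. Note that the number of edges grows quadratically with set size, which is why this only suits small sets.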

tskuzzy
  • Thanks for helping out with the terminology. :) As you have also said the problem is to visualize these types of graphs in an intuitive way. I have seen Graphviz before, it looks pretty interesting but also a bit complicated in the sense that I am not really sure what is possible and what is not possible using that tool. Ditto for most other types of visualization libraries. I was hoping for a layout that is a bit new and out-of-the-box (See: chord diagrams using D3 mentioned as a comment to the OP, unfortunately that doesn't help either but I like the originality) – posdef Jul 19 '12 at 10:47
  • 1
    I am not getting how the second diagram shows set membership. – ErikE Jul 25 '12 at 08:30
  • @ErikE I have the same question. My guess is: 3 sets. red, blue, black ?! – user77115 Nov 23 '12 at 11:44
  • @user77115 But why are there three red lines connecting the same two points? What does that mean? – ErikE Nov 23 '12 at 18:24
  • @ErikE a detailed example, based on the actual sets given in the question, would help interpreting the meaning of the diagram. I think it's a new SO question. – user77115 Nov 26 '12 at 06:21
5

Have you considered a 2-dimensional grid:

  • Put the set number on one axis
  • Put the unique elements found in all sets on the other axis
  • Color each cell where an element is found in a set (by looking at that row and column's labels)
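
A minimal sketch of that grid in Python, assuming the sets are given as a dictionary of string elements (the names `membership_matrix` and `render` are invented for this example). Rows are sets, columns are elements, and a cell is marked when the element belongs to the set.

```python
def membership_matrix(sets):
    """Return (elements, matrix): one row per set, one column per element."""
    elements = sorted(set().union(*sets.values()))
    matrix = [[e in members for e in elements] for members in sets.values()]
    return elements, matrix


def render(sets):
    """Draw the grid as text: '#' for a colored cell, '.' for an empty one."""
    _, matrix = membership_matrix(sets)
    width = max((len(name) for name in sets), default=0)
    rows = [
        f"{name:<{width}} " + "".join("#" if cell else "." for cell in row)
        for name, row in zip(sets, matrix)
    ]
    return "\n".join(rows)


example = {"S1": {"a", "b"}, "S2": {"b", "c"}}
elements, matrix = membership_matrix(example)
grid = render(example)
```

Here the columns are simply sorted alphabetically; the ordering question discussed next is what decides how readable the real picture ends up.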

While this visualization method would normally be inferior to some of the more complicated ones mentioned so far, it has the virtue of actually being possible when you have thousands of elements and thousands of sets.

The trick will be to order the rows and columns in a way that puts the most information together in a way useful to the user. My instinct says that the problem you're trying to solve is to make the colored cells be as "bloblike" as possible—if each set of adjacent colored cells is called an "area", to have the least number of distinct areas and for them to have the fewest holes in them.

That is a very complicated problem in its own right, but could be at least partially solved by working up some adjacency factors for each set against every other set. What you're looking for are "islands" of closeness--so start with the pair of most alike sets, add them to the graph, and consider them a region. Recalculate your closeness numbers with the region replacing the pair it holds (averaging in some way?). Find the next most close pair of items (each item being a region or a set), and if that pair is within a certain threshold of closeness to any existing region in the graph, attach to one side of that region, otherwise create a new, separate region (again removing the pair's closeness values and recomputing for the region itself). Eventually, all sets will be added to regions, and all regions will be joined. Joining two regions can have four possibilities (flipping may be required), so which sides to attach in the graph could be calculated by the closeness of the sets on the 4 edges of the two regions.
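
The merging procedure above can be sketched roughly as follows. This is a simplified version under stated assumptions, not a definitive implementation: closeness is taken to be the Jaccard index (my choice of measure), region-to-region closeness is a plain average over member pairs, and the threshold step is dropped so that the two closest regions are always joined. All names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Shared elements divided by all elements of the two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def order_sets(sets):
    """Greedily merge the closest regions into one ordered list of set names."""
    regions = [[name] for name in sets]  # every set starts as its own region

    def closeness(r1, r2):
        # Average pairwise similarity between the members of two regions.
        total = sum(jaccard(sets[x], sets[y]) for x in r1 for y in r2)
        return total / (len(r1) * len(r2))

    while len(regions) > 1:
        i, j = max(
            combinations(range(len(regions)), 2),
            key=lambda p: closeness(regions[p[0]], regions[p[1]]),
        )
        a, b = regions[i], regions[j]
        # Four ways to join two regions (flipping allowed); pick the one whose
        # touching end sets are most similar.
        options = [(a, b), (a, b[::-1]), (a[::-1], b), (b, a)]
        left, right = max(
            options, key=lambda o: jaccard(sets[o[0][-1]], sets[o[1][0]])
        )
        regions = [r for k, r in enumerate(regions) if k not in (i, j)]
        regions.append(left + right)
    return regions[0] if regions else []


order = order_sets({"A": {1, 2, 3}, "B": {1, 2, 3, 4}, "C": {4, 5}, "D": {9}})
```

In this small example B ends up next to both A and C, while the unrelated set D is pushed to one end. The loop recomputes all pairwise closeness values on every pass, so it is only meant to show the idea; for thousands of sets the values would need caching.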

While this may never give the optimal configuration, it should come up with something that has few regions compared to a random distribution.

Finally, some dynamic reordering might be useful: allow the user to select an interesting set or element and use it as the seed for a completely rearranged graph, calculating each addition based on closeness to that seed (and subsequently to the region it forms after being combined with another item), rather than on the closest pair overall.

Here is a diagram of the result, having done the above logic process on the example set of data in your question:

Sets and Elements

Deciding how to order the columns is complex, but basically you can get sort of reasonable results by moving columns to be adjacent when such a move won't disturb the colored block area of any already-added segments.

Additional thoughts:

  • Set closeness should account not just for how many elements two sets have in common, but also for how many they do not share. If two pairs of sets each have 3 elements in common, but one pair has 5 non-shared elements and the other has 3, then the pair with 3 non-shared elements is the closer match.
  • After adding a set to the graph, there is an opportunity to reorder the elements. Stacking the elements as leftmost as possible is a good start for the first placement. After that, stacking most common elements leftmost seems good. After that, it breaks down. I wonder if getting the colored cells as close to the diagonal (from top left to bottom right) would also be a useful algorithm--this reminds me a little of the Design Structure Matrix though that only shows one-way dependencies rather than two-way relationships.
  • When a colored blob consists of sets that are completely disjoint from all other sets (like the set containing X in your example), it can be moved to a separate graph.
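
Two of the points above lend themselves to a short sketch. The Jaccard index is one measure (my choice, not necessarily what was used for the diagram) that rewards shared elements and penalizes non-shared ones, and a simple grouping pass finds the blobs that share nothing with the rest and could go on a separate graph. The function names are invented for this example.

```python
def jaccard(a, b):
    """Shared elements divided by shared plus non-shared elements."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disjoint_groups(sets):
    """Group set names so that sets in different groups share no elements."""
    groups = []  # list of (names, combined elements); groups never overlap
    for name, members in sets.items():
        names, elements = [name], set(members)
        remaining = []
        for group_names, group_elements in groups:
            if group_elements & elements:
                # The new set touches this group, so absorb the group.
                names = group_names + names
                elements |= group_elements
            else:
                remaining.append((group_names, group_elements))
        groups = remaining + [(names, elements)]
    return [names for names, _ in groups]


# Both pairs share 3 elements; the first has 5 non-shared, the second 3.
looser = jaccard({1, 2, 3, 4, 5, 6}, {1, 2, 3, 7, 8})   # 3 / 8
closer = jaccard({1, 2, 3, 4, 5}, {1, 2, 3, 6})         # 3 / 6

groups = disjoint_groups({"A": {1, 2}, "B": {2, 3}, "C": {9}, "D": {3, 4}})
```

With these inputs the second pair scores higher than the first, and C lands in its own group while A, B and D are chained together through their shared elements.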
ErikE
  • Nice answer Erik, thanks for taking your time! I actually went along with something similar to this before I took off for a week of vacation :) In my case I opted for a hexagonal grid, increasing the number of immediate neighbors, which may or may not be a smart choice given the underlying data. I wrote a layout algorithm in JS to build the grid in a spiral manner, and bind the data from an array to the nodes in the spiral. This reduces the problem to ordering the data in the array, which could be pretty tricky in its own right. – posdef Jul 25 '12 at 17:58
  • Oh and I think the idea of interactivity is spot on here. I figured I had to get the user involved somehow, and have thus been looking at JavaScript-powered frameworks such as D3.js. – posdef Jul 25 '12 at 17:59
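
For readers curious about the spiral hexagonal layout mentioned in the comment above, here is one possible sketch in Python. It is not the commenter's JavaScript code; it uses the common axial-coordinate scheme for hex grids and walks outward ring by ring, and `hex_spiral` and `hex_distance` are hypothetical names.

```python
# The six neighbor offsets of a hexagon in axial (q, r) coordinates.
DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_spiral(n):
    """Return the first n cells of a spiral that starts at the center (0, 0)."""
    cells = [(0, 0)]
    radius = 1
    while len(cells) < n:
        # Start each ring at the corner `radius` steps away in direction 4.
        q, r = DIRECTIONS[4][0] * radius, DIRECTIONS[4][1] * radius
        for dq, dr in DIRECTIONS:
            # Each of the six sides of a ring holds `radius` cells.
            for _ in range(radius):
                cells.append((q, r))
                q, r = q + dq, r + dr
        radius += 1
    return cells[:n]


def hex_distance(cell):
    """Number of steps from the center to a cell in axial coordinates."""
    q, r = cell
    return (abs(q) + abs(r) + abs(q + r)) // 2


# Bind an ordered array of elements to grid positions, as in the comment.
data = ["e%d" % i for i in range(19)]
layout = dict(zip(data, hex_spiral(len(data))))
```

Nineteen elements fill the center plus two complete rings (1 + 6 + 12). As the comment notes, the hard part is choosing the order of `data`, since that alone decides which elements end up as neighbors.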
2

There are many approaches to this problem, but personally I'd draw a sort of Venn chart using dynamically generated SVG with a tool like Raphael JS and color it the way I want. Raphael also has APIs such as Set that let you attach detailed information about the elements and their relations. Their SVG-to-code converter will also likely help you understand how to generate the SVG elements.

Alternatively, you could use tools like Venn charts:

Venn chart sample

which seems to be easily adaptable to this scenario. There's also Flotr2 which can create bubble charts:

Bubble chart flotr

or even CanvasXpress.

Canvas Xpress Diagrams

A little more tweaking with any of the latter tools will enable you to get it properly done...

Chibueze Opata
0

I don't have a solution for getting your data into the proper format, but take a look at sigma.js, an MIT-licensed JavaScript library for drawing graphs. I haven't looked at the data it accepts, but it may be worth a look.

keaplogik