
I have some graph databases (friend networks, purchasing history, etc.) that I persist with Neo4j. I plan to analyze these with community detection algorithms such as Girvan-Newman. These algorithms usually return a dendrogram, representing the division of the graph from the whole network down to individual nodes. I am wondering how I might persist these results. I suppose the dendrogram could be stored as a separate graph, but is there a way to store it within the graph itself? My concern with doing so is that it seems to require creating nodes to represent the groups, which is something I would like to avoid.

Argalatyr
Paul Jackson
  • Do you want a Neo4j-specific solution, or are you looking for a more general strategy? – Michael J. Barber Dec 05 '11 at 14:49
  • I would like to persist it in Neo4J (although the answer ought to apply to any property graph store). I would like to avoid using an alternative persistence mechanism, such as a SQL store or B-tree. – Paul Jackson Dec 05 '11 at 15:03

2 Answers


Most community detection algorithms work by agglomerating communities along existing edges in the graph; Girvan-Newman is a little unusual in that it works by cutting edges. Either way, the dendrogram can be viewed as showing an ordering of operations on the edges of the graph. Thus, instead of storing the dendrogram as a separate object, you can attach properties to the edges (relationships) showing in which order they should be merged/cut. My knowledge of Neo4j is extremely limited, so I'll leave the details to you.

There are some complications with merging, as there will generally be multiple equivalent edges, each linking different vertices within the communities to merge. Basically, just pick a strategy that lets you figure out the linked communities from the edges.
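As a rough illustration of the edge-property idea, here is a minimal Python sketch that works on plain tuples rather than on Neo4j relationships. It assumes each edge carries a hypothetical `merge_order` value (an integer for edges along which two communities were merged, `None` for edges that were redundant at merge time) and replays the first few merges with a union-find structure. The function name `communities_after` and the tuple layout are invented for this example.

```python
def communities_after(nodes, edges, steps):
    """Replay the first `steps` merges and return the resulting communities.

    nodes: iterable of node ids
    edges: list of (u, v, merge_order) tuples; merge_order is None for
           edges that were not used to merge two communities
           (hypothetical layout, standing in for a relationship property)
    """
    parent = {n: n for n in nodes}

    def find(x):
        # Follow parent pointers to the community representative,
        # halving the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Only edges with a merge order take part; process them in order.
    merging = sorted((e for e in edges if e[2] is not None), key=lambda e: e[2])
    for u, v, order in merging:
        if order >= steps:
            break
        ru, rv = find(u), find(v)
        if ru != rv:
            # Keep the smaller id as the community representative.
            parent[max(ru, rv)] = min(ru, rv)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return sorted(sorted(g) for g in groups.values())


# Toy graph: four nodes, three merging edges, one redundant edge.
nodes = [0, 1, 2, 3]
edges = [(0, 1, 0), (2, 3, 1), (1, 2, 2), (0, 3, None)]
```

Calling `communities_after(nodes, edges, 2)` replays the merges numbered 0 and 1. For a divisive algorithm such as Girvan-Newman, the same property could instead record the order in which edges were cut, and the replay would run in reverse.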

Argalatyr
Michael J. Barber
  • This makes sense. I don't have an easy way to iterate through edges ordered by property value, but I should be able to solve that in real time a lot more efficiently than recalculating the communities. Thanks. – Paul Jackson Dec 05 '11 at 22:26

One way to represent a dendrogram is as a list of pairs, containing (n-1) pairs for n elements. Assuming the left element of the pair is the one whose ID is kept to refer to all elements in a community, a sample dendrogram might look like

[[0,1],[2,3],[0,2]]

So an alternative way to persist that might be to store, on each node, the time step at which it is merged into another node (together with all the nodes that have previously been merged into it).

So you'd attach (0:0) to 1, (1:2) to 3 and (2:0) to 2 (timestep:new 'name' of node).

edit: Concretely, this might mean attaching two integer-valued attributes e.g. 'merge_timestep' and 'merge_into' to each Neo4J node object.
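A minimal Python sketch of this encoding, using plain dicts in place of Neo4j node properties: the attribute names `merge_timestep` and `merge_into` are the ones suggested above, while the function names are made up for illustration. The first function turns the list of pairs into per-node attributes; the second follows `merge_into` links to find the name of the community a node belongs to after a given time step.

```python
def dendrogram_to_node_properties(pairs):
    """Map each merged node to its merge_timestep and merge_into values.

    pairs: list of [kept, merged] pairs, one per time step, where the
           left element keeps its id as the name of the community.
    """
    props = {}
    for timestep, (kept, merged) in enumerate(pairs):
        props[merged] = {"merge_timestep": timestep, "merge_into": kept}
    return props


def community_of(node, props, timestep):
    """Name of the community containing `node` once all merges up to
    and including `timestep` have been applied."""
    while node in props and props[node]["merge_timestep"] <= timestep:
        node = props[node]["merge_into"]
    return node


pairs = [[0, 1], [2, 3], [0, 2]]
props = dendrogram_to_node_properties(pairs)
```

With the sample dendrogram, `props` holds entries for nodes 1, 3 and 2, matching the (timestep:name) labels given above; node 0 is never merged into anything and so carries no attributes.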

Nicolas78
  • I think I prefer this solution to edge properties for a few reasons: I can split this information into two properties and query for the nth node and also for all nodes attached under n; I can't query edge properties; all nodes (except the last) have a property; and lastly, it provides a name for the virtual node representing a community. Thanks. – Paul Jackson Dec 06 '11 at 17:58
  • @PaulJackson This solution is pretty much a standard (and good) approach to storing a dendrogram. It also goes explicitly against what you ask in your question. Perhaps some rewriting of your question is in order, if you actually don't care about storing the community information in the graph itself. – Michael J. Barber Dec 07 '11 at 06:49
  • @MichaelJ.Barber (first of all, it's an honor, I'm actually citing your article on bipartite community detection in my thesis work :). What I'm proposing is to store one attribute per node designating its role in the dendrogram, attached to the node itself in the Neo4J graph representation, not storing the dendrogram somewhere outside the graph. – Nicolas78 Dec 07 '11 at 11:03
  • Hm, I see what you mean. I read it as storing the list of pairs, with a description of how to relate the (mathematical) nodes from the graph; with your comment I see that you mean the Neo4j nodes. This probably says more about whether I should make comments on websites early in the morning than it does about the clarity of your answer. – Michael J. Barber Dec 07 '11 at 11:44
  • Ah. Well, true, I'm not talking about Neo4J nodes anywhere, so this might be clarified after all (see edit) – Nicolas78 Dec 07 '11 at 13:46
  • A dendrogram isn't really a graph though... it starts with root nodes, which is not really a full graph, I don't think? – PositiveGuy Mar 24 '13 at 21:08
  • @CoffeeAddict, yes, not every graph is a dendrogram. However, the point here is about storing clustering information, which is a dendrogram (first join these two nodes, then join these two nodes/clusters of nodes, then these, and so on). – Nicolas78 Mar 25 '13 at 16:28