Is it possible and efficient to implement the MHRW (Metropolis-Hastings random walk) algorithm in SQL?

I want to sample a large directed graph with more than 1 million nodes, and MHRW seems to be one of the best ways to do it. The algorithm is designed for undirected graphs, but I think it can work for directed ones too.

The algorithm:

v <- initial node
while stop criterion not met do
    select node w uniformly at random from the neighbors of v
    generate p uniformly at random, 0 <= p <= 1
    if p <= (degree of v) / (degree of w) then
        v <- w
    else
        stay at v
    end if
end while
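
To make the steps above concrete, here is a minimal in-memory sketch in Python. The function name `mhrw_sample`, the adjacency-dict input `adj`, and the `max_steps` guard are my own assumptions for illustration. It also assumes an undirected graph where every node has at least one neighbor, and that "sample" means the set of distinct nodes visited.

```python
import random


def mhrw_sample(adj, start, sample_size, rng=None, max_steps=1_000_000):
    """Metropolis-Hastings random walk over an undirected graph.

    adj maps each node to a list of its neighbors (hypothetical input format).
    Returns the set of distinct nodes visited by the walk.
    """
    rng = rng or random.Random()
    v = start
    sample = {v}
    steps = 0
    # Stop when enough distinct nodes are collected, or after max_steps
    # proposals (a safety guard that is not part of the original pseudocode).
    while len(sample) < sample_size and steps < max_steps:
        steps += 1
        w = rng.choice(adj[v])  # neighbor chosen uniformly at random
        p = rng.random()        # uniform in [0, 1)
        if p <= len(adj[v]) / len(adj[w]):
            v = w               # accept the move
            sample.add(v)
        # otherwise stay at v
    return sample


if __name__ == "__main__":
    # Tiny toy graph: a triangle 1-2-3 with a tail 3-4.
    toy = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
    print(mhrw_sample(toy, start=1, sample_size=3, rng=random.Random(0)))
```

Moves toward a lower-degree neighbor are always accepted because the ratio is at least 1, while moves toward a higher-degree neighbor are accepted only some of the time. That is what counteracts the plain random walk's bias toward high-degree nodes.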

I take the initial node from table1, which contains all nodes and their properties. table2 has two columns that list all connections between nodes (which also gives a way to compute a node's degree). The stop criterion would be the size of the sample, i.e., keep walking until the sample has roughly 100,000 nodes.
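
As a rough sketch of how the walk could be driven against such an edge table, the following uses Python's built-in `sqlite3` module. The column names `src` and `dst`, the index, the helper names, and the toy data are all assumptions for illustration. It also assumes each undirected edge is stored in both directions, so that a row count per `src` equals the degree.

```python
import random
import sqlite3


def make_demo_db():
    """Build a tiny in-memory stand-in for table2 (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table2 (src INTEGER, dst INTEGER)")
    conn.execute("CREATE INDEX idx_table2_src ON table2 (src)")
    edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
    # Store every undirected edge in both directions.
    conn.executemany("INSERT INTO table2 VALUES (?, ?)",
                     edges + [(b, a) for a, b in edges])
    conn.commit()
    return conn


def degree(conn, node):
    """Degree = number of table2 rows that start at this node."""
    row = conn.execute("SELECT COUNT(*) FROM table2 WHERE src = ?", (node,))
    return row.fetchone()[0]


def random_neighbor(conn, node, rng):
    """Pick one neighbor uniformly via a random OFFSET into a stable ordering.

    Raises ValueError if the node has no outgoing rows (a dead end).
    """
    offset = rng.randrange(degree(conn, node))
    row = conn.execute(
        "SELECT dst FROM table2 WHERE src = ? ORDER BY dst LIMIT 1 OFFSET ?",
        (node, offset),
    ).fetchone()
    return row[0]


def mhrw_sql(conn, start, sample_size, rng, max_steps=1_000_000):
    """Run the walk, issuing one neighbor query and two degree queries per step."""
    v = start
    sample = {v}
    for _ in range(max_steps):  # max_steps is a safety guard (assumption)
        if len(sample) >= sample_size:
            break
        w = random_neighbor(conn, v, rng)
        if rng.random() <= degree(conn, v) / degree(conn, w):
            v = w
            sample.add(v)
    return sample


if __name__ == "__main__":
    conn = make_demo_db()
    print(mhrw_sql(conn, start=1, sample_size=3, rng=random.Random(0)))
```

Here the loop itself lives in the client and SQL only answers the degree and neighbor lookups. Whether that is fast enough for a graph with over a million nodes would need to be measured. For a directed graph stored in one direction only, `degree` would be the out-degree and a node with no outgoing edges would stop the walk.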

Best regards.

npereira
  • Please show examples of sample data and what you want the results to look like. SQL is more about data than algorithms. – Gordon Linoff Apr 19 '14 at 12:51
  • There isn't a good example for this. I want a sample of nodes and their connections that preserves the properties of the original graph, and the algorithm provides that effect. – npereira Apr 19 '14 at 14:02
  • Why do you think it's a good idea to execute that algorithm in SQL? Wouldn't it be really slow? – Niklas B. Apr 19 '14 at 16:40
  • Preserves which properties? – David Eisenstat Apr 19 '14 at 18:01
  • Not really answering your question, but have you looked at Neo4J? It's a graph database that would support this in a more natural way. – Nicolas78 Apr 19 '14 at 23:07
  • Properties like a given centrality, e.g., the degree centrality distribution. I haven't looked at Neo4J, I'll do it now! – npereira Apr 20 '14 at 20:23

0 Answers