Is it possible and efficient to implement MHRW algorithm in SQL?
I want to sample a direct large graph with +1 million nodes and this seems to be one of the best ways to do it. The purpose of the algorithm is for undirect graphs, but I think it can work for directed ones too
The algorithm:
v <- initial node
while stop criteria not met do
select node w uniformly at random from neighbors of v;
generate uniformly at random 0<= p <= 1
if p <= (degree of v) / (degree of w)
then v <- w
else
stay at v
end if
end while
I take the initial node from table1, which contains all nodes and their properties. In table2 I have two columns that display all connections between nodes (and a way to get a nodes degree). The stop criteria would be the size of the sample, ie, while sample <= ~100.000 nodes.
Best regards.