Q1. We're trying to sample our graph with a random walk, following this example:
https://github.com/neo4j/graph-data-science-client/blob/main/examples/import-sample-export-gnn.ipynb
Our graph consists of 170 million nodes and 1.7 billion edges, and we set enough memory for both heap (256 GB) and page cache (10 GB).
We set the sampling ratio so that it samples 10,000 nodes. The random walk finishes quickly, but retrieving the graph into a Python DataFrame via stream takes forever.
Is there something I'm missing here, e.g. indexing?
My understanding is that GDS basically:
- projects the graph from disk into memory with gds.project
- performs the graph operation and holds the resulting graph with gds.rwr
- fetches the actual node/edge properties with gds.stream
I don't think indexing is necessary in this case, but I'm not a DB expert.
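The three steps above could look roughly like this with the graphdatascience Python client. This is a minimal sketch under assumptions: `sample_and_stream` is a hypothetical helper, `gds` is assumed to be a connected `GraphDataScience` client, and the `alpha`/`beta` namespaces follow the client version of the linked notebook and may have moved in newer releases. The point it illustrates is that the streaming step targets the sampled graph, not the 170-million-node projection.

```python
from typing import Any, Tuple


def sample_and_stream(gds: Any, graph_name: str, sampling_ratio: float) -> Tuple[Any, Any]:
    """Hypothetical helper: project, sample with RWR, stream the sample's topology.

    `gds` is assumed to be a graphdatascience.GraphDataScience client; the
    alpha/beta namespaces match older client versions and may differ in yours.
    """
    # Step 1: project the database graph into the in-memory graph catalog.
    G, _ = gds.graph.project(graph_name, "*", "*")
    # Step 2: sample it with random walk with restarts; this creates a new catalog graph.
    G_sample, _ = gds.alpha.graph.sample.rwr(
        f"{graph_name}_sample", G, samplingRatio=sampling_ratio
    )
    # Step 3: stream the relationships of the *sampled* graph into a DataFrame.
    topology = gds.beta.graph.relationships.stream(G_sample)
    return G_sample, topology
```

For 10,000 out of 170 million nodes, the sampling ratio would be about 10_000 / 170_000_000, i.e. roughly 5.9e-5.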
Q2.
We're trying to stream node labels from a catalog graph, but I can't find any function for that.
How should I fetch the node label (type)?
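One possible workaround, sketched here as an assumption rather than a GDS feature, is to look the labels up in the database itself. Graphs projected natively from Neo4j stream back the database's internal node ids, so those ids can be passed to a Cypher query through the client's `run_cypher(query, params)`. `fetch_node_labels`, `LABELS_QUERY` and the batch size are hypothetical names and values chosen for illustration.

```python
from typing import Any, Dict, Iterable, List

# Look up database labels for a batch of internal node ids.
# id() is deprecated in Neo4j 5 but still matches the ids GDS streams back.
LABELS_QUERY = """
UNWIND $ids AS nodeId
MATCH (n) WHERE id(n) = nodeId
RETURN nodeId, labels(n) AS labels
"""


def fetch_node_labels(gds: Any, node_ids: Iterable[int], batch_size: int = 10_000) -> Dict[int, List[str]]:
    """Hypothetical helper returning {nodeId: [labels]}, querying in batches.

    `gds` is assumed to be a graphdatascience.GraphDataScience client whose
    run_cypher(query, params) returns a pandas DataFrame.
    """
    ids = list(node_ids)
    labels_by_id: Dict[int, List[str]] = {}
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        df = gds.run_cypher(LABELS_QUERY, {"ids": batch})
        # Each result row pairs one node id with its list of labels.
        for node_id, labels in zip(df["nodeId"], df["labels"]):
            labels_by_id[int(node_id)] = list(labels)
    return labels_by_id
```

Pass it the node id column of whatever was streamed from the sampled graph; batching keeps each query's parameter list bounded.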
For Q1: after a long wait, we retrieved the graph with the desired node count.
I think streaming took so long because the sampled graph has 40 million edges. Is there some way to limit the edges as well?
I see that there used to be walkLength and walksPerNode parameters. Is there an equivalent for gds.alpha.graph.sample.rwr?
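To reason about why a 10,000-node sample can still carry so many edges, here is a toy, pure-Python random walk with restarts sampler. It is not the GDS implementation; it assumes that the walk only decides which nodes are kept and that every edge between two kept nodes is then copied into the sample (the induced subgraph). `rwr_sample`, `restart_p` and `stall_limit` are hypothetical names, not GDS parameters.

```python
import random
from typing import Dict, List, Set, Tuple


def rwr_sample(
    adj: Dict[int, List[int]],
    target: int,
    restart_p: float = 0.1,
    stall_limit: int = 1000,
    seed: int = 42,
) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Toy random walk with restarts: visit `target` distinct nodes, then keep
    every edge whose endpoints were both visited (the induced subgraph)."""
    if not 0 < target <= len(adj):
        raise ValueError("target must be between 1 and the number of nodes")
    rng = random.Random(seed)
    nodes = sorted(adj)
    start = rng.choice(nodes)
    visited = {start}
    current, stalled = start, 0
    while len(visited) < target:
        neighbours = adj[current]
        if not neighbours or rng.random() < restart_p:
            current = start  # restart: jump back to the walk's start node
        else:
            current = rng.choice(neighbours)
        if current in visited:
            stalled += 1
        else:
            visited.add(current)
            stalled = 0
        if stalled >= stall_limit:
            # Walk seems stuck in an already covered region: start a new walk elsewhere.
            start = rng.choice([n for n in nodes if n not in visited])
            visited.add(start)
            current, stalled = start, 0
    # The walk only picks nodes; the edge count depends on how densely they connect.
    edges = [(u, v) for u in sorted(visited) for v in adj[u] if v in visited]
    return visited, edges


# A complete graph on 100 nodes: every sampled node links to every other one.
complete = {i: [j for j in range(100) if j != i] for i in range(100)}
nodes, edges = rwr_sample(complete, target=10)
print(len(nodes), len(edges))  # prints: 10 90
```

Under this assumption, the number of kept edges in a dense region can grow with the square of the number of sampled nodes (10 nodes already carry 90 directed edges here), and walk-length style settings would change which nodes are picked rather than how many edges each kept node brings along.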