Computing Page Rank on GPU using Pull-based Traversal

Question

I have a CUDA kernel that computes PageRank using a push-based traversal. I am trying to convert it to a pull-based traversal, but I always get wrong results. What am I doing wrong? (Assume for simplicity that the graph is undirected.)

This is the CUDA kernel that does the push-based PageRank:

__global__ void pr_kernel(unsigned int numNodes,
                          unsigned int from,
                          unsigned long numPartitionedEdges,
                          unsigned int *activeNodes,
                          unsigned long *activeNodesPointer,
                          OutEdge *edgeList,
                          unsigned int *outDegree,
                          float *dist,
                          float *delta,
                          float acc)
{
        unsigned int tId = blockDim.x * blockIdx.x + threadIdx.x;

        if(tId < numNodes)
        {
                unsigned int id = activeNodes[from + tId];
                unsigned int degree = outDegree[id];
                float thisDelta = delta[id];

                if(thisDelta > acc)
                {
                        dist[id] += thisDelta;

                        if(degree != 0)
                        {

                                float sourcePR = ((float) thisDelta / degree) * 0.85;

                                unsigned long thisfrom = activeNodesPointer[from+tId]-numPartitionedEdges;
                                unsigned long thisto = thisfrom + degree;

                                for(unsigned long i=thisfrom; i<thisto; i++)
                                {
                                        atomicAdd(&delta[edgeList[i].end], sourcePR);
                                }
                        }

                        atomicAdd(&delta[id], -thisDelta);
                }
        }
}

And the following is my pull-based version, which gives wrong results:

__global__ void pr_pull(unsigned int numNodes,
                        unsigned int from,
                        unsigned long numPartitionedEdges,
                        unsigned int *activeNodes,
                        unsigned long *activeNodesPointer,
                        OutEdge *edgeList,
                        unsigned int *outDegree,
                        float *dist,
                        float *delta,
                        float acc)
{
        unsigned int tId = blockDim.x * blockIdx.x + threadIdx.x;

        if(tId < numNodes)
        {
                unsigned int id = activeNodes[from + tId];
                unsigned int degree = outDegree[id];
                float thisDelta = delta[id];
                if(thisDelta > acc)
                {
                        float receivedPR = thisDelta * 0.85;

                        unsigned long thisfrom = activeNodesPointer[from+tId]-numPartitionedEdges;
                        unsigned long thisto = thisfrom + degree;

                        for(unsigned long i=thisfrom; i<thisto; i++)
                        {
                                delta[id] += ((float)receivedPR / outDegree[edgeList[i].end]);
                        }
                        dist[id] += thisDelta;
                        delta[id] -= thisDelta;
                }
        }
}

The implementation of `pr_kernel` is taken from Subway.

Could you explain what you mean by push/pull? Is it just scatter vs. gather, where scatter can have collisions and therefore needs atomic updates? Other than getting rid of the atomics, the only substantial difference between the kernels seems to be that you are normalizing by a different degree (`outDegree[id]` vs. `outDegree[edgeList[i].end]`). Why would these changes not cause different/wrong results? How are you making sure that there are no race conditions on `delta`? — paleonix, Aug 10 '23 at 14:18
I found [Scalable Data-driven PageRank: Algorithms, System Issues, and Lessons Learned (Joyce Jiyoung Whang et al)](https://www.cs.utexas.edu/~inderjit/public_papers/scalable_pagerank_europar15.pdf) helpful to understand what you mean by pull and push. Your `delta` seems to be the residual r and your `dist` seems to be the pagerank x. I'm pretty sure that `delta[edgeList[i].end] += ((float)receivedPR / outDegree[edgeList[i].end]);` is a "push". So do you really want a pull-based algorithm or a push-pull one as described in the paper? — paleonix, Aug 10 '23 at 15:01
Yes, sorry, I forgot to state that `delta` and `dist` map to the residual and the PageRank, respectively. And the `delta[edgeList[i].end]` was a typo -- it should have been `delta[id]` instead. But that still doesn't seem to solve it :/ I want a pull-based implementation, i.e., each node updates its own PageRank value using contributions from its neighbours, rather than the push-based approach where each node adds its contribution to its neighbours. — Programmer, Aug 11 '23 at 07:46
Then it should be `float receivedPR = delta[edgeList[i].end] * 0.85;` inside the loop, if I understand correctly. — paleonix, Aug 11 '23 at 09:43
