I would like to rank items according to a given user's preferences (the items the user has liked), based on a random walk on a directed bipartite graph, using Gremlin in Groovy.
The graph has the following basic structure:
[User1] ---'likes'---> [ItemA] <---'likes'--- [User2] ---'likes'---> [ItemB]
Here is the query that I came up with:
def runRankQuery(def userVertex) {
    def m = [:]
    def c = 0
    while (c < 1000) {
        userVertex
            .out('likes') // get all liked items of current or similar user
            .shuffle[0] // select randomly one liked item
            .groupCount(m) // update counts for selected item
            .in('likes') // get all users who also liked item
            .shuffle[0] // select randomly one user that liked item
            .loop(5){Math.random() < 0.5} // follow liked edge of new user (feed new user in loop)
                                          // OR abort query (restart from original user, outer loop)
            .iterate()
        c++
    }
    m = m.sort {a, b -> b.value <=> a.value}
    println "intermediate result $m"
    m.keySet().removeAll(userVertex.out('likes').toList())
    // EDIT (makes no sense - remove): m.each{k,v -> m[k] = v / m.values().sum()}
    // EDIT (makes no sense - remove): m.sort {-it.value }
    return m.keySet() as List;
}
However, this code does not find new items ([ItemB] in the example above); it only finds the items already liked by the given user (e.g. [ItemA]).
What do I need to change so that the loop step feeds the new user (e.g. [User2]) back into the out('likes') step and the walk continues?
Once this code is working, can it be seen as an implementation of 'Personalized PageRank'?
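To make the intended walk concrete, here is a minimal sketch in plain Groovy (no Gremlin) of the behaviour I am after. The maps likes and likedBy and the function rankItems are names made up for this illustration; the 0.5 continue probability and the cap of 5 hops are simply taken from my query above. It is not meant as the Gremlin solution, only as a description of the expected result.

```groovy
// Plain-Groovy sketch (no Gremlin): random walk with restart on the example graph.
// likes, likedBy and rankItems are hypothetical names used only for illustration.
def likes = [User1: ['ItemA'], User2: ['ItemA', 'ItemB']]   // user -> liked items

// Invert the map: item -> users who like it
def likedBy = [:].withDefault { [] }
likes.each { user, items -> items.each { item -> likedBy[item] << user } }

def rankItems(String startUser, Map likes, Map likedBy, Random rnd,
              int walks = 1000, double continueProb = 0.5, int maxHops = 5) {
    def counts = [:].withDefault { 0 }
    walks.times {
        def user = startUser                                  // every walk restarts here
        for (int hop = 0; hop < maxHops; hop++) {
            def items = likes[user]
            def item = items[rnd.nextInt(items.size())]       // pick one liked item at random
            counts[item] += 1                                 // count the visit
            def users = likedBy[item]
            user = users[rnd.nextInt(users.size())]           // pick one user who likes it
            if (rnd.nextDouble() >= continueProb) break       // abort this walk
        }
    }
    // Drop items the start user already likes, then sort by visit count (descending)
    counts.findAll { !(it.key in likes[startUser]) }
          .sort { a, b -> b.value <=> a.value }
          .keySet() as List
}

println rankItems('User1', likes, likedBy, new Random(42))
```

On the example graph this should print [ItemB], because ItemA is filtered out as already liked by User1.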
Here is the code to run the example:
g = new TinkerGraph()
user1 = g.addVertex()
user1.name = 'User1'
user2 = g.addVertex()
user2.name = 'User2'
itemA = g.addVertex()
itemA.name = 'ItemA'
itemB = g.addVertex()
itemB.name = 'ItemB'
g.addEdge(user1, itemA, 'likes')
g.addEdge(user2, itemA, 'likes')
g.addEdge(user2, itemB, 'likes')
println runRankQuery(user1)
And the output:
intermediate result [v[2]:1000]
[]
==>null
gremlin> g.v(2).name
==>ItemA
gremlin>