Hi everyone at Stack Overflow,

I want to understand a query that uses the Pearson correlation.

What do nom and denom stand for?

What do r1: r1 and r2: r2 mean?

I also don't understand what r.r1.rating and r.r2.rating refer to.

This query should recommend movies that have been rated by other users.

MATCH (u1:User {id: 3})-[r:RATED]->(m:Movie)
WITH u1, avg(r.rating) AS u1_mean
MATCH (u1)-[r1:RATED]->(m:Movie)<-[r2:RATED]-(u2)
WITH u1, u1_mean, u2, COLLECT({r1: r1, r2: r2}) AS ratings WHERE size(ratings) > 10
MATCH (u2)-[r:RATED]->(m:Movie)
WITH u1, u1_mean, u2, avg(r.rating) AS u2_mean, ratings
UNWIND ratings AS r
WITH sum( (r.r1.rating-u1_mean) * (r.r2.rating-u2_mean) ) AS nom,
     sqrt( sum( (r.r1.rating - u1_mean)^2) * sum( (r.r2.rating - u2_mean) ^2)) AS denom,
     u1, u2 WHERE denom <> 0
WITH u1, u2, nom/denom AS pearson
ORDER BY pearson DESC LIMIT 10
MATCH (u2)-[r:RATED]->(m:Movie) WHERE NOT EXISTS( (u1)-[:RATED]->(m) )
RETURN m.name, SUM( pearson * r.rating) AS score
ORDER BY score DESC LIMIT 25

The output is as follows:

│"m.name"                    │"score"           │
│"Sleepless in Seattle"      │25.859451877376813│
│"The Tunnel"                │22.652532472101605│
│"Beetlejuice"               │22.21835919736008 │
│"Shriek If You Know What .."│21.935357890253528│
│"Dawn of the Dead"          │21.421377433824798│
│"The Prisoner of Zenda"     │21.225502683325033│
│"The Talented Mr. Ripley"   │20.83938743140176 │

Any suggestions will be helpful.

Anna

1 Answer

The formula for the Pearson correlation coefficient is described here: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#For_a_sample
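
For reference, here is the sample formula as I understand it from that page. The labels are my own, not the query's: x_i is u1's rating and y_i is u2's rating of the i-th movie that both users rated, and the barred symbols are the two users' mean ratings.

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

Note that the query computes u1_mean and u2_mean over all of each user's ratings, not only over the movies both users rated, so it looks like a close variant of the textbook formula rather than an exact copy.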

nom is simply the numerator of that formula. It is computed by this part of the query: "sum( (r.r1.rating-u1_mean) * (r.r2.rating-u2_mean) ) AS nom"

Likewise, denom is the denominator: the square root of the product of the two sums of squared deviations.
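
To make the arithmetic concrete, here is a minimal sketch in plain Python (not Cypher) of what I believe those two expressions compute. The function name `pearson`, the nested dictionaries, and the sample ratings are all made up for illustration. Each dictionary stands in for one element of the collected `ratings` list, with a `r1` entry for u1's rating and a `r2` entry for u2's rating of the same movie.

```python
from math import sqrt


def pearson(pairs, u1_mean, u2_mean):
    """Mirror the query's nom/denom computation for one pair of users.

    pairs: hypothetical stand-in for the collected ratings, a list of
           dicts shaped like {"r1": {"rating": 5.0}, "r2": {"rating": 4.0}}.
    Returns None when denom is 0, like the query's WHERE denom <> 0 filter.
    """
    # Numerator: sum of products of the two users' deviations from their means.
    nom = sum(
        (p["r1"]["rating"] - u1_mean) * (p["r2"]["rating"] - u2_mean)
        for p in pairs
    )
    # Denominator: square root of the product of the summed squared deviations.
    denom = sqrt(
        sum((p["r1"]["rating"] - u1_mean) ** 2 for p in pairs)
        * sum((p["r2"]["rating"] - u2_mean) ** 2 for p in pairs)
    )
    if denom == 0:
        return None
    return nom / denom


# Made-up data: three movies rated by both users.
pairs = [
    {"r1": {"rating": 5.0}, "r2": {"rating": 5.0}},
    {"r1": {"rating": 3.0}, "r2": {"rating": 4.0}},
    {"r1": {"rating": 1.0}, "r2": {"rating": 3.0}},
]
# The means are passed in separately, as in the query (assumed values here).
result = pearson(pairs, u1_mean=3.0, u2_mean=4.0)
print(result)
```

With this sample data the two users' ratings rise and fall together, so the result should come out at the top of the scale. A result of None corresponds to a pair of users that the query would drop.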

I'm less clear on the other two questions, but hopefully this helps!

Philip H