I have some Python code that computes the similarity between users with Pearson's correlation, and I want to analyze the calculation step by step because I'm a beginner with Python. When I calculate it manually and compare against the program's output, the results are always different. I'm wondering whether I'm making a mistake in my manual calculation. The code is:
# A dictionary of movie critics and their ratings of a small set of movies
critics={'User 1': {'Spiderman': 1.0, 'Batman Begins': 2.0, 'Superman': 4.0},
         'User 2': {'Spiderman': 2.0, 'Batman Begins': 3.0, 'Superman': 3.0}
}

from math import sqrt

# Returns the Pearson correlation coefficient for p1 and p2
def sim_pearson(prefs,p1,p2):
    # Get the list of mutually rated items
    si={}
    for item in prefs[p1]:
        if item in prefs[p2]: si[item]=1

    # If there are no ratings in common, return 0
    if len(si)==0: return 0

    # Number of mutually rated items
    n=len(si)

    # Sums of all the preferences
    sum1=sum([prefs[p1][it] for it in si])
    sum2=sum([prefs[p2][it] for it in si])

    # Sums of the squares
    sum1Sq=sum([pow(prefs[p1][it],2) for it in si])
    sum2Sq=sum([pow(prefs[p2][it],2) for it in si])

    # Sum of the products
    pSum=sum([prefs[p1][it]*prefs[p2][it] for it in si])

    # Calculate r (Pearson score)
    num=pSum-(sum1*sum2/n)
    den=sqrt((sum1Sq-pow(sum1,2)/n)*(sum2Sq-pow(sum2,2)/n))
    if den==0: return 0

    r=num/den
    return r

def main():
    z = sim_pearson(critics, 'User 1','User 2')
    print(z)

if __name__ == "__main__":
    main()
I want to calculate the similarity between User 1 and User 2, but I'm confused by this part:
([prefs[p1][it] for it in si])
What is the meaning of [it]?
The similarity this program returns is: 0.755928946018
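To make a hand calculation easier to check against that number, here is a minimal sketch that prints every intermediate value for the same two users. The names `common`, `r1`, `r2`, `sum1_sq`, `sum2_sq` and `p_sum` are my own and do not appear in the original code; the formula is meant to mirror `sim_pearson`, not replace it.

```python
from math import sqrt

# Same ratings as in the question.
critics = {
    'User 1': {'Spiderman': 1.0, 'Batman Begins': 2.0, 'Superman': 4.0},
    'User 2': {'Spiderman': 2.0, 'Batman Begins': 3.0, 'Superman': 3.0},
}

# Movies rated by both users (hypothetical name, plays the role of `si`).
common = [m for m in critics['User 1'] if m in critics['User 2']]
n = len(common)

r1 = [critics['User 1'][m] for m in common]  # User 1's ratings
r2 = [critics['User 2'][m] for m in common]  # User 2's ratings, same movie order

sum1 = sum(r1)                               # 1 + 2 + 4
sum2 = sum(r2)                               # 2 + 3 + 3
sum1_sq = sum(x ** 2 for x in r1)            # 1 + 4 + 16
sum2_sq = sum(y ** 2 for y in r2)            # 4 + 9 + 9
p_sum = sum(x * y for x, y in zip(r1, r2))   # 1*2 + 2*3 + 4*3

num = p_sum - (sum1 * sum2 / n)
den = sqrt((sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n))
r = num / den

# Print each step so it can be compared with the manual calculation.
steps = [('n', n), ('sum1', sum1), ('sum2', sum2), ('sum1_sq', sum1_sq),
         ('sum2_sq', sum2_sq), ('p_sum', p_sum), ('num', num), ('den', den),
         ('r', r)]
for label, value in steps:
    print('%s = %s' % (label, value))
```

If any printed value differs from the number obtained by hand, that step is where the manual calculation diverges.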
Is it true that this code, ([prefs[p1][it] for it in si]), multiplies the ratings of User 1, like 1*2*4? Or does it have to multiply them with the ratings of User 2, like (1*2)+(1*3)+(4*3)?
I'm confused by the [p1][it]. I hope you can help me, thanks in advance.
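For reference, this is a minimal sketch of how the expression can be unrolled into an ordinary loop, assuming `si` holds the three shared movie titles as in the code above. The names `short_form`, `long_form`, `user_ratings` and `rating` are hypothetical and only used for illustration. As far as I can tell, `it` is just the loop variable (one movie title per iteration) and `prefs[p1][it]` is a lookup in a nested dictionary, so nothing is multiplied at this point.

```python
critics = {
    'User 1': {'Spiderman': 1.0, 'Batman Begins': 2.0, 'Superman': 4.0},
    'User 2': {'Spiderman': 2.0, 'Batman Begins': 3.0, 'Superman': 3.0},
}

prefs, p1 = critics, 'User 1'

# The shared movies, as sim_pearson would build them for these two users.
si = {'Spiderman': 1, 'Batman Begins': 1, 'Superman': 1}

# Comprehension form, as written in the original code.
short_form = [prefs[p1][it] for it in si]

# Unrolled form: `it` takes each key of `si` (a movie title) in turn.
long_form = []
for it in si:
    user_ratings = prefs[p1]    # inner dict with User 1's ratings
    rating = user_ratings[it]   # User 1's rating for the movie `it`
    long_form.append(rating)

print(short_form == long_form)  # True
print(sum(long_form))           # 7.0
```

The result is a list of User 1's ratings; wrapping it in `sum(...)` adds them up, and only the `pSum` line multiplies a rating of User 1 by the rating of User 2 for the same movie.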