The ratings are stored in the numpy matrix ratings
, where rows correspond to users (index u
), while columns correspond to items (index i
). Since you want to calculate sim(u, u')
, i.e., the similarity between users, let's assume below that kind = 'user'
.
Now, let's have a look first on r_{ui}r_{u'i}
without the square-root scaling factors. This expression is summed over i
which can be interpreted as matrix multiplication of r
with the transpose of r
, i.e.:
\sum_i r_{ui}r_{u'i} = \sum_i r_{ui}(r^T)_{iu'} =: s_{uu'}
As done already above, let's denote the resulting matrix as s
(variable sim
in the code). This matrix is by definition symmetric and its rows/columns are labeled with the "user" indices u/u'
.
Now, the scaling "factor" f_{u} := \sqrt\sum_i r^2_{ui}
is in fact a vector indexed with u
(each item of which is the Euclidean norm of the corresponding row of the matrix r
). However, having constructed s_{uu'}
, we can see that f_{u}
is nothing else than \sqrt s_{uu}
.
Finally, the similarity factor of interest is s_{uu'}/f{u}/f{u'}
. The posted code calculates this for all indices u/u'
and returns the result as a matrix. To this end, it:
- calculates
sim
(the matrix s
above) as ratings.dot(ratings.T)
- gets the square root of its diagonal (the vector
f
above) as np.sqrt(np.diagonal(sim))
- to do the row/column scaling of
s
in an efficient way, this is then expressed as a two-dimensional array norms = np.array([np.sqrt(np.diagonal(sim))])
(note the missing []
in your post)
- finally, the matrix
s_{uu'}/f{u}/f{u'}
is calculated as sim / norms / norms.T
. Here, since norms
has shape (1, num_of_users)
, the first division does the column scaling, while the division with norms.T
scales the rows.