
I am building a graph for a recommendation system. It has vertices for users, categories and products, and edges that represent the connections between them. Each product is connected to categories by edges that carry a rating property, and each user can also have a rating for each category. So, it is something like this:

-- User preferences.
SELECT * FROM cypher('RecommenderSystem', $$
    MATCH (a:Person {name: 'Abigail'}), (A:Category), (C:Category), (H:Category)
    WHERE A.name = 'A' AND C.name = 'C' AND H.name = 'H' 
    CREATE (a)-[:RATING {rating: 3}]->(C),
           (a)-[:RATING {rating: 1}]->(A),
           (a)-[:RATING {rating: 0}]->(H)
$$) AS (a agtype);

-- Products rating.
SELECT * FROM cypher('RecommenderSystem', $$
    MATCH (product:Product {title: 'Product_Name'}), (A:Category), (C:Category), (H:Category)
    WHERE A.name = 'A' AND C.name = 'C' AND H.name = 'H' 
    CREATE (product)-[:RATING {rating: 0}]->(C),
           (product)-[:RATING {rating: 4}]->(A),
           (product)-[:RATING {rating: 0}]->(H)
$$) AS (a agtype);

[Image: sample graph]

My recommendation system is based on content filtering, which uses the information we know about people and products as connective tissue for recommendations. This requires a calculation like: [(user_rating_C x product_rating_C) + (user_rating_A x product_rating_A) + (user_rating_H x product_rating_H)] / (num_categories x max_rating). For example, the likelihood of Abigail liking the product from the Cypher query above would be:

[(3 x 0) + (1 x 4) + (0 x 0)] / (3 x 4) = 0.333. On a scale from 0 to 4, that means she is likely to hate the product. The closer the score is to 4, the more likely the user is to buy or consume the product.
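
As a sanity check on the arithmetic, here is a minimal sketch in plain Python (not AGE). The ratings are copied from the two CREATE queries above; the function name `content_score` and the dictionary layout are only illustrative assumptions.

```python
# Ratings copied from the two CREATE queries in the question (scale 0 to 4).
user_ratings = {"C": 3, "A": 1, "H": 0}      # Abigail -> Category
product_ratings = {"C": 0, "A": 4, "H": 0}   # Product_Name -> Category
MAX_RATING = 4


def content_score(user, product, max_rating=MAX_RATING):
    """Sum of per-category rating products over (num_categories * max_rating)."""
    # Only categories rated by both sides contribute to the score.
    categories = user.keys() & product.keys()
    if not categories:
        return 0.0
    total = sum(user[c] * product[c] for c in categories)
    return total / (len(categories) * max_rating)


score = content_score(user_ratings, product_ratings)
print(round(score, 3))  # 0.333
```

Here num_categories is taken to be the number of categories rated by both the user and the product, which is 3 for this data.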

But then, how could I retrieve every edge rating that is connected to a person and a product and do this type of calculation with it?

Matheus Farias
  • Did you mean "product" instead of "movie"? Also, provide a sample graph and show what your calculated result would be. – cybersam Apr 19 '23 at 22:58
  • @cybersam Yes, it's product, sorry. Changed to product so that is more general in the question. I'll add the sample graph and how the calculations must be done. – Matheus Farias Apr 19 '23 at 23:13
  • How do you determine which product rating to use with each category, why is `num_categories` 4 when there are only 3 categories, and how is `max_rating` determined? – cybersam Apr 20 '23 at 00:30
  • Each rating is an edge between the product and the category and also from the user to the category. I was thinking it would be possible to determine which one to use with which other by looking at the edge (where is the node that the edge comes from and where it is pointing) so then if the edge points to a specific category, we can determine the rating. And you are correct, it's only 3 x 4, and not 4 x 4. – Matheus Farias Apr 20 '23 at 00:40
  • I was watching this video about recommendation systems on YouTube and tried to replicate it in AGE: https://youtu.be/n3RKsY2H-NE – Matheus Farias Apr 20 '23 at 00:44
  • There are edges from 5 products going to every category. For any given category, how do you determine which product edge to use? – cybersam Apr 20 '23 at 00:46
  • In cypher syntax it would be something like `MATCH (a)-[e:RATING]->(b) WHERE b.name = 'C' AND a.name = 'product_name' RETURN e.rating` if the rating that we are looking for is the one for C and connects to a specific product. – Matheus Farias Apr 20 '23 at 00:56

3 Answers

2

The following query should work for this situation:

SELECT e1/(ct*4) AS factor FROM cypher('RecommenderSystem', $$
    MATCH (u:Person)-[e1:RATING]->(v:Category)<-[e2:RATING]-(w:Product), (c:Category)
    WITH e1, e2, COUNT(DISTINCT c) AS ct
    RETURN SUM(e1.rating * e2.rating)::float, ct
$$) AS (e1 float, ct agtype);

This outputs:

      factor       
-------------------
0.333333333333333
(1 row)

Explanation

The MATCH clause finds every category that both the person and the product have rated. Summing the products of those paired ratings gives the numerator:

[(user_rating_C x product_rating_C) + (user_rating_A x product_rating_A) + (user_rating_H x product_rating_H)]

This sum is then divided by

(num_categories x max_rating)

num_categories comes from COUNT(DISTINCT c), and I assume that you already know max_rating (4 in the query above).

Hope it helps

Edit

I assumed that by num_categories you meant the total number of categories in the system, not only the ones that the person and the product have in common. If num_categories is instead the count of categories shared by the product and the person, modify your WITH clause as follows:

WITH e1, e2, COUNT(*) AS ct

The rest of the query stays the same.
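
To make the two readings of num_categories concrete, here is a small sketch in plain Python (not AGE). The user and product ratings come from the question; the fourth category "Z" is a hypothetical addition, rated by no one, so that the two denominators actually differ.

```python
# "Z" is a hypothetical extra category that neither side has rated.
all_categories = ["A", "C", "H", "Z"]
user_ratings = {"C": 3, "A": 1, "H": 0}      # Abigail
product_ratings = {"C": 0, "A": 4, "H": 0}   # Product_Name
MAX_RATING = 4

# Categories rated by both the person and the product.
shared = user_ratings.keys() & product_ratings.keys()
numerator = sum(user_ratings[c] * product_ratings[c] for c in shared)

# Reading 1: every category in the system counts toward the denominator.
score_all = numerator / (len(all_categories) * MAX_RATING)
# Reading 2: only the shared categories count.
score_shared = numerator / (len(shared) * MAX_RATING)

print(score_all)               # 0.25
print(round(score_shared, 3))  # 0.333
```

With the question's original three categories both readings give the same value, so the choice only matters once some categories are left unrated.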

Zainab Saad
  • Thanks for the help Zainab Saad, the query does work but only thing is that I need to pass the name of the product so that it calculates correctly, like: `MATCH (u:Person)-[e1:RATING]->(v:Category)<-[e2:RATING]-(w:Product {name: 'Product_One'})`. So then, if I wanted to get every product name and pass it to this query as the name property, do you know how this should be done? – Matheus Farias Apr 20 '23 at 17:09
  • I could not get your point, can you elaborate more please or give a detailed example? – Zainab Saad Apr 21 '23 at 06:48
  • What I meant to say is: with the query that you provided, it works only for one `Product` vertex and one `Person` vertex. But what I want is to retrieve all of the guessed ratings for each product to each person. So, two columns, one for person, another one for the product, so it returns something like: `Abigail (person) : 1.3 (product_1 estimated rating)` and then in the next line: `Abigail (person) : 0.5 (product_2 estimated rating)` – Matheus Farias Apr 21 '23 at 12:17
2

If I understand correctly, you want to calculate the rating of each product for a user based on the given formula: [(user_rating_C x product_rating_C) + (user_rating_A x product_rating_A) + (user_rating_H x product_rating_H)] / (num_categories x max_rating). According to your model, max_rating is set to 4 (range from 0 to 4). To perform this calculation, you can use the following query:

SELECT * FROM cypher('RecommenderSystem', $$
    MATCH (a: Person {name: 'Abigail'})-[r1: RATING]->(c: Category)<-[r2: RATING]-(p:Product)
    WITH a.name AS person, p.title AS product, 
         SUM(r1.rating * r2.rating)/(count(c) * 4)::float AS rate
    RETURN person AS a, product AS p, rate AS r
$$) AS (a agtype, p agtype, r float);

I added another product (rating 0 with category C, rating 1 with category A and rating 3 with category H), and this query returned two rows: person Abigail, product Product_Name, rating 0.33; and person Abigail, product Other_Product, rating 0.083.
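
The same grouping can be checked outside the database. Below is a minimal sketch in plain Python (not AGE) that scores every person/product pair at once; the tuple layout and the function name `scores` are illustrative assumptions, and the ratings are the ones given in the question and in this answer.

```python
from collections import defaultdict

MAX_RATING = 4

# (source, category, rating) triples standing in for RATING edges.
person_edges = [("Abigail", "C", 3), ("Abigail", "A", 1), ("Abigail", "H", 0)]
product_edges = [
    ("Product_Name", "C", 0), ("Product_Name", "A", 4), ("Product_Name", "H", 0),
    ("Other_Product", "C", 0), ("Other_Product", "A", 1), ("Other_Product", "H", 3),
]


def scores(person_edges, product_edges, max_rating=MAX_RATING):
    # Index product ratings by category so each person edge can find its partners.
    by_category = defaultdict(list)
    for product, category, rating in product_edges:
        by_category[category].append((product, rating))

    totals = defaultdict(int)   # (person, product) -> sum of rating products
    counts = defaultdict(int)   # (person, product) -> number of shared categories
    for person, category, rating in person_edges:
        for product, product_rating in by_category[category]:
            totals[(person, product)] += rating * product_rating
            counts[(person, product)] += 1

    return {pair: totals[pair] / (counts[pair] * max_rating) for pair in totals}


result = scores(person_edges, product_edges)
for (person, product), rate in sorted(result.items()):
    print(person, product, round(rate, 3))
# Abigail Other_Product 0.083
# Abigail Product_Name 0.333
```

Grouping by the (person, product) pair is what lets one pass produce a row per combination, which mirrors the role of the WITH clause in the query.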

Wendel
1

Something like this may work for you:

WITH
  'Abigail' AS perName,
  [{c: 'A', p: 'prod_1'}, {c: 'C', p: 'prod_9'}, {c: 'H', p: 'prod_4'}] AS x
MATCH (per:Person)-[perRating:RATING]->(cat:Category)<-[prodRating:RATING]-(prod:Product)
WHERE per.name = perName AND ANY(i IN x WHERE cat.name = i.c AND prod.name = i.p)
WITH *, SUM(perRating.rating*prodRating.rating) AS total, MAX(prodRating.rating) AS maxProdRating
RETURN per, total/(SIZE(x) * maxProdRating) AS affinity

perName is the person's name, x is a list of the desired category/product name pairs, and affinity will be the calculated result.

NOTE: Even if not all desired pairs in x are found in the data, this query uses the size of x in the denominator. Adjust the query if this is not wanted.
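
The effect of that denominator choice can be sketched in plain Python (not Cypher). This is only my reading of the arithmetic the query aims for. Abigail's ratings come from the question, while the product ratings for prod_1 and prod_9 are hypothetical, and prod_4 is deliberately missing from the data.

```python
person_ratings = {"A": 1, "C": 3, "H": 0}  # Abigail, from the question
# Hypothetical product ratings keyed by (category, product); prod_4 is absent on purpose.
product_ratings = {("A", "prod_1"): 4, ("C", "prod_9"): 2}
x = [("A", "prod_1"), ("C", "prod_9"), ("H", "prod_4")]  # desired pairs

# Keep only the desired pairs that actually exist in the data.
matched = [pair for pair in x
           if pair in product_ratings and pair[0] in person_ratings]
total = sum(person_ratings[c] * product_ratings[(c, p)] for c, p in matched)
max_prod_rating = max(product_ratings[pair] for pair in matched)

# Denominator based on the size of x, as in the query.
affinity_size_of_x = total / (len(x) * max_prod_rating)
# Alternative: count only the pairs that were found.
affinity_matched = total / (len(matched) * max_prod_rating)

print(round(affinity_size_of_x, 3))  # 0.833
print(affinity_matched)              # 1.25
```

Dividing by the size of x penalizes missing pairs, whereas dividing by the number of matched pairs ignores them.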

[UPDATE]

Unfortunately, the ANY predicate function is not part of openCypher, so it is not supported by Apache AGE.

Even more unfortunately, even though list comprehension is a part of openCypher, AGE does not yet support that either.

But, on an openCypher system that does support list comprehension, we could replace this:

ANY(i IN x WHERE cat.name = i.c AND prod.name = i.p)

with something like this (we don't care about the generated list's contents, so we just use arbitrary 1 elements):

SIZE([i IN x WHERE cat.name = i.c AND prod.name = i.p | 1]) > 0
cybersam
  • Thanks for the answer @cybersam, but it seems that AGE doesn't allow to add the WHERE clause inside the ANY function, it throws an error: `ERROR: syntax error at or near "WHERE" LINE 6: WHERE per.name = perName AND ANY(i IN x WHERE cat.name = i.c...` – Matheus Farias Apr 20 '23 at 17:26