I am using the Gremlin query language on an AWS Neptune database to generate recommendations for new food items that a user might try. The problem I am facing is that the recommendations need to be in the same cuisine that the user already likes. We are given three different types of nodes: "User", "the cuisine he likes", and "the category of the cuisine" that it belongs to.

[Image]

In the picture above, the candidate recommendations for "User 2" would be "Node 1" and "Node 2". However, "Node 1" belongs to a different category, so we cannot recommend it to "User 2". We can only recommend "Node 2", since it is the only node that belongs to the same category as the nodes the user likes. How do I write a Gremlin query to achieve this?

Note: Each user can be connected to multiple nodes, and these nodes can belong to multiple categories.

1 Answer

Here's a sample dataset that we can use:

g.addV('user').property('name','ben').as('b').
  addV('user').property('name','sally').as('s').
  addV('food').property('foodname','chicken marsala').as('fvm').
  addV('food').property('foodname','shrimp diavolo').as('fsd').
  addV('food').property('foodname','kung pao chicken').as('fkpc').
  addV('food').property('foodname','mongolian beef').as('fmb').
  addV('cuisine').property('type','italian').as('ci').
  addV('cuisine').property('type','chinese').as('cc').
  addE('hasCuisine').from('fvm').to('ci').
  addE('hasCuisine').from('fsd').to('ci').
  addE('hasCuisine').from('fkpc').to('cc').
  addE('hasCuisine').from('fmb').to('cc').
  addE('eats').from('b').to('fvm').
  addE('eats').from('b').to('fsd').
  addE('eats').from('b').to('fkpc').
  addE('eats').from('b').to('fmb').
  addE('eats').from('s').to('fmb')

Let's start with the user Sally...

g.V().has('name','sally').

Then we want to find all food item nodes that Sally likes.

(Note: It is best to add edge labels to your edges here to help with navigation.)

Let's call the edge from a user to a food item "eats", and assume that its direction (every edge must have one) runs from the user to the food item. We then traverse to all the foods that Sally likes and save them in a temporary list called 'liked', which we'll use later in the query to filter out the foods that Sally already likes.

out('eats').aggregate('liked').

From this point in the graph, we need to diverge and fetch two downstream pieces of data. First, we want to go fetch the cuisines related to food items that Sally likes. We want to "hold our place" in the graph while we go fetch these items, so we use the sideEffect() step which allows us to go do something but come back to where we currently are in the graph to continue our traversal.

    sideEffect(
        out('hasCuisine').
        dedup().
        aggregate('cuisineschosen')).

Inside the sideEffect(), we traverse from food items to cuisines, deduplicate the related cuisines, and save them in a temporary list called 'cuisineschosen'.
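To sanity-check this step on its own, here is a minimal sketch that lists the cuisines Sally's foods belong to. It assumes the sample dataset above, where Sally eats only mongolian beef, so it should return just 'chinese'.

```groovy
// Cuisines of the foods Sally eats, de-duplicated.
// Assumes the sample dataset defined earlier in this answer.
g.V().has('name','sally').
  out('eats').
  out('hasCuisine').
  dedup().
  values('type')
```

Those are the cuisine vertices that end up in 'cuisineschosen' when the same steps run inside the sideEffect().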

Once we fetch the cuisines, we come back to where we were, at the food items. We now want to find the users related to Sally through common food items. We also want to make sure we don't traverse back to Sally, so we use simplePath() here. simplePath() drops any traverser whose path revisits an element it has already passed through, which filters out the path that leads back to Sally.

in('eats').
    simplePath().

From here we want to find all food items that our related users like and only return the ones with a cuisine that Sally already likes. We also remove the foods that Sally already likes.

out('eats').
    where(without('liked')).
    where(
        out('hasCuisine').
        where(
            within('cuisineschosen'))).
  values('foodname')

NOTE: You may also want to add a dedup() here after out('eats') to only return a distinct list of food items.

Putting it all together...

g.V().has('name','sally').
  out('eats').aggregate('liked').
    sideEffect(
        out('hasCuisine').
        dedup().
        aggregate('cuisineschosen')).
    in('eats').
    simplePath().
  out('eats').
    where(without('liked')).
    where(
        out('hasCuisine').
        where(
            within('cuisineschosen'))).
  values('foodname')

Results:

['kung pao chicken']
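Following the earlier note about dedup(), here is a sketch of the same traversal with dedup() added after the final out('eats'), so that a food reached through several related users is returned only once. With the sample dataset it should still return only 'kung pao chicken', since Ben is the only related user. The commented alternative for starting from a vertex ID is an assumption about how you would look up the user; 'user-2-id' is a hypothetical placeholder.

```groovy
// Same traversal as above, with dedup() on the candidate foods.
// If you are given the user's vertex ID instead of a name, the first line
// could be replaced with: g.V('user-2-id').   (hypothetical ID)
g.V().has('name','sally').
  out('eats').aggregate('liked').
  sideEffect(
    out('hasCuisine').
    dedup().
    aggregate('cuisineschosen')).
  in('eats').
  simplePath().
  out('eats').
  dedup().
  where(without('liked')).
  where(
    out('hasCuisine').
    where(
      within('cuisineschosen'))).
  values('foodname')
```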

At scale, you may need to use the sample() or coin() steps in Gremlin when finding related users as this can fan out really fast. Query performance is going to be based on how many objects each query needs to traverse.
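As a rough sketch of that idea, the variant below caps the number of related users with sample() before expanding to their foods. The cap of 100 is an arbitrary illustrative value, and the dedup() on users before sampling is my own assumption, added so that each related user is considered once. With the small sample dataset the cap has no effect.

```groovy
// Sampling variant: keep at most 100 distinct related users (100 is an
// arbitrary, illustrative cap) before fanning out to the foods they eat.
// coin(0.1) could be used in place of sample(100) to keep each related
// user with a probability of 10% instead of a fixed count.
g.V().has('name','sally').
  out('eats').aggregate('liked').
  sideEffect(
    out('hasCuisine').
    dedup().
    aggregate('cuisineschosen')).
  in('eats').
  simplePath().
  dedup().
  sample(100).
  out('eats').
  dedup().
  where(without('liked')).
  where(
    out('hasCuisine').
    where(
      within('cuisineschosen'))).
  values('foodname')
```

Because the sample is random, repeated runs on a large graph may return different recommendations. That is the trade-off for bounding the fan-out.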

Taylor Riggan
  • Thanks for your response! However, the problem I am facing is that I don't know which cuisines the user will have. Here in the answer you have mentioned 'vegan' and 'pescatarian'. I am unable to look up what the cuisines are, so I will need a query that can capture the cuisines and check them on its own. It is a large DB with more than 25k users. Also, we need to run these operations from the perspective of User 2, since he will be getting the recommendations based on User 1. – sharpshine99 Jul 29 '22 at 01:00
  • Just to make sure I'm following.... Given a User ("User 2"), you want to find other users who also like the same foods, determine what those foods are, and then determine the foods-in-common that are in the same cuisines that "User 2" also likes? – Taylor Riggan Jul 29 '22 at 01:07
  • Yes you are absolutely right about the concept. The way I would be writing the query is I would be given the User2's ID, and then I will have to return the recommendations. It's just that I won't be able to see the preferences and cannot filter out using the exact cuisine names. I need to capture their category nodes without knowing them. Also thanks for helping me out. – sharpshine99 Jul 29 '22 at 01:14
  • I modified the answer above to take that into account now. A bit more complex, but hopefully I've explained that well enough. Just realize that at-scale, you may need to do some random sampling if you want to get recommendations back quickly. If you want to look at the full graph of 25k users, that will still work but may take a few seconds or minutes to return. – Taylor Riggan Jul 29 '22 at 12:57
  • Thank you for your detailed explanation of the solution. This is exactly what I was looking for. Appreciate it. – sharpshine99 Aug 01 '22 at 21:48