
In AWS Neptune, I have a graph that has many clusters like the one defined below:

g.addV('person').property('name', 'John').as('p1')
.addV('person').property('name', 'Weihua').as('p2')
.addV('Fellowship of the Ring').as('movie1')
.addV('A New Hope').as('movie2')
.addE('likes_movie').from('p1').to('movie1')
.addE('likes_movie').from('p1').to('movie2')
.addE('likes_movie').from('p2').to('movie1')
.addE('likes_movie').from('p2').to('movie2')

[Image: sample cluster]

Each cluster contains a number of movie nodes and person nodes, with edges indicating which movies each person likes. How can I write a query that finds every pair of person nodes that share two liked movies and creates a "shares_two_liked_movies" relationship between them? Such a query would result in my cluster looking like this:

[Image: cluster after query]

So far, I've realized that I can detect whether a person shares two liked movies with any other person by checking whether the query

g.V().has('name', 'John').as('src').out('likes_movie').in().where(neq('src'))

returns the same vertex id twice. However, I'm not sure how to translate this into the query I want.
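To make the "same vertex id twice" observation concrete, here is a small plain-Python sketch (not Gremlin and not Neptune client code) that models the sample cluster as a hypothetical `LIKES` dictionary and walks it the way `out('likes_movie').in()` would. A person reached twice from John shares two liked movies with him. The helper names `co_likers` and `shares_at_least` are made up for illustration.

```python
from collections import Counter

# Hypothetical in-memory stand-in for the sample cluster (illustration only).
LIKES = {
    "John": {"Fellowship of the Ring", "A New Hope"},
    "Weihua": {"Fellowship of the Ring", "A New Hope"},
}


def co_likers(src, likes):
    """Mimic out('likes_movie').in().where(neq('src')) for one start person.

    Yields one entry per (movie, other person) path, so a person who shares
    k movies with src is yielded k times.
    """
    for movie in sorted(likes[src]):
        for person in sorted(likes):
            if person != src and movie in likes[person]:
                yield person


def shares_at_least(src, likes, n=2):
    """Return the people reached by at least n paths, i.e. sharing >= n movies."""
    counts = Counter(co_likers(src, likes))
    return {person for person, k in counts.items() if k >= n}


print(list(co_likers("John", LIKES)))  # ['Weihua', 'Weihua']
print(shares_at_least("John", LIKES))  # {'Weihua'}
```

Counting repeats like this is essentially what a `groupCount()` step does inside the traversal.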

jjjjohnson
  • Please clarify your specific problem or provide additional details to highlight exactly what you need. As it's currently written, it's hard to tell exactly what you're asking. – Community Jul 01 '23 at 21:10
  • Gremlify workspace: https://gremlify.com/bvdwu73ndne/1 – MorKadosh Jul 04 '23 at 07:26
  • Are you looking to find two people who both liked exactly two movies, or is it more general, perhaps two people who liked two or more of the same movies? – Kelvin Lawrence Jul 05 '23 at 22:15
  • Also, do you need to do this for all people in the graph or will you have one (or perhaps a few) people to start from? This will very much inform the approach taken and the time that approach might take to execute. – Kelvin Lawrence Jul 05 '23 at 22:52
  • @KelvinLawrence I'm looking to create the "shares_two_liked_movies" between any two people that share two or more movies in common that they both like. I would like to do it for all people in the graph. – jjjjohnson Jul 06 '23 at 19:48
  • I added a partial answer below then ran out of time for right now. It shows how to find the number of common movies for people. You could either use this information to send a second query to create the edges or extend this to do it all in one query. As time allows I will try to extend the answer but I wanted to give you something to get you started. – Kelvin Lawrence Jul 06 '23 at 22:31

1 Answer


I added one more person to your example graph so that there is a person with only one movie in common.

g.addV('person').property('name', 'John').as('p1')
 .addV('person').property('name', 'Weihua').as('p2')
 .addV('person').property('name', 'Kelvin').as('p3')
 .addV('Fellowship of the Ring').as('movie1')
 .addV('A New Hope').as('movie2')
 .addE('likes_movie').from('p1').to('movie1')
 .addE('likes_movie').from('p1').to('movie2')
 .addE('likes_movie').from('p2').to('movie1')
 .addE('likes_movie').from('p2').to('movie2')
 .addE('likes_movie').from('p3').to('movie2')

Using that graph, one way to find the number of movies people have in common is with groupCount, as shown below. Note that simplePath is another way to keep the starting vertex from appearing again in the results, because it filters out any path that revisits a vertex.

g.V().has('name','John').
  out('likes_movie').
  in('likes_movie').simplePath().
  groupCount().by('name')

When run, this gives us

{'Kelvin': 1, 'Weihua': 2}
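As an illustration of what simplePath contributes here, the plain-Python sketch below (an in-memory model, not Gremlin) enumerates the John → movie → person paths for the three-person graph and counts their end points. Without the simple-path filter, John reaches himself once per movie he likes; with it, only the other people remain. `LIKES`, `paths_from`, `is_simple` and `group_count` are hypothetical names.

```python
from collections import Counter

# Hypothetical in-memory copy of the three-person example graph.
LIKES = {
    "John": {"Fellowship of the Ring", "A New Hope"},
    "Weihua": {"Fellowship of the Ring", "A New Hope"},
    "Kelvin": {"A New Hope"},
}


def paths_from(src, likes):
    """Enumerate (src, movie, person) paths, as out().in() would traverse them."""
    for movie in sorted(likes[src]):
        for person in sorted(likes):
            if movie in likes[person]:
                yield (src, movie, person)


def is_simple(path):
    """Like simplePath(): keep only paths that never revisit a vertex."""
    return len(set(path)) == len(path)


def group_count(src, likes, simple=True):
    """Count the end vertex of each path, mirroring groupCount().by('name')."""
    ends = (p[-1] for p in paths_from(src, likes) if not simple or is_simple(p))
    return dict(Counter(ends))


print(group_count("John", LIKES, simple=False))  # {'John': 2, 'Kelvin': 1, 'Weihua': 2}
print(group_count("John", LIKES))                # {'Kelvin': 1, 'Weihua': 2}
```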

We can extend the query to create a nested group, where the key of the outer group is the person whose pairings we have analyzed.

g.V().has('name','John').
  group().
    by('name').
    by(
      out('likes_movie').
      in('likes_movie').simplePath().
      groupCount().by('name'))

This gives us

{'John': {'Kelvin': 1, 'Weihua': 2}}
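Since the goal is to do this for every person rather than just John, the same nested grouping can be applied to all people. In Gremlin that would presumably mean starting from `g.V().hasLabel('person')` instead of `g.V().has('name','John')`. The plain-Python sketch below (an in-memory model using a hypothetical `LIKES` dictionary and `shared_counts` helper) shows the shape of the resulting nested map for the three-person example.

```python
from collections import Counter

# Hypothetical in-memory copy of the three-person example graph.
LIKES = {
    "John": {"Fellowship of the Ring", "A New Hope"},
    "Weihua": {"Fellowship of the Ring", "A New Hope"},
    "Kelvin": {"A New Hope"},
}


def shared_counts(likes):
    """Build {person: {other: shared movie count}} for every person.

    Mirrors the nested group()/groupCount() query, but for all people.
    Sorting keeps the output order deterministic.
    """
    result = {}
    for src in sorted(likes):
        counts = Counter()
        for movie in sorted(likes[src]):
            for other in sorted(likes):
                # Skip src itself, as simplePath() does in the traversal.
                if other != src and movie in likes[other]:
                    counts[other] += 1
        result[src] = dict(counts)
    return result


print(shared_counts(LIKES))
# {'John': {'Kelvin': 1, 'Weihua': 2}, 'Kelvin': {'John': 1, 'Weihua': 1}, 'Weihua': {'John': 2, 'Kelvin': 1}}
```

Note that every pairing shows up twice (once under each person), which matters when turning the map into edges.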

The query can be further modified to return the vertices themselves (shown with their identifiers) rather than the names. This will make it easier to create edges.

g.V().has('name','John').
  group().
    by().
    by(
      out('likes_movie').
      in('likes_movie').simplePath().
      groupCount().by())

This gives

{v[66c4967e-a9bd-3ac3-3f0b-d780c98d162c]: {v[46c4967e-a9c6-a4ed-89a5-bb38ca8f2589]: 2, v[34c4967e-a9c9-18d1-0e12-b8d1f80ddf47]: 1}}

This gives us the basic building blocks for finding the pairings and their vertex identifiers. From here you could choose to write a second query to create the edges or extend this one further.
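One possible shape for the second step, assuming the nested map has been computed for every person: keep only counts of two or more and collapse (A, B) and (B, A) into one unordered pair, so each pair gets a single edge. This is a plain-Python sketch that uses names for readability (real results would carry vertex ids). `pairs_to_link` and `min_shared` are hypothetical names.

```python
def pairs_to_link(nested, min_shared=2):
    """Turn {person: {other: count}} into unique unordered pairs to connect.

    (John, Weihua) and (Weihua, John) both appear in the nested map, so each
    pair is normalised to sorted order and stored in a set.
    """
    pairs = set()
    for src, counts in nested.items():
        for other, shared in counts.items():
            if shared >= min_shared:
                pairs.add(tuple(sorted((src, other))))
    return sorted(pairs)


# Nested counts for the three-person example graph.
nested = {
    "John": {"Kelvin": 1, "Weihua": 2},
    "Kelvin": {"John": 1, "Weihua": 1},
    "Weihua": {"John": 2, "Kelvin": 1},
}
print(pairs_to_link(nested))  # [('John', 'Weihua')]
```

Each resulting pair could then be passed to a second query, for example something like `g.V(a).addE('shares_two_liked_movies').to(__.V(b))`, though that has not been tested against Neptune here. If you want an edge in each direction instead, skip the sorting and deduplication.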

Kelvin Lawrence