0

I have a developer vertex and a project vertex and an edge with start date and end date properties. A developer can work only on one project at a given point of time. Sometimes they are assigned to multiple projects with the same start date by user by mistake.

  1. I need to find out if a developer has more than one project assigned to them with same start date. If it is then I need to print

      001 Akash 2021-06-01  2
    
  2. If you look at 3rd developer, he was assigned to 2 projects starting 2021-07-01. But one of them is valid as it has an end date which has a corresponding record with start date of 2021-09-01. In my 2nd report I need to list

{"id":"003_P003","label":"works_in","start_date":"2021-07-01","end_date":"2021-07-05"} which is duplicate and odd one.

I tried following query for #1,but it does not show me both id, start date and count. It only shows me developer's id and count

 g.V().
  hasLabel('developer').
  outE('works_in').
  groupCount().by(outV().id()).
  groupCount().by('start_date').
  limit(2).
  unfold().
  toList()

Then I tried next two which did not work and shows error message "‘Column’ object is not callable" (I am using Python Germlin)

 g.V().
  hasLabel('developer').
  outE('works_in').by(values(outV().id(), 'start_date')).
  groupCount().
  unfold().
  toList() 

The next one below also throws error.

 g.V().
  hasLabel('developer').
  outE('works_in').
  groupCount().by(outV().id(), 'start_date').
  unfold().
  toList()

#2 I did not start yet. I am not sure if it can be done with Gremlin though.

developer (vertex)

[{"id":"001","label":"developer","name":"Akash","skill":"c#"},
{"id":"002","label":"developer","name":"John","skill":"react"}, 
{"id":"003","label":"developer","name":"Bruno","skill":"python"}]

project (vertex)

[{"id":"P001","label":"project","name":"Web App"},{"id":"P002","label":"project","name":"Smart Contract"}, {"id":"P003","label":"project","name":"Migrate to AWS"}]

works_in (edge)

    [{"id":"001_P001","label":"works_in","start_date":"2021-06-01","end_date":"2021-12-31"},
    {"id":"002_P002","label":"works_in","start_date":"2021-01-01","end_date":"2021-12-31"}, 
    {"id":"001_P002","label":"works_in","start_date":"2021-06-01","end_date":"2021-06-30"},
    
    {"id":"003_P003","label":"works_in","start_date":"2021-01-01","end_date":"2021-06-30"},
    {"id":"003_P003","label":"works_in","start_date":"2021-07-01","end_date":"2021-07-05"},
    {"id":"003_P002","label":"works_in","start_date":"2021-07-01","end_date":"2021-08-31"},
    {"id":"003_P002","label":"works_in","start_date":"2021-09-01","end_date":"2021-12-31"}

]

Any help is appreciated. Also I am looking for tutorial/course to understand Gremlin query better other than Tinkerpop's official documentation.

user2026504
  • 69
  • 1
  • 9

1 Answers1

1

Query for the first usecase:

   gremlin> g.V().
......1>   hasLabel('developer').
......2>   local(
......3>     __.as('b').
......4>     outE().
......5>     project('id', 'name', 'startDate').
......6>       by(select('b').values('id')).
......7>       by(select('b').values('name')).
......8>       by(values('startDate')).
......9>     groupCount().
.....10>     unfold().as('a').
.....11>     select(values).
.....12>     is(gt(1)).
.....13>     select('a')).
.....14>   local(
.....15>     union(select(keys).unfold().select(values), select(values)).fold())
==>[003,bruno,2021-07-01,2]
==>[001,akash,2021-06-01,2]

Query for the second usecase

I couldn't write any straight forward query for the second usecase.

  • Thx. Any documentation/tutorial to understand the gremlin query structure, Looking at your query I understand I am still struggling to get a grip on gremlin query structure like when I was trying different queries to solve this issue I never used local/union even more I never thought of. So certainly there is a big gap in understanding how Gremlin query works, – user2026504 Dec 02 '21 at 15:29
  • I find this book from Kelvin really helpful https://kelvinlawrence.net/book/Gremlin-Graph-Guide.html – PrashantUpadhyay Dec 02 '21 at 19:30