
We are using Neo4j to store the academic and professional history of many people (User nodes), so that we can process it and provide the information it contains to our clients.

For example, they may ask for the most frequent career path among people who obtained a certain diploma (represented as a “Diploma” node in the graph database) and later worked in a certain job (a “Job” node).

Both Job and Diploma play a similar role: each is a “kind” of step anyone could have on their resume (users are not directly linked to them). Resume nodes are the activities held by a certain person during a certain period of time: each one has one and only one User related to it, and also :CONTAINS a "Job_or_Diploma" that helps classify it.

The Resume nodes of a given user are linked to each other by :LEADS_TO relationships, following their order in time. This chain gives back the whole resume of that user. So we have paths like:

(u:User)-[:HAS]->(:Resume)-[:LEADS_TO]->(:Resume)<-[:HAS]-(u)

Thus the problem we need to solve becomes: find the most popular path between the start:Diploma node and the end:Job node, among all paths made of "Job_or_Diploma" nodes only.

We struggle to determine how to find it, because the 'popularity' of a path is not a property of the "Job_or_Diploma" nodes, nor of any individual node: it depends on the whole path, since we measure it by the number of Users who followed that path completely.

Another pitfall is that there are actually no direct links between two "Job_or_Diploma" nodes: a Resume :LEADS_TO the following Resume in the curriculum of the User who :HAS them, and each Resume :CONTAINS one or more "Job_or_Diploma" nodes, but those "Job_or_Diploma" nodes have no edges between each other (see the screenshots below). So the path we are looking for doesn't exist as such in the graph.

Hence the question: is there a way to find the 'path', or rather the 'succession', of "Job_or_Diploma" nodes that was followed (in chronological order) by the highest number of Users between a certain diploma and a certain job? Ideally, we are looking for something that we could implement using Cypher only.

We would be glad to receive any insight on how to proceed. Below are a few screenshots of the parts of our graph that are involved in this problem:

The kind of configuration we are dealing with in this problem

2 users' Resumes between a diploma in Engineering and a position in Management

4 users' length-2 Resume paths between a diploma in Engineering and a Technical Director position

Thank you in advance for your help!

pierre

1 Answer

Since a node can have multiple labels, your Resume nodes can also have either a Diploma or Job label. There is no need to have separate nodes with Diploma or Job labels.

So, you can simplify your data model to, for example:

(u:User)-[:HAS]->(:Resume:Diploma)-[:LEADS_TO]->(:Resume:Job)<-[:HAS]-(u)

And finding all paths that start with an "Engineering" degree and end at a "Management" job would be this simple:

MATCH path=(u:User)-[:HAS]->(:Diploma {type: 'Engineering'})-[:LEADS_TO*]->(:Job {type: 'Management'})
RETURN path;

In addition, you may have no real reason to have a Resume label at all (e.g., the above query does not use it), so you could further simplify your data model to just have the Diploma and Job labels.
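
To go from all matching paths to the most popular one, a possible extension is to reduce each path to its ordered list of steps and count the distinct users per list. This is only a sketch: it assumes the simplified model above and that the `type` property is enough to identify each step.

```cypher
// Sketch: assumes the simplified model, with a `type` property on every Diploma/Job node.
MATCH path = (u:User)-[:HAS]->(:Diploma {type: 'Engineering'})-[:LEADS_TO*]->(:Job {type: 'Management'})
// nodes(path) starts with the User node; tail() drops it, leaving the steps in order.
WITH u, [step IN tail(nodes(path)) | step.type] AS steps
// Group identical successions and count the distinct users who followed each one.
RETURN steps, count(DISTINCT u) AS users
ORDER BY users DESC
LIMIT 1;
```

Dropping the `LIMIT 1` would return the full ranking instead of only the top succession. A user with several matching paths is counted once for each distinct succession, and on a large graph it may be worth bounding the variable-length pattern (for example `[:LEADS_TO*1..10]`).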

cybersam
  • Thank you! Indeed, it would be much simpler. However, we didn't imagine we could change our model, because it seems we need the distinction between `Resume` (which belongs to one user) and `ResumeNode` (the actual label for "Job_or_Diploma" nodes in our database, which stands for a 'generic' career step, e.g. "Directeur Technique" on the screenshots of the post). For example, without it, we don't know how to get all the jobs a user has held in one company, or how to access all the locations related to their resume. But if you do, we would be grateful if you could share some hints with us! – pierre Jun 29 '18 at 15:34
  • If I understand you correctly, a `Job` node could have a "title" property for things like "Directeur Technique" -- or a relationship to a `Title` node. And each `Diploma` or `Job` node could have an `AT` relationship to a `Company` or `School` node. – cybersam Jun 29 '18 at 17:47