We are using Neo4j to store the academic and professional histories of many people (User nodes) so that we can process this information and provide it to our clients.
For example, they may ask for the most frequent professional evolution for people who had a certain diploma (represented as a “Diploma” node in the graph database) and worked later in a certain job (a “Job” node).
Both Job and Diploma play a similar role, as a “kind” of step anyone could have on their resume (users are not directly linked to them). Resume nodes are professional activities held by a certain person over a certain period of time: each has one and only one User related to it, and also :CONTAINS a "Job_or_Diploma" that helps classify it.
The Resume nodes of a given user are linked to each other by :LEADS_TO relationships, following their succession in time. This succession reconstructs the whole resume of that user. So we have paths like:
(u:User)-[:HAS]->(:Resume)-[:LEADS_TO]->(:Resume)<-[:HAS]-(u)
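To make the model concrete, here is a minimal sketch of one user's data in Cypher. It assumes the "Job_or_Diploma" nodes carry the labels `Diploma` and `Job`, that the :CONTAINS relationship points from the Resume to the "Job_or_Diploma", and that nodes have a `name` property; the property and the sample values are hypothetical and only serve the illustration.

```cypher
// Hypothetical sample data: one user whose resume goes from a diploma to a job.
// The `name` property and its values are assumptions made for this sketch.
CREATE (u:User {name: 'Alice'}),
       (d:Diploma {name: 'Engineering'}),
       (j:Job {name: 'Technical Director'}),
       (r1:Resume),
       (r2:Resume),
       (u)-[:HAS]->(r1),
       (u)-[:HAS]->(r2),
       (r1)-[:LEADS_TO]->(r2),   // chronological succession of the two steps
       (r1)-[:CONTAINS]->(d),    // first step is classified by the diploma
       (r2)-[:CONTAINS]->(j)     // second step is classified by the job
```

Note that `d` and `j` are only reachable from each other through the Resume nodes.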
Thus the problem we need to solve becomes: find the most popular path between the starting Diploma node and the ending Job node, among all paths made up of "Job_or_Diploma" nodes only.
Because the most common path is defined by properties that belong neither to the "Job_or_Diploma" nodes nor to any individual node (the 'popularity' of a path depends on the whole path, since we measure it by the number of Users who followed it completely), we are struggling to work out how to find it.
Another pitfall is that there are actually no direct links between two "Job_or_Diploma" nodes: a Resume :LEADS_TO the following Resume in the curriculum of the User who :HAS them, and each Resume :CONTAINS one "Job_or_Diploma" or more, but those "Job_or_Diploma" nodes have no edges between each other (cf. screenshots below). So the path we are looking for doesn't exist in the graph.
Hence the question: is there a way to find the 'path', or rather 'succession', of "Job_or_Diploma" nodes that were taken (in chronological order) by the highest number of Users between a certain diploma and a certain job? Ideally, we are looking for something that we could implement using Cypher only.
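To illustrate the kind of result we are after, here is an untested sketch of one possible shape for such a query. It is not a validated solution. It assumes `Diploma` and `Job` labels, a hypothetical `name` property on them, :CONTAINS pointing from Resume to "Job_or_Diploma", and two hypothetical parameters `$diploma` and `$job`.

```cypher
// Sketch only: follow each user's chain of Resume nodes from the diploma to the job,
// project every Resume onto the names of the Job_or_Diploma nodes it contains,
// then group identical successions and count the distinct users behind each one.
MATCH (d:Diploma {name: $diploma})<-[:CONTAINS]-(first:Resume),
      (last:Resume)-[:CONTAINS]->(j:Job {name: $job}),
      p = (first)-[:LEADS_TO*]->(last)
MATCH (u:User)-[:HAS]->(first)
WITH u, [r IN nodes(p) | [(r)-[:CONTAINS]->(x) | x.name]] AS steps
RETURN steps, count(DISTINCT u) AS users
ORDER BY users DESC
LIMIT 1
```

Two caveats with this sketch: the unbounded `*` may be expensive on long chains, and when a Resume contains several "Job_or_Diploma" nodes the order of names inside one step is not guaranteed, so identical successions could be counted as different groups.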
We would be glad to receive any insight on how to proceed. Below are a few screenshots of the parts of our graph that are involved in this problem:
The kind of configuration we are dealing with in this problem
2 users' Resumes between a diploma in Engineering and a position in Management
4 users' length-2 Resume paths between a diploma in Engineering and a Technical Director position
Thank you in advance for your help!