I have the following Neo4j database design:
- Nodes:
  - Courses
  - Skills
  - Jobs
- Relationships:
  - Courses have Skills (HAS_SKILL)
  - Jobs require Skills (REQUIRES_SKILL)
Note: neither Courses nor Skills have any ordering or dependencies. E.g. it is not necessary to take Course A in order to take Course B.
The question I am trying to solve:
Given Job node #3 (which requires Skills #1, #2, and #3), what sets of Courses can be taken to attain these Skills, with the sets ordered by number of courses (fewest first)? That is, if you want to work at Job X, what are the options for sets of Courses you could take to get the required Skills?
Sample data:
CREATE (PythonProgramming:Course { name: 'python_programming'})
CREATE (DataScience:Course { name: 'introduction_to_data_science'})
CREATE (MachineLearning:Course { name: 'machine_learning'})
CREATE (Statistics:Course { name: 'statistics'})
CREATE (Regression:Course { name: 'regression'})
CREATE (Python:Skill {name: "python"})
CREATE (Probability:Skill {name: "probability"})
CREATE (LogisticRegression:Skill {name: "logistic_regression"})
CREATE (Google:Job {name: "google"})
CREATE
(DataScience)-[:HAS_SKILL]->(Python),
(DataScience)-[:HAS_SKILL]->(LogisticRegression),
(DataScience)-[:HAS_SKILL]->(Probability),
(MachineLearning)-[:HAS_SKILL]->(LogisticRegression),
(MachineLearning)-[:HAS_SKILL]->(Probability),
(Statistics)-[:HAS_SKILL]->(Probability),
(PythonProgramming)-[:HAS_SKILL]->(Python),
(Regression)-[:HAS_SKILL]->(LogisticRegression),
(Google)-[:REQUIRES_SKILL]->(Python),
(Google)-[:REQUIRES_SKILL]->(LogisticRegression),
(Google)-[:REQUIRES_SKILL]->(Probability)
Based on this data, the sets of courses leading to a Job at Google should be, ordered from fewest courses to most:
- Data Science (has all 3 required skills)
- Python Programming + Machine Learning
- Python Programming + Statistics + Regression
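
To pin down exactly what I mean by the expected result, here is a minimal brute-force sketch in plain Python (not Cypher, and not a proposed solution). It assumes the sample data above is copied into a dict, and that a "set of courses" should contain no redundant course, i.e. removing any course would lose a required skill. The names `course_skills`, `required` and `minimal_course_sets` are made up for this illustration.

```python
from itertools import combinations

# Sample data from above, copied by hand: course -> skills it provides.
course_skills = {
    "python_programming": {"python"},
    "introduction_to_data_science": {"python", "logistic_regression", "probability"},
    "machine_learning": {"logistic_regression", "probability"},
    "statistics": {"probability"},
    "regression": {"logistic_regression"},
}

# Skills required by the "google" Job.
required = {"python", "logistic_regression", "probability"}


def minimal_course_sets(course_skills, required):
    """Return every course set that covers `required` and has no redundant
    course, ordered by number of courses (fewest first)."""
    found = []
    names = sorted(course_skills)
    # Try sets of size 1, then 2, and so on, so smaller sets come first.
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            chosen = set(combo)
            # Skip any set that merely extends a smaller set already found.
            if any(prev <= chosen for prev in found):
                continue
            covered = set().union(*(course_skills[c] for c in combo))
            if required <= covered:
                found.append(chosen)
    return [sorted(s) for s in found]


for courses in minimal_course_sets(course_skills, required):
    print(" + ".join(courses))
```

This checks every combination of courses, so it is only meant for tiny examples like this one. What I am after is a way to get the same kind of result from the graph itself.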