Modeling user information in a "shared" graph

Question

I have a Neo4j graph that models the static relations among each concept of a course.

Now I need to introduce the scores of each individual student in each concept.

As I need to run complex queries and have efficiency in mind, I am considering creating an independent copy of the static graph for each user so scores can be stored in each node. The problem is that if a course has hundreds of students there would be hundreds of graphs, one per user-course pair. Another problem is that if I need to change the static graph, I would need to apply that change to hundreds of graphs.

Another approach is storing the student scores as attributes in the nodes. This way there would be only one graph, with each node having hundreds of attributes (scores), one per user.

Which of these would be the better approach? Or is there a better one?

Thanks

score 4 · Accepted Answer · answered Jan 28 '15 at 14:16

Assuming that the scores you're talking about are how well a student did on a particular concept, it seems like storing that score in either the student node or the concept node is inappropriate. You should store it in a relation between the two. Let's say you're quizzing a student on calculus limits. I'd probably do it like this:

(s:Student {name: "Joe"})-[:learned { score: 100 }]->(c:Concept {name: "limits"})

You probably wouldn't put the score in Student or in Concept.

From a data-modeling perspective, think about the "nouns" in your domain (here, things like student and concept). Then think about the relationships between them (students learn concepts). Don't over-cram the nodes; use properties on relationships as well to hold metadata about those relationships. How well did a student learn a concept? That's a score attribute (or similar) on the relationship, not on the concept.
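
As a minimal sketch of writing a score under this model, assuming students and concepts are identified by a name property as in the pattern above, something like the following should work. MERGE is used instead of CREATE so that re-running the statement updates the existing relationship rather than adding a duplicate:

```cypher
// Find or create the two "noun" nodes, keyed on name.
MERGE (s:Student {name: "Joe"})
MERGE (c:Concept {name: "limits"})
// Find or create the single learned relationship between them,
// then set (or overwrite) the score on the relationship itself.
MERGE (s)-[l:learned]->(c)
SET l.score = 100
RETURN s.name, c.name, l.score;
```

The Concept nodes and whatever relationships connect them to each other are untouched by this, so the static course graph stays shared across all students.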

Also, I don't know how big your graph is going to be, but I probably wouldn't store a bunch of independent graphs. It's all just one huge graph, and then each student can have a "subgraph" which corresponds to a particular query. Splitting the data out into separate storage creates maintenance and refresh nightmares for you. I would only do that if you have really solid evidence that you can't make performance work as one big graph (I'm betting you can make it work). Imagine a database with a million students and a million concepts; you'll always be able to generate each student's subgraph on the fly:

MATCH (s:Student {name:"Joe"})-[l:learned]->(c:Concept)
RETURN s, l, c;

If you've ever used a relational database, you can think of this as a "view". The whole database is still the database, but using queries like this you can build customized views of it that are tailored to individuals (in this case Joe). This buys you all of the advantages of centralizing data administration, modeling, storage, and updates, and yet each user can see the data however they like and can ignore 99% of the database if that's appropriate.
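
To illustrate how such a per-student view can still draw on the static course structure, here is a sketch. The question doesn't say how concepts are related to each other, so REQUIRES is a hypothetical relationship type, and the threshold of 60 is arbitrary:

```cypher
// Concepts Joe scored below 60 on, plus the concepts each one builds on
// and Joe's score on those prerequisites (null if he has none yet).
// REQUIRES is a hypothetical relationship type for the static course graph.
MATCH (s:Student {name: "Joe"})-[l:learned]->(c:Concept)
WHERE l.score < 60
OPTIONAL MATCH (c)-[:REQUIRES]->(prereq:Concept)
OPTIONAL MATCH (s)-[pl:learned]->(prereq)
RETURN c.name AS concept, l.score AS score,
       prereq.name AS prerequisite, pl.score AS prerequisiteScore
ORDER BY score;
```

The same query works for any student by changing the name, which is the point: one shared graph, many per-student views.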

If you use appropriate labels and indexes, this should perform quite well; traversing relationships like this is in the dead center of the sweet spot of what Neo4j is good at.
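
For example, an index on the property used to find the starting node might look like the sketch below. It assumes a recent Neo4j version (4.x or later); older releases used the form CREATE INDEX ON :Student(name). The index name student_name is just an illustrative choice:

```cypher
// Index the property used to look up the starting Student node.
CREATE INDEX student_name IF NOT EXISTS
FOR (s:Student) ON (s.name);

// Inspect the execution plan: the Student lookup should appear as an
// index seek (NodeIndexSeek) rather than a label scan (NodeByLabelScan).
PROFILE
MATCH (s:Student {name: "Joe"})-[l:learned]->(c:Concept)
RETURN c.name, l.score;
```

Once the starting node is found through the index, the rest of the query is a relationship traversal from that node, so its cost depends on how many concepts Joe has learned rather than on the total size of the graph.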
