Neo4j data modeling: correct way to specify a source for a statement?

Question

I'm working on a scientific database that contains model statements such as:

"A possible cause of Fibromyalgia is Microglial hyperactivity, as supported by these 10 studies: [...] and contradicted by 1 study [...]."

I need to specify a source for statements in Neo4j and be able to run lookups in both directions, such as:

Find all statements supported by a study
Find all studies supporting a statement

The most immediate idea I had is to store the DOI of each study as a unique identifier in a relationship property. The big drawback of this idea is that I have to scan all the relationships to find the list of all statements supported by a study.

So, since it is impossible to link a study to a relationship (relationships only connect nodes), I had the idea of making two links, one at each end of the relationship. The obvious drawback is that this gives no information about the nature of the link, such as "support" or "contradict".

So, I came to the conclusion that I need a node for the hypothesis:
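A minimal Cypher sketch of this intermediate-node design. The labels (Disease, Cause, Hypothesis, Study), the relationship types (ABOUT, PROPOSES_CAUSE, SUPPORTS, CONTRADICTS), and the DOI values are hypothetical names chosen for illustration, not an established schema:

```cypher
// Hypothetical labels, relationship types, and DOIs, for illustration only.
MERGE (d:Disease {name: "Fibromyalgia"})
MERGE (c:Cause {name: "Microglial hyperactivity"})
MERGE (h:Hypothesis {text: "A possible cause of Fibromyalgia is Microglial hyperactivity"})
MERGE (h)-[:ABOUT]->(d)
MERGE (h)-[:PROPOSES_CAUSE]->(c)
MERGE (s1:Study {doi: "doi_foo_bar"})
MERGE (s2:Study {doi: "doi_baz_qux"})
MERGE (s1)-[:SUPPORTS]->(h)
MERGE (s2)-[:CONTRADICTS]->(h);

// Find all statements supported by a study
MATCH (:Study {doi: "doi_foo_bar"})-[:SUPPORTS]->(h:Hypothesis)
RETURN h.text;

// Find all studies supporting a statement
MATCH (s:Study)-[:SUPPORTS]->(:Hypothesis {text: "A possible cause of Fibromyalgia is Microglial hyperactivity"})
RETURN s.doi;
```

Both lookups start from a single node and follow one relationship type, so the "support" versus "contradict" distinction lives in the relationship type rather than in a property.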

However, this clutters the graph, and we are no longer in the classical node -relationship-> node design that makes property graphs so easy to understand.

With RDF, it is possible to add properties to relationships using subgraphs; however, that takes us into semantic graphs and quad stores, which are more complex tools.

So I'm wondering: is there a "correct" design pattern in Neo4j for this kind of need that I haven't thought of?

Thanks

score 1 · Answer 1 · answered Jul 26 '22 at 05:54

Based on your requirements, I think putting support_study as a property on the edge will do the job:

Thus we could query the following as:

Find all statements supported by a study

MATCH ()-[e:has_cause{support_study: "doi_foo_bar"}]->()
RETURN e;

Find all studies supporting a statement

Given the statement “foo” is caused by “bar”:

MATCH (v:disease {name: "foo"})-[e:has_cause]->(v1:sympton {name: "bar"})
RETURN DISTINCT e.support_study;

Note that this is mostly based on NebulaGraph, where:

It speaks Cypher DQL (together with nGQL)
It supports properties on edges
It uses a 4-tuple (src, dst, edge_type, rank), rather than a key, to distinguish an edge. The rank is a distinctive design that allows multiple has_cause edge instances between one disease -> sympton pair; you could put a hash of the DOI or some other number in the rank field (or omit it, of course, in which case it will be 0)
It is distributed and open source (Apache 2.0)

Note: In NebulaGraph, indexes should be created on has_cause(support_study) and disease(name); see https://www.siwei.io/en/nebula-index-explained/ and https://docs.nebula-graph.io/3.2.0/3.ngql-guide/14.native-index-statements/

But I think it applies to Neo4j, too :)
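For Neo4j specifically, a hedged sketch of the equivalent setup, assuming Neo4j 4.3 or later (where relationship property indexes are available). Neo4j allows several relationships of the same type between the same pair of nodes, so one has_cause relationship per study plays the role of NebulaGraph's rank. The index name below is an arbitrary choice:

```cypher
// Assumes Neo4j 4.3+; the index name is arbitrary.
// Lets the planner look up has_cause relationships by support_study
// instead of scanning every relationship.
CREATE INDEX has_cause_support_study IF NOT EXISTS
FOR ()-[e:has_cause]-() ON (e.support_study);

// One has_cause relationship per supporting study
// (assumes the disease and sympton nodes already exist).
MATCH (v:disease {name: "foo"}), (v1:sympton {name: "bar"})
CREATE (v)-[:has_cause {support_study: "doi_foo_bar"}]->(v1);
```

With the index in place, the first query above (matching on support_study) no longer needs to visit every has_cause relationship.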
