I need to represent electronic health records in RDF. Because this kind of data is time-dependent, I want to represent it as events, in a way similar to a Datomic database. Datomic uses triples with an added transaction field. This extra field is timestamped and can carry user-defined metadata. I want to use named graphs to record the transaction/time data.
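
A minimal sketch of what one such transaction could look like as a SPARQL Update, assuming the Dublin Core and FOAF vocabularies used in the query further down. The `http://example.org/...` IRIs and the literal values are placeholders, not part of any real dataset:

```sparql
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

INSERT DATA {
  # Transaction metadata lives in the default graph, keyed by the graph IRI.
  # <http://example.org/tx/1> is a hypothetical identifier for one transaction.
  <http://example.org/tx/1> dc:publisher "Bob" ;
                            dc:date "2019-05-22"^^xsd:date .

  # The facts asserted by that transaction go inside the named graph.
  GRAPH <http://example.org/tx/1> {
    <http://example.org/person/bob> foaf:name "Bob" ;
                                    foaf:mbox <mailto:bob@example.org> .
  }
}
```

Each new transaction would get its own graph IRI, which is where the "millions of named graphs" come from.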

For instance, in the query below, I only search triples of graphs from a certain editor created on a certain date:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>

SELECT ?name ?mbox ?date
WHERE {
    ?g dc:publisher ?name ;
       dc:date ?date .
    GRAPH ?g
    { ?person foaf:name ?name ; foaf:mbox ?mbox }
}
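
As written, the query above matches every graph that has a publisher and a date. To restrict it to one editor and one day, the values can be fixed in the graph-selection pattern. This is a sketch only: `"Bob"` and the date are placeholder values, and it assumes dates are stored as `xsd:date` literals (a plain-string date would not match):

```sparql
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

SELECT ?person ?name ?mbox
WHERE {
  # Select only the graphs published by "Bob" on the given date
  # (both values are placeholders).
  ?g dc:publisher "Bob" ;
     dc:date "2019-05-22"^^xsd:date .
  GRAPH ?g {
    ?person foaf:name ?name ;
            foaf:mbox ?mbox .
  }
}
```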

Queries like this one would solve my problem. My concerns are:

  • I will end up with millions of named graphs. Will they make the SPARQL queries too slow?
  • The triple store I am using, Blazegraph, has support for inference (entailments) but states that: "Bigdata does not support inference in the quads mode out of the box." Which triple stores do support inference using quads (named graphs)?
  • Is there a better way to represent this kind of data in RDF? Some kind of best practices guideline?
dilvan

1 Answer

I will end up with millions of named graphs. Will they make the SPARQL queries too slow?

Generally speaking, not necessarily, at least not any more than adding millions of triples to one named graph would. But it really depends on your triplestore, and how good it is at indexing on named graphs.

The triple store I am using, Blazegraph, has support for inference (entailments) but states that: "Bigdata does not support inference in the quads mode out of the box." Which triple stores do support inference using quads (named graphs)?

Stack Overflow is not really the right platform to ask for tool recommendations. I suggest you search around a bit instead and compare the feature lists of the various available triplestores.

I also suspect that at the scale you need, inferencing performance might disappoint you (again, depending on the implementation, of course). Are you sure you need inferencing? I'm not saying you definitely shouldn't use it, but depending on the expressivity of the inference you need, there are quite often ways around it by being a bit creative in your queries.
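
One such workaround, sketched here as an illustration rather than a recommendation for any particular store, is to walk the class hierarchy at query time with a SPARQL 1.1 property path instead of relying on RDFS subclass entailment. The `ex:` namespace and `ex:ClinicalEvent` are hypothetical, and the sketch assumes the schema triples sit in the default graph while the instance data sits in the named graphs:

```sparql
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/schema#>

SELECT ?g ?record
WHERE {
  # Instance data is read from the named graphs ...
  GRAPH ?g { ?record rdf:type ?type }
  # ... and the class hierarchy is walked at query time
  # instead of materializing inferred rdf:type triples.
  ?type rdfs:subClassOf* ex:ClinicalEvent .
}
```

Because `*` also matches a zero-length path, records typed directly as `ex:ClinicalEvent` are returned as well. How the default graph is defined (separate from, or the union of, the named graphs) varies between triplestores, so the pattern may need adjusting.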

Is there a better way to represent this kind of data in RDF? Some kind of best practices guideline?

It looks like a sensible approach to me. Whether another way is better is hard to judge without knowing more about the way you intend to use this data, the scale (in number of triples), etc. As for best practices: this W3C note on N-ary relations in RDF is a good resource. Also: How can I express additional information (time, probability) about a relation in RDF?
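
For comparison, the n-ary relation pattern described in that note turns the event itself into a resource and hangs the time and value off it, with no named graph involved. A rough sketch, in which every `ex:` term and every `http://example.org/...` IRI is a made-up placeholder rather than an established health vocabulary:

```sparql
PREFIX ex:  <http://example.org/schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

INSERT DATA {
  # The measurement is a resource of its own, so time and value
  # can be attached to it directly.
  <http://example.org/obs/1> a ex:Observation ;
      ex:patient    <http://example.org/patient/42> ;
      ex:code       ex:BodyTemperature ;
      ex:value      "37.2"^^xsd:decimal ;
      ex:recordedAt "2019-05-22T10:15:00Z"^^xsd:dateTime .
}
```

This keeps everything in plain triples, which may matter if the store only supports inference in triples mode, at the cost of losing the per-transaction grouping that named graphs provide.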

Jeen Broekstra
  • I am not asking for tool recommendations. I am asking for any tool that has a certain feature. – dilvan May 22 '19 at 22:04
  • @dilvan that is what a tool recommendation _is_ :) I'm not saying it isn't a valid question, just that Stack Overflow considers such questions off-topic, the main reason being that they attract opinionated answers instead of objective solutions. For example, I would recommend looking into Eclipse RDF4J or Ontotext GraphDB for entailment support over named graphs, but I am affiliated with both those products, so how much can you trust my recommendation, really? – Jeen Broekstra May 22 '19 at 22:25
  • @jean-broekstra, thanks for the information. I believe that off-topic is more like "is tool A better than tool B?", but maybe I am wrong. In any case, I don't have to trust you. Now that I know that tool X has feature Y I just have to check it. – dilvan May 23 '19 at 13:12
  • @dilvan: Questions of the kind "Which software has feature XY?" are also off-topic on Stack Overflow. They are on-topic on Software Recommendations Stack Exchange; there might not be enough Semantic Web experts there, but it might be worth a try. – unor May 23 '19 at 18:21