
I am currently building a framework that combines SPARQL queries with Python's rdflib package. The framework is designed to identify triples that don't meet quality requirements.

As you may know, RDF files can contain a large number of triples, and I'm having trouble working out how to point users to the offending triples. My first idea was to give them a line number, as shown below, but I'm unsure how feasible this is.

I'm open to any suggestions.

line 1  @prefix foaf: <http://xmlns.com/foaf/0.1/> .
line 2  _:alice foaf:knows _:bob .
line 3  _:bob foaf:knows _:alice .

Example output

Triple on line 3 doesn't meet quality requirements

Thanks for your help!

Alex R
  • How should a SPARQL query find a "line number"? SPARQL is a query language for RDF, and there is no concept of lines or line numbers in an RDF dataset. All you can do is write your own parser, though it's unclear what you mean by "quality" and how you estimate it. Do you have to load the whole dataset first? In my opinion you almost always have to, given that Turtle is the format, unless you just analyze a single triple, which sounds weird. What is the quality of a triple in your context? – UninformedUser Sep 09 '20 at 13:05
  • My context for quality is an R2RML mapping: I check whether triples follow the R2RML standard, for example by flagging an object map that has both a language and a datatype property, which R2RML does not allow. The line number was only meant to give the reader an idea of what I was hoping to achieve; I am open to any suggestions. – Alex R Sep 09 '20 at 13:20
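
To illustrate the "write your own parser" route from the comment above, here is a minimal sketch that scans the source text for a flagged triple. It assumes one complete triple per line and that the subject, predicate and object are passed in exactly as they are written in the file. rdflib would normally hand back full IRIs and its own blank node identifiers, so mapping those back to the written form is left out. The function name `find_triple_lines` and the `SAMPLE` text are hypothetical, and the approach will not work for Turtle that uses `;` or `,` abbreviations, multi-line statements, or literals containing spaces.

```python
def find_triple_lines(turtle_text, subject, predicate, obj):
    """Return 1-based line numbers whose first three tokens match the triple.

    Heuristic only: assumes one triple per line, written with the same
    prefixed names / blank node labels that are passed in.
    """
    wanted = [subject, predicate, obj]
    hits = []
    for lineno, line in enumerate(turtle_text.splitlines(), start=1):
        # Drop the statement terminator, then split on whitespace.
        tokens = line.strip().rstrip(".").split()
        if tokens[:3] == wanted:
            hits.append(lineno)
    return hits


# Hypothetical sample mirroring the example in the question.
SAMPLE = """@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:alice foaf:knows _:bob .
_:bob foaf:knows _:alice .
"""

for n in find_triple_lines(SAMPLE, "_:bob", "foaf:knows", "_:alice"):
    print(f"Triple on line {n} doesn't meet quality requirements")
```

If the input cannot be guaranteed to have one triple per line, reporting the offending subject, predicate and object themselves is probably more robust than a line number.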
