I have built a small RDF model: it only contains a few triples describing some items on the human genome.

I want to retain only those items overlapping some genomic segments (say a "gene"), stored in another relational database. This database of genes is far too big to be inserted into my initial RDF model.

Is there any way to extend ARQ to inject some new Statements (the RDF statements describing only the genes overlapping the items) into my model during the query?

input:

uri:object1  my:hasChromosome "chr1" .
uri:object1  my:hasStartPosition "1235689887" .
uri:object1  my:hasEndPosition "2897979879" .
uri:object1  dc:title "my variation" .

output:

uri:object1  my:hasChromosome "chr1" .
uri:object1  my:hasStartPosition "1235689887" .
uri:object1  my:hasEndPosition "2897979879" .
uri:object1  dc:title "my variation" .
uri:gene1  dc:title "GeneName" .

I've read http://jena.sourceforge.net/ARQ/arq-query-eval.html but I'm lost: which extension mechanism should I choose? A property function? Is there a more complete example on the web?

Thanks,

Pierre
  • Try SPARQL Update (http://www.w3.org/TR/sparql11-update/) instead of ARQ. – Alex Sep 19 '12 at 21:50
    Also, you're looking at the old Jena site. Jena is now an Apache project: http://jena.apache.org/documentation/ – Alex Sep 19 '12 at 21:53

2 Answers

Details are a bit thin here. Start simple, using a custom function. That will let you do external lookups in FILTERs or, using BIND, retrieve values.

For updating you might want to consider SPARQL Update.

Finally, you said

I want to retain only those items overlapping some genomic segments (say a "gene"), stored in another relational database.

So perhaps something like:

PREFIX my: <...>
PREFIX f:  <java:com.example.DBFunctions.>

DELETE { ?missing ?p ?o } # Purge the non-overlapping objects
WHERE {
    ?missing my:hasChromosome ?chr ;
             my:hasStartPosition ?start ;
             my:hasEndPosition ?end ;
             ?p ?o . # bind every triple of the object so the DELETE template matches
    FILTER (!f:overlaps(?chr, ?start, ?end)) # passes when the object overlaps no gene
}

Ok, I'm guessing here but I hope that helps a little.

user205512
  • I know about creating a custom function (http://plindenbaum.blogspot.fr/2008/11/taxonomy-and-semantic-web-writing.html) but that is not what I need. As I said, I want to inject some new statements that are not part of the initial RDF model. The data for "gene" would be stored elsewhere, not in an RDF datastore. – Pierre Sep 19 '12 at 21:04
  • Ah, I think I've completely misunderstood. When you say "inject some new Statements", do you mean something like "make it appear that the queried model is the RDF model plus the data in the relational DB"? – user205512 Sep 19 '12 at 22:56

You have two datastores: a small dataset in a Jena in-memory Model, and a large set of gene-related data in a relational database. You want to write a SPARQL query as if the large dataset were local, without actually importing it. (The actual data transformation you want to do is a bit vague.)

In SPARQL 1.1 you can do this between SPARQL endpoints using the SERVICE keyword. To use your relational database of gene data as a SPARQL endpoint, you either need a SPARQL-to-SQL translator such as D2RQ, or you need to convert the data to RDF and load it into a general-purpose SPARQL-capable triple store.

Once the gene data is available through a SPARQL endpoint, you could write something like:

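PREFIX my: <...>
PREFIX f:  <java:com.example.DBFunctions.>

INSERT { ?missing a my:Gene } # mark a region as a gene
WHERE {
    ?missing my:hasChromosome ?chr ;
             my:hasStartPosition ?start ;
             my:hasEndPosition ?end .
    SERVICE <http://localhost:????/gene_data/sparql> {
        ?gene a my:Gene ;
              my:hasChromosome ?chr ; # assumes genes carry the same chromosome property
              my:hasStartPosition ?gStart ;
              my:hasEndPosition ?gEnd .
    }
    # Detect overlap. The FILTER sits outside SERVICE because ?start and ?end
    # are not bound inside the remote pattern. Assumes numeric position literals.
    FILTER( !(?start > ?gEnd || ?end < ?gStart) )
}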

The other option is to do the filtering with a custom function, as @user205512 shows, where the Java code behind the function uses JDBC to connect to the relational database.
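As a rough sketch of the JDBC side of such a custom function, the lookup could be pushed down to the database as shown here. Everything named in it is an assumption made for illustration: the class GeneOverlap, the table gene and its columns chrom, start_pos and end_pos are hypothetical, and the Jena function class that would call overlapsAnyGene is left out. The overlaps method is the same closed-interval test as the FILTER in the query above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper; table and column names are assumptions, not a real schema.
final class GeneOverlap {
    // Assumed schema: gene(chrom VARCHAR, start_pos BIGINT, end_pos BIGINT).
    static final String OVERLAP_SQL =
        "SELECT 1 FROM gene WHERE chrom = ? AND start_pos <= ? AND end_pos >= ?";

    private GeneOverlap() {}

    // Closed intervals [start, end] and [gStart, gEnd] overlap
    // unless one of them ends before the other begins.
    static boolean overlaps(long start, long end, long gStart, long gEnd) {
        return !(start > gEnd || end < gStart);
    }

    // Returns true if at least one gene row on the chromosome overlaps [start, end].
    static boolean overlapsAnyGene(Connection conn, String chrom, long start, long end)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(OVERLAP_SQL)) {
            ps.setString(1, chrom);
            ps.setLong(2, end);   // gene.start_pos <= item end
            ps.setLong(3, start); // gene.end_pos   >= item start
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next();
            }
        }
    }
}
```

Letting the database evaluate the range condition avoids pulling the gene table into memory, which matters here since it is too big to load into the model. An index on (chrom, start_pos) would probably help, but that depends on the database.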

Jerven