Jena TDB/Fuseki indexing for text search: Customize the URI location for each field

Question

I have a relatively small triple store, which I store and access via Jena Fuseki. Here is a snippet of my data; other optional fields can also occur:

<http://example.com/#org1>
        a              pers:family ;
        pers:name      [ pers:lang       "de" ;
                         pers:occurence  "XX" ;
                         pers:surname    "NN" ;
                         pers:type       "std" ;
                         pers:var_id     <http://example.com/#org1.01>
                       ] ;
        pers:org_type  "Family" .

<http://example.com/#per1>
        a                    pers:person ;
        pers:first_mentions  [ pers:first_mention  "1234" ;
                               pers:occurence      "XX"
                             ] ;
        pers:name            [ pers:forename   "Maria" ;
                               pers:id         <http://example.com/#per1a> ;
                               pers:lang       "de" ;
                               pers:occurence  "XX" ;
                               pers:org_id     <http://example.com/#org1> ;
                               pers:type       "std"
                             ] ;
        pers:name            [ pers:forename    "Marie" ;
                               pers:lang        "fr" ;
                               pers:occurence   "XX" ;
                               pers:org_var_id  <http://example.com/#org1.01> ;
                               pers:type        "orig" ;
                               pers:var_id      <http://example.com/#per1a.01>
                             ] ;
        pers:org_id          <http://example.com/#org1> ;
        pers:sex             "1" .

Planning to implement faceted search, I have just indexed my triple store.

The indexing succeeded, and I can access the index via Solr. I configured the indexing in my config.ttl based on the few examples I found on the web. Here is the part of my config that I have questions about:

<#entMap> a text:EntityMap ;
    text:entityField      "uri" ;
    text:defaultField     "text" ;
    text:map (
         [ text:field "text" ; text:predicate rdfs:label ]
         [ text:field "forename" ; text:predicate pers:forename ]
         [ text:field "surname" ; text:predicate pers:surname ]
         [ text:field "orgtype" ; text:predicate pers:org_type ]
         [ text:field "occur" ; text:predicate pers:occurence ]
         [ text:field "lang" ; text:predicate pers:lang ]
         [ text:field "description" ; text:predicate pers:description ]
         [ text:field "sex" ; text:predicate pers:sex ]
         [ text:field "marital" ; text:predicate pers:marital_status ]
         [ text:field "role" ; text:predicate pers:rolename ]
         ) .

When I query Solr with this request:

http://localhost:8983/solr/project/select?q=*Mari*&wt=json&indent=true

it returns something like this:

{
  "responseHeader":{
    "status":0,
    "QTime":16,
    "params":{
      "q":"*Mari*",
      "indent":"true",
      "wt":"json"}},
  "response":{"numFound":39,"start":0,"docs":[
      {
        "uri":["_:6fdab61c39c226f305e6419d6aa5f5e9"],
        "forename":["Maria"],
        "id":"c3f82e8c-9650-4a18-b6c3-1eaebff9830c",
        "_version_":1515091600962748416}]}}

The "uri" field holds a blank node. As I understand it, my config causes the data to be indexed so that text:entityField "uri" stores the subject of the triple that matches text:predicate. When I queried the index for "Mari", the match was found in the field "forename", and its subject is a blank node. But for further work with the index, e.g. for facets, I need the URI of the entity (e.g. http://example.com/#per1). I cannot use blank node IDs for querying, so I cannot find out which entry they refer to.

How can I index my data so that I can tell Solr, separately for each field, where to find its URI? For example, if the indexed field is "forename", its URI would be found somehow like this:

[text:field "forename" ; text:predicate pers:forename ; text:entityField <pattern for finding URI of the forename field>]

<pattern for finding URI of the forename field>
       URI pers:name [text:field "forename"]

About the blank node becoming a URI. Which version of jena are you using? The uri field is "_:.." which should recover the blank node. — AndyS, Oct 21 '15 at 10:49
@AndyS I am using Jena-Fuseki 1.1.0. What do you mean by the recovery of the blank node? Is there a way to get something useful out of this id? — user3241376, Oct 21 '15 at 11:38
jena-text works in conjunction with a dataset. The resource (blank node) can be used to access the dataset and the models in the dataset. If you want an identifier outside the data, that's what URIs are for, not blank nodes. — AndyS, Oct 22 '15 at 14:16
@AndyS but I cannot access the index inside the dataset when I indexed it with Solr, can I? It works with Lucene only, as far as I saw in the Fuseki docs. Would it mean in my case that I can only work with Lucene indexing but not with Solr? — user3241376, Oct 22 '15 at 14:32
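
The suggestion in the comments, resolving the blank node inside the dataset rather than in the index, could look roughly like the SPARQL sketch below. It assumes the query is sent to a Fuseki endpoint whose dataset is configured as a jena-text dataset with the entity map shown above, so that the text:query property function is available. The pers: namespace IRI is not shown in the question, so the one used here is a placeholder.

```sparql
PREFIX text: <http://jena.apache.org/text#>
# Assumption: the real pers: namespace IRI is not given in the question; this is a placeholder.
PREFIX pers: <http://example.com/pers#>

SELECT ?entity ?forename
WHERE {
  # The text index returns the subject of the indexed triple, which here is the blank node.
  ?name   text:query (pers:forename "Mari*") .
  ?name   pers:forename ?forename .
  # Walk one step up to the named resource that owns the blank node.
  ?entity pers:name ?name .
}
```

With the sample data, ?entity would bind to the person resource that carries the matching pers:name node. The same pattern should apply to pers:surname on the family resources. Whether this works against a Solr-backed index in Jena-Fuseki 1.1.0 is not confirmed by the thread.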
