How to create a custom AnalyzerFactory in GraphDB full text search?

Question

(Using GraphDB 8.1 Free.) http://graphdb.ontotext.com/documentation/free/full-text-search.html says that I can enable a custom AnalyzerFactory for GraphDB full-text search, using the luc:analyzer param, by implementing the interface com.ontotext.trree.plugin.lucene.AnalyzerFactory. However, I can't find this interface anywhere. It is not in the jar graphdb-free-runtime-8.1.0.jar.

I checked the feature matrix at http://ontotext.com/products/graphdb/editions/#feature-comparison-table and it seems the "Connectors Lucene" feature is available for the free edition of GraphDB.

In which jar is the com.ontotext.trree.plugin.lucene.AnalyzerFactory interface located? What do I need to import in my project to implement this interface?

Are there pre-existing AnalyzerFactories included with GraphDB for using other Lucene analyzers? (I am interested in using a FrenchAnalyzer.)

Thanks!

You should use the Maven repository (http://graphdb.ontotext.com/documentation/enterprise/maven-artifacts.html), or am I wrong? - UninformedUser, Apr 20 '17 at 14:03
I am not sure this is it. http://maven.ontotext.com/content/groups/all-onto asks me for a login/password while the documentation says the artifacts should be available "without credentials". - ThomasFrancart, Apr 23 '17 at 06:54

score 1 · Accepted Answer · answered Apr 24 '17 at 07:41

GraphDB offers two different Lucene-based plugins.

The Lucene FTS plugin indexes RDF molecules. Its documentation is at: http://graphdb.ontotext.com/documentation/free/full-text-search.html
The Lucene Connector performs online synchronization between the RDF and Lucene document models, using sequences of configurations like ?subject propertyPath ?object to id|field value. Its documentation is at: http://graphdb.ontotext.com/documentation/free/lucene-graphdb-connector.html

I encourage you to use the Lucene Connector, unless you have a special case that requires RDF molecules. Here is a simple example of how to configure the connector with the French analyzer and index all rdfs:label values for resources of type urn:MyClass. Select a repository and execute the following from the SPARQL query view:

  PREFIX :<http://www.ontotext.com/connectors/lucene#>
  PREFIX inst:<http://www.ontotext.com/connectors/lucene/instance#>
  INSERT DATA {
    inst:labelFR :createConnector '''
  {
    "fields": [
      {
        "indexed": true,
        "stored": true,
        "analyzed": true,
        "multivalued": true,
        "fieldName": "label",
        "propertyChain": [
          "http://www.w3.org/2000/01/rdf-schema#label"
        ],
        "facet": true
      }
    ],
    "types": [
      "urn:MyClass"
    ],
    "stripMarkup": false,
    "analyzer": "org.apache.lucene.analysis.fr.FrenchAnalyzer"
  }
  ''' .
  }
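
Before importing data, it can help to confirm that the connector was actually created. The following is a minimal sketch, assuming the :connectorStatus predicate described in the Lucene Connector documentation is available in your GraphDB version; it lists every connector instance together with its status:

```sparql
PREFIX : <http://www.ontotext.com/connectors/lucene#>

# Assumption: :connectorStatus is supported by the installed connector version.
# Each result row pairs a connector instance IRI with its current status.
SELECT ?cntUri ?cntStatus {
    ?cntUri :connectorStatus ?cntStatus .
}
```

If the connector created above does not appear in the results, the INSERT most likely failed and the error should be visible in the Workbench or the server log.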

Then manually add some sample test data from Import > Text area:

<urn:instance:test>  <http://www.w3.org/2000/01/rdf-schema#label> "C'est une example".
<urn:instance:test> a <urn:MyClass>.

Once you commit the transaction, the Connector will update the Lucene index. Now you can run search queries like:

PREFIX : <http://www.ontotext.com/connectors/lucene#>
PREFIX inst: <http://www.ontotext.com/connectors/lucene/instance#>
SELECT ?entity ?snippetField ?snippetText {
    ?search a inst:labelFR ;
            :query "label:*" ;
            :entities ?entity .
    ?entity :snippets _:s .
    _:s :snippetField ?snippetField ;
        :snippetText ?snippetText .
}
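
The label:* query above matches every indexed label. A narrower search, sketched here under the assumption that the connector instance is named inst:labelFR as in the query above and that the sample data has been imported, looks up a single term in the label field and returns only the matching entities:

```sparql
PREFIX : <http://www.ontotext.com/connectors/lucene#>
PREFIX inst: <http://www.ontotext.com/connectors/lucene/instance#>

# The query string uses Lucene query syntax: field name, colon, search term.
# "example" is taken from the sample label imported earlier.
SELECT ?entity {
    ?search a inst:labelFR ;
            :query "label:example" ;
            :entities ?entity .
}
```

Which labels match will depend on how the configured analyzer tokenizes and stems both the indexed text and the search term, so results may differ between analyzers.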

To create a custom analyzer, follow the instructions in the documentation and extend the org.apache.lucene.analysis.Analyzer class. Then put the JAR containing the custom analyzer in the lib/plugins/lucene-connector/ directory.
