
I'm having trouble figuring out how to specify my own analyzer implementation inside GraphDB. After reading the documentation and a couple of other posts, I seem to be running into issues with .jar dependencies.

In order to build the boilerplate CustomAnalyzer and CustomAnalyzerFactory classes, I had to use the lucene.jar and lucene-core.jar located in lib/plugins/lucene. My Gradle build file looks like this:

group 'com.example'
version '1.0-SNAPSHOT'

apply plugin: 'java'

sourceCompatibility = 1.8

repositories {
    mavenCentral()
}

dependencies {
    testCompile group: 'junit', name: 'junit', version: '4.12'
    compile fileTree(dir: 'libs/lucene', include: '*.jar')
}

Note: libs/lucene is the folder in my Gradle project into which I copied the lucene.jar and lucene-core.jar from lib/plugins/lucene of the GraphDB stand-alone server distribution.

After I compile the code and create the jar file with gradle clean jar, I copy it into lib/plugins/lucene-connector.

I restart GraphDB, go into Connectors, and attempt to add a Lucene connector using the UI. I manage to get all the way down to where you can specify your analyzer. However, when I specify com.example.CustomAnalyzer, I get the following error message:

 Caused by: java.lang.NoClassDefFoundError: org/apache/lucene/analysis/ASCIIFoldingFilter

After some digging around, I've found that there are two lucene-core.jar files: one in lib/plugins/lucene and the other in lib/plugins/lucene-connector. The lucene-core.jar in lib/plugins/lucene-connector does not contain the ASCIIFoldingFilter class.
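One quick way to confirm which jar actually contains a given class is to scan the archives with a small JDK-only utility (a sketch; the class and paths used here are examples, not part of GraphDB):

```java
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarClassFinder {

    // Returns the entry names inside the jar whose path ends with
    // the given simple class name, e.g. "ASCIIFoldingFilter".
    public static List<String> findClass(String jarPath, String simpleClassName)
            throws IOException {
        try (ZipFile jar = new ZipFile(jarPath)) {
            return jar.stream()
                      .map(ZipEntry::getName)
                      .filter(name -> name.endsWith("/" + simpleClassName + ".class"))
                      .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // e.g. java JarClassFinder lib/plugins/lucene-connector/lucene-core.jar ASCIIFoldingFilter
        findClass(args[0], args[1]).forEach(System.out::println);
    }
}
```

Running this against both lucene-core.jar copies shows not only whether the class is present but also its package path, which matters because Lucene has moved classes between packages across major versions.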

I've even tried creating a fat jar with all the dependencies contained in a single jar, but when I do that, GraphDB fails to load any of the connectors.

I'm not really sure where I'm going wrong; I have a feeling it's got something to do with how I'm building and referencing the jar files.


I also tried removing the ASCIIFoldingFilter from the CustomAnalyzer, but then I get a whole new set of errors:

Caused by: com.ontotext.trree.sdk.BadRequestException: Unable to instantiate analyzer class, only analyzers with a default constructor or a constructor accepting single Version parameter are possible: com.example.CustomAnalyzer
    at com.ontotext.trree.plugin.externalsync.impl.lucene4.CreateAnalyzerUtil.instantiateAnalyzer(CreateAnalyzerUtil.java:70)
    at com.ontotext.trree.plugin.externalsync.impl.lucene4.CreateAnalyzerUtil.createAnalyzerFromClassName(CreateAnalyzerUtil.java:42)
    at com.ontotext.trree.plugin.externalsync.impl.lucene4.Lucene4ExternalStore.open(Lucene4ExternalStore.java:182)
    at com.ontotext.trree.plugin.externalsync.impl.lucene4.Lucene4ExternalStore.initImpl(Lucene4ExternalStore.java:718)
    ... 60 common frames omitted
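For reference, the error above says the connector can only instantiate analyzers with a no-argument constructor (or one taking a single Version). A minimal sketch of such a class, written against the Lucene 4.x API that the stack trace (lucene4) implies, might look like the following. The filter chain is purely illustrative; note that in Lucene 4 ASCIIFoldingFilter lives in org.apache.lucene.analysis.miscellaneous, not in org.apache.lucene.analysis as in the older jars under lib/plugins/lucene, and the no-Version constructors shown here assume a late 4.x release (earlier 4.x constructors also take a Version argument):

```java
package com.example;

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.miscellaneous.ASCIIFoldingFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class CustomAnalyzer extends Analyzer {

    // GraphDB instantiates the analyzer reflectively, so a public
    // no-argument constructor is required.
    public CustomAnalyzer() {
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // Tokenize, lower-case, then fold accented characters to ASCII.
        StandardTokenizer source = new StandardTokenizer(reader);
        TokenStream stream = new LowerCaseFilter(source);
        stream = new ASCIIFoldingFilter(stream);
        return new TokenStreamComponents(source, stream);
    }
}
```

Whichever filters you use, the key point is that the class must be compiled against the same lucene-core.jar that the connector loads at runtime (the one in lib/plugins/lucene-connector), not against the older jars shipped with the FTS plugin.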

1 Answer


GraphDB offers two mechanisms for full-text search. The first option is the GraphDB Lucene Connector plugin, which is the recommended approach for any new development. The other alternative is the GraphDB FTS plugin, which uses a slightly different indexing approach; its main limitation, due to the nature of its index, is the lack of automatic synchronisation when the RDF data changes.

In your example you want to extend the Lucene Connector, but you are actually modifying the binary of the FTS plugin. To simplify the instructions and all the steps necessary to develop, test and deploy the custom analyser, I have prepared a public project to try:

https://gitlab.ontotext.com/vassil.momtchev/custom-lucene-analyzer

vassil_momtchev
  • To run in Docker:
    ```
    RUN set -e; \
        git clone https://gitlab.ontotext.com/vassil.momtchev/custom-lucene-analyzer.git; \
        cd custom-lucene-analyzer; \
        sed -i "s/.*<\/lucene.version>/7.7.0<\/lucene.version>/g" pom.xml; \
        apt update; \
        apt install -y maven
    ADD CustomAnalyzer.java src/main/java/com/ontotext/graphdb/lucene/CustomAnalyzer.java
    RUN set -e; \
        cd custom-lucene-analyzer; \
        mvn install -DskipTests; \
        cp target/custom-lucene-analyzer-1.0-SNAPSHOT.jar /opt/graphdb/dist/lib/plugins/lucene-connector
    ```
    – Iddan Aaronsohn Jun 03 '19 at 15:36