I have indexed my database using Hibernate Search. I use a custom analyzer, both for indexing and for querying. I have a field called inchikey that should not get tokenized. Example values are:
- BBBAWACESCACAP-UHFFFAOYSA-N
- KEZLDSPIRVZOKZ-AUWJEWJLSA-N
When I look into my index with Luke I can confirm that they are not tokenized, as required.
However, when I try to search them using the web app, some inchikeys are found and others are not. Curiously, for these inchikeys the search DOES work when I search without the last hyphen, as so: BBBAWACESCACAP-UHFFFAOYSA N
I have not been able to find a common element in the inchikeys that are not found.
Any idea what is going on here?
I use a MultiFieldQueryParser to search over the different fields in the database:
String[] searchfields = Compound.getSearchfields();
MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_29, Compound.getSearchfields(), new ChemicalNameAnalyzer());
//Disable the following if search performance is too slow
parser.setAllowLeadingWildcard(true);
FullTextQuery fullTextQuery = fullTextSession.createFullTextQuery(parser.parse("searchterms"), Compound.class);
List<Compound> hits = fullTextQuery.list();
More details about our setup have been posted here by Tim and I.