I attempted to upgrade from Hibernate Search 5.8.0.CR1 to 5.8.2.Final and from Elasticsearch 2.4.2 to 5.6.4.

When I run my application, I get the following error:

Status: 400 Bad Request
Error message: {"root_cause":[{"type":"illegal_argument_exception",
"reason":"Fielddata is disabled on text fields by default.
Set fielddata=true on [title] in order to load fielddata in memory by uninverting the inverted index.
Note that this can however use significant memory. Alternatively use a keyword field instead."}]

I read about fielddata in the Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/5.6/fielddata.html#_fielddata_is_disabled_on_literal_text_literal_fields_by_default), but I'm not sure how to address this issue, especially from Hibernate Search.

My title field definition looks like this:

@Field(name = "title", analyzer = @Analyzer(definition = "my_collation_analyzer"))
@Field(name = "title_polish", analyzer = @Analyzer(definition = "polish"))
protected String title;

I'm using the following analyzer definition:

@AnalyzerDef(name = "my_collation_analyzer",
    tokenizer = @TokenizerDef(factory = KeywordTokenizerFactory.class), filters = { @TokenFilterDef(
        name = "polish_collation", factory = ElasticsearchTokenFilterFactory.class, params = {
                @org.hibernate.search.annotations.Parameter(name = "type", value = "'icu_collation'"),
                @org.hibernate.search.annotations.Parameter(name = "language", value = "'pl'") }) })

(The polish analyzer comes from the analysis-stempel plugin.)

The Elasticsearch notes on fielddata recommend either changing the field type from text to keyword or setting fielddata=true, but I don't see how to do either with Hibernate Search annotations, because @Field has no such attributes.

Update:

Thank you very much for the help on this. I changed my code to this:

@NormalizerDef(name = "my_collation_normalizer",
        filters = { @TokenFilterDef(
                name = "polish_collation_normalization", factory = ElasticsearchTokenFilterFactory.class, params = {
                        @org.hibernate.search.annotations.Parameter(name = "type", value = "'icu_collation'"),
                        @org.hibernate.search.annotations.Parameter(name = "language", value = "'pl'") }) })
... 

@Field(name = "title_for_search", analyzer = @Analyzer(definition = "polish"))
@Field(name = "title_for_sort", normalizer = @Normalizer(definition = "my_collation_normalizer"))
@SortableField(forField = "title_for_sort")
protected String title;

Is this OK? As I understand it, there should be no tokenization in a normalizer, but I'm not sure what to use instead of @TokenFilterDef with factory = ElasticsearchTokenFilterFactory.class.

Unfortunately I'm also getting the following error:

Error message: {"root_cause":
[{"type":"illegal_argument_exception",
"reason":"Custom normalizer [my_collation_normalizer] may not use filter
[polish_collation_normalization]"}]

I need collation for sorting, as described in my previous question here: ElasticSearch - define custom letter order for sorting

Update 2:

I tested with Elasticsearch 5.6.5, and it seems to allow icu_collation in normalizers (my annotations were accepted).

nuoritoveri

2 Answers

If you are trying to sort on the "title" field, then maybe you forgot to mark the field as sortable using the @SortableField annotation. (More information here) [EDIT: In Hibernate Search 6 you would use @KeywordField(sortable = Sortable.YES). See here]

Also, to avoid errors and for better performance, you should consider using normalizers instead of analyzers for fields you want to sort on (such as your "title" field). This will turn your field into a keyword field, which is what the Elasticsearch logs are hinting at.

More information on normalizers in Hibernate Search is available here, and here are the Elasticsearch specifics in Hibernate Search.

yrodiere
  • Thank you very much for your answer. I updated the code in my question according to your hints, but I'm getting another error. Could you give it a look? – nuoritoveri Dec 15 '17 at 11:09
  • It seems the Elasticsearch team decided to not allow using this filter in normalizers for some unknown reason, and they added a field type instead. See https://github.com/elastic/elasticsearch/issues/26729#issuecomment-331020306 . Unfortunately this means that you currently cannot use ICU collation at all when using Hibernate Search over Elasticsearch... Your only options currently are using @SortableField + an Analyzer on "title_for_sort" (performance may be poor), or using a more crude normalization algorithm (ascii folding filter + lower case filter for instance) – yrodiere Dec 15 '17 at 12:22
  • Thank you very much for your help on this. I tested my code with ES 5.6.5 and it allowed icu_collation in normalizer. – nuoritoveri Jan 17 '18 at 15:39
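
The trade-off raised in the comments above (locale-aware collation versus a cruder "ASCII folding + lowercase" normalization) can be illustrated outside of Hibernate Search and Elasticsearch. The following is only a sketch in plain Java: the class `PolishSortSketch` and its methods are hypothetical names, the folding uses `java.text.Normalizer` and merely approximates what a Lucene ASCII folding filter does, and `java.text.Collator` stands in for ICU collation. The order produced by `Collator` depends on the locale data shipped with the JDK, so no particular output is claimed for it.

```java
import java.text.Collator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class PolishSortSketch {

    // Crude sort key: lowercase the value, then drop diacritics.
    static String crudeKey(String title) {
        String lower = title.toLowerCase(Locale.ROOT);
        // NFD splits an accented letter into its base letter plus a combining mark.
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        // Strip the combining marks. The letter l-with-stroke (U+0142) has no
        // Unicode decomposition, so it is mapped to 'l' by hand.
        return decomposed.replaceAll("\\p{M}", "").replace('\u0142', 'l');
    }

    // Sort by the crude key: accented letters are treated like their base letters.
    static List<String> sortByCrudeKey(List<String> titles) {
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(Comparator.comparing(PolishSortSketch::crudeKey));
        return sorted;
    }

    // Sort with the JDK's collator for Polish (assumption: the JDK in use
    // ships Polish collation rules; otherwise a default ordering applies).
    static List<String> sortByCollation(List<String> titles) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("pl-PL"));
        List<String> sorted = new ArrayList<>(titles);
        sorted.sort(collator);
        return sorted;
    }

    public static void main(String[] args) {
        // "zebra", "Zaba" with dot above the Z, "lampa", "Lodz" with Polish letters
        List<String> titles = Arrays.asList(
                "zebra", "\u017Baba", "lampa", "\u0141\u00F3d\u017A");
        System.out.println("crude key: " + sortByCrudeKey(titles));
        System.out.println("collation: " + sortByCollation(titles));
    }
}
```

With the crude key, "Żaba" sorts before "zebra" because "ż" is folded to "z". The Polish alphabet places "ż" after "z", so a Polish-aware collation would be expected to put "zebra" first. That difference is what is lost when collation is replaced by folding.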

You most likely kept the old schema in your Elasticsearch cluster and tried to use it in Elasticsearch 5 with Hibernate Search. This will not work.

When upgrading from Elasticsearch 2 to 5, you must take some steps to upgrade the Elasticsearch schema, in order to use it with Hibernate Search. The easiest option (by far) is to delete the indexes and reindex your whole database. You can find details in the documentation: https://docs.jboss.org/hibernate/stable/search/reference/en-US/html_single/#_upgrading_elasticsearch

Note that you may also have to delete indexes and reindex if your Elasticsearch schema was generated from a Beta version of Hibernate Search: Beta versions are unstable, and may generate an incorrect schema. They are nice for experiments, but definitely not for production environments.

yrodiere
  • I'm sorry I didn't mention it in the question: my test application is configured in such a way that each time it is run it removes existing data (in this case there were none, the folder configured in ES5 `elasticsearch.yml` was empty before ES5 was run) and then all objects added in the code are indexed via Hibernate Search. – nuoritoveri Dec 11 '17 at 12:38
  • I suppose your Elasticsearch cluster might still have the old *schema* though? Even if you deleted the index content. – Sanne Dec 11 '17 at 13:15
  • @Sanne if the schema is stored in the `data` folder then no, because the `data` folder was empty before I ran ES5. – nuoritoveri Dec 11 '17 at 13:21