Retrieve data from Elasticsearch using aggregations where the values contain hyphens

Question

I have been working with Elasticsearch for quite some time now, and recently I ran into a problem.

I want to group by a particular column in an Elasticsearch index. The values in that column contain hyphens and other special characters.

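import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

// `client` is an existing Client; `from` and `to` are the timestamp bounds.
SearchResponse res1 = client.prepareSearch("my_index")
        .setTypes("data")
        .setSearchType(SearchType.QUERY_AND_FETCH)
        .setQuery(QueryBuilders.rangeQuery("timestamp").gte(from).lte(to))
        .addAggregation(AggregationBuilders.terms("cat_agg").field("category").size(10))
        .setSize(0)
        .execute()
        .actionGet();

Terms termAgg = res1.getAggregations().get("cat_agg");

for (Terms.Bucket item : termAgg.getBuckets()) {
    // Bucket key = the indexed term (on Elasticsearch 2.x use item.getKeyAsString())
    String cat_number = item.getKey();
    System.out.println(cat_number + "  " + item.getDocCount());
}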

This is the query I wrote to group the data in "my_index" by the "category" column.

The output I expected after running the code is:

category-1  10
category-2  9
category-3  7

But the output I am getting is:

category   10
1  10
category   9
2  9
category   7
3  7

I have already gone through some questions like this one, but their answers did not solve my issue.

score 1 · Answer 1 · answered Dec 17 '15 at 14:28

When you index "category-1", the default analyzer produces two terms, "category" and "1". Therefore, when you aggregate, you get back two buckets for that one value.

If you want it to be treated as a single term, you need to change the analyzer used on that field at index time: set it to use the keyword analyzer.
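To make the difference concrete, here is a small self-contained Java sketch. It does not call Elasticsearch or Lucene: the class `AnalyzerSketch` and its methods are hypothetical, and the splitting rule (break on any run of non-letter, non-digit characters, then lowercase) is only a rough approximation of the default standard analyzer. It is just enough to show how "category-1" ends up as two terms, while the keyword analyzer keeps the whole value as one term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical, simplified model of two analyzers (not Lucene's implementation):
 * - standardLikeTokens: approximates the default "standard" analyzer by splitting
 *   on runs of non-letter/non-digit characters (such as '-') and lowercasing.
 * - keywordTokens: like the keyword analyzer, keeps the whole value as one term.
 */
class AnalyzerSketch {

    static List<String> standardLikeTokens(String value) {
        List<String> tokens = new ArrayList<>();
        // Split on runs of characters that are neither letters nor digits.
        for (String part : value.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    static List<String> keywordTokens(String value) {
        // The entire input becomes a single term, hyphen included.
        List<String> tokens = new ArrayList<>();
        tokens.add(value);
        return tokens;
    }

    public static void main(String[] args) {
        String value = "category-1";
        System.out.println("standard-like: " + standardLikeTokens(value)); // [category, 1]
        System.out.println("keyword:       " + keywordTokens(value));      // [category-1]
    }
}
```

A terms aggregation builds its buckets from these indexed terms, which is why the analyzed field returns "category" and "1" instead of "category-1".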

score 1 · Accepted Answer · answered Dec 17 '15 at 14:29

That's because your category field has the default string mapping, so it is analyzed: category-1 gets tokenized into two tokens, category and 1, which explains the results you're getting.

To prevent this, you can update your mapping to add a not_analyzed sub-field, category.raw, with the following command:

curl -XPUT localhost:9200/my_index/data/_mapping -d '{
    "properties": {
        "category": {
            "type": "string",
            "fields": {
                "raw": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}'

After that, you need to re-index your data; the aggregation will then return what you expect. Just make sure to change the following line in your Java code:

.addAggregation(AggregationBuilders.terms("cat_agg").field("category.raw").size(10))
                                                                      ^
                                                                      |
                                                                add .raw here
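The following in-memory Java sketch models what the two aggregations see. It is an illustration under stated assumptions, not Elasticsearch code: the class `TermsAggSketch`, its methods, and the sample documents (10 with "category-1", 9 with "category-2", 7 with "category-3", echoing the counts in the question) are hypothetical. The analyzed field uses a rough approximation of the standard analyzer, and a bucket's doc_count is taken to be the number of documents containing that term. In this model the shared "category" token falls into a single bucket that counts every document, while the not_analyzed field yields one bucket per full value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical in-memory model of a terms aggregation on "category" vs "category.raw". */
class TermsAggSketch {

    // Rough approximation of the standard analyzer: split on non-alphanumerics, lowercase.
    static Set<String> analyzed(String value) {
        Set<String> terms = new LinkedHashSet<>();
        for (String part : value.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) terms.add(part.toLowerCase(Locale.ROOT));
        }
        return terms;
    }

    // not_analyzed: the whole value is indexed as a single term.
    static Set<String> notAnalyzed(String value) {
        Set<String> terms = new LinkedHashSet<>();
        terms.add(value);
        return terms;
    }

    // Count how many documents contain each term and return the top `size` buckets.
    static List<Map.Entry<String, Integer>> termsAgg(List<String> docs, boolean raw, int size) {
        Map<String, Integer> counts = new HashMap<>();
        for (String value : docs) {
            Set<String> terms = raw ? notAnalyzed(value) : analyzed(value);
            for (String term : terms) {
                counts.merge(term, 1, Integer::sum); // each doc counts once per distinct term
            }
        }
        List<Map.Entry<String, Integer>> buckets = new ArrayList<>(counts.entrySet());
        // Highest doc_count first; ties broken alphabetically so the output is deterministic.
        buckets.sort((a, b) -> b.getValue().equals(a.getValue())
                ? a.getKey().compareTo(b.getKey())
                : Integer.compare(b.getValue(), a.getValue()));
        return buckets.subList(0, Math.min(size, buckets.size()));
    }

    public static void main(String[] args) {
        // Hypothetical sample documents, one category value each.
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 10; i++) docs.add("category-1");
        for (int i = 0; i < 9; i++) docs.add("category-2");
        for (int i = 0; i < 7; i++) docs.add("category-3");

        // [category=26, 1=10, 2=9, 3=7]
        System.out.println("category:     " + termsAgg(docs, false, 10));
        // [category-1=10, category-2=9, category-3=7]
        System.out.println("category.raw: " + termsAgg(docs, true, 10));
    }
}
```

The `size` argument plays the role of `.size(10)` in the aggregation builder: only the top buckets by document count are returned.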

Thanks, Val, for the solution. I have a small problem: right now I can't update my schema for some reasons. Are there any other solutions from a programming point of view? — Sudhir kumar, Dec 17 '15 at 16:11
You can create a brand-new index with the new mapping and index your data into it. — Val, Dec 17 '15 at 16:13
