0

I have stored some values in Elasticsearch nested data type (an array) but without using key/value pair. An example record would be:

{
  "categories": [
  "Category1",
  "Category2"
  ],
  "product_name": "productx"
}

Now I want to run aggregation query to find out unique list of categories available. But all the examples I've seen pointed to mapping that has key/value. Is there any way I can use above schema as is or do I need to change my schema to something like this to run aggregation query

{
  "categories": [
     {"name": "Category1"},
     {"name": "Category2"}
  ],
  "product_name": "productx"
}

1 Answers1

2

Well regarding JSON structure, you need to take a step back and figure out if you'd want list or key-value pairs.

Looking at your example, I don't think you need key-value pairs but again its something you may want to clarify by understanding your domain if there'd be some more properties for categories.

Regarding aggregation, as far as I know, aggregations would work on any valid JSON structure.

For the data you've mentioned, you can make use of the below aggregation query. Also I'm assuming the fields are of type keyword.

Aggregation Query

POST <your_index_name>/_search
{
  "size": 0,
  "aggs": {
    "myaggs": {
      "terms": {
        "size": 100,
        "script": {
          "inline": """
            def myString = "";
            def list = new ArrayList();
            for(int i=0; i<doc['categories'].length; i++){
              myString = doc['categories'][i] + ", " + doc['product'].value;
              list.add(myString);
            }
            return list;
            """
        }
      }
    }
  } 
}

Aggregation Response

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "myaggs": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "category1, productx",
          "doc_count": 1
        },
        {
          "key": "category2, productx",
          "doc_count": 1
        }
      ]
    }
  }
}

Hope it helps!

Kamal Kunjapur
  • 8,547
  • 2
  • 22
  • 32
  • 1
    I read more about array data type. If I use nested type, I cannot add array type field in to nested field and my example wouldn't work. In my original mapping (since I've created categories field dynamically) it was set to a text field and keyword is the first element of the array so aggression wouldn't work. I guess I will settle with key/val pair and nested data type. – Sameera Godakanda Nov 16 '18 at 08:56
  • The aggregation I've provided works if you replace `categories` with `categories.keyword`. However yes, I'd suggest you to play around with `nested` and `non-nested` fields, queries to understand how they work and try to see how they'd match your requirements and use-cases and make changes accordingly. Also note that for nested type, you would required to make use of `nested aggregations`. – Kamal Kunjapur Nov 16 '18 at 09:19
  • 1
    Yes. I used the keyword before. however your query above missing size: parameter inside "terms": { so it was only showing 10 results. Apparently you cannot use size:0 and had to use some large value there. – Sameera Godakanda Nov 16 '18 at 11:18
  • That's good observation. Yes. Aggregation for terms only shows top 10 buckets. You need to add size manually to get larger list. Thanks for bringing this up, I've updated my answer accordingly. Added 100 to the size. :| – Kamal Kunjapur Nov 16 '18 at 11:41