My Elasticsearch 6.5.2 index looks like this:

  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "cCYuHW4BvwH6Y3jL87ul",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "telecom"
    }
  },
  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "cSYuHW4BvwH6Y3jL_Lvt",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "telecom"
    }
  },
  {
    "_index" : "searches",
    "_type" : "searches",
    "_id" : "eCb6O24BvwH6Y3jLP7tM",
    "_score" : 1.0,
    "_source" : {
      "querySearched" : "industry"
    }
  }

And I would like a query that returns this result:

"result": [
  {
    "querySearched" : "telecom",
    "number" : 2
  },
  {
    "querySearched" : "industry",
    "number" : 1
  }
]

I just want to group by value, count the occurrences of each, and limit the result to the ten largest counts. I tried aggregations, but the buckets are empty. Thanks!

L01C

2 Answers

7

If your mapping is:

PUT /index
{
  "mappings": {
    "doc": {
      "properties": {
        "querySearched": {
          "type": "text",
          "fielddata": true
        }
      }
    }
  }
}

Your query should look like this:

GET index/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "querySearched",
        "size": 10
      }
    }
  }
}

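You need to add "fielddata": true to enable aggregations on a text field (see the Elasticsearch fielddata documentation for details).

"size": 0 at the top level suppresses the search hits, and "size": 10 inside terms limits the result to the 10 buckets with the highest document counts.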
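The response to this request carries the counts under `aggregations.result.buckets`, where each bucket has a `key` and a `doc_count`. As a minimal sketch, the Python below reshapes such a response into the `querySearched`/`number` pairs asked for in the question. The `sample_response` dict is hand-written to mirror the three documents in the question (it is not real server output), and `to_result` is a hypothetical helper name.

```python
def to_result(response, agg_name="result"):
    """Turn terms-aggregation buckets into the shape requested in the question."""
    buckets = response["aggregations"][agg_name]["buckets"]
    # Each bucket holds the grouped value in "key" and the number of
    # matching documents in "doc_count".
    return [{"querySearched": b["key"], "number": b["doc_count"]} for b in buckets]


# Hand-written stand-in for a search response (assumption: only the
# fields used here are shown).
sample_response = {
    "hits": {"hits": []},  # empty because the request sets "size": 0
    "aggregations": {
        "result": {
            "buckets": [
                {"key": "telecom", "doc_count": 2},
                {"key": "industry", "doc_count": 1},
            ]
        }
    },
}

result = to_result(sample_response)
print(result)
```

Passing the parsed JSON body of a real search response to `to_result` should work the same way, as long as `agg_name` matches the aggregation name used in the request.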
    

After a short discussion with @Kamal, I feel obligated to point out that enabling fielddata:true can consume a lot of heap space.

From the link I've shared:

Fielddata can consume a lot of heap space, especially when loading high cardinality text fields. Once fielddata has been loaded into the heap, it remains there for the lifetime of the segment. Also, loading fielddata is an expensive process which can cause users to experience latency hits. This is why fielddata is disabled by default.

Another alternative (a more efficient one):

PUT /index
{
  "mappings": {
    "doc": {
      "properties": {
        "querySearched": {
          "type": "text",
          "fields": {
           "keyword": {
             "type": "keyword",
             "ignore_above": 256
           }
         }
        }
      }
    }
  }
}

Then your aggregation query becomes:

GET index/_search
{
  "size": 0,
  "aggs": {
    "result": {
      "terms": {
        "field": "querySearched.keyword",
        "size": 10
      }
    }
  }
}

Both solutions work, but keep the heap cost of fielddata in mind when choosing between them.
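The two mappings can produce different buckets once a value contains more than one word: a `text` field with fielddata is aggregated on its analyzed tokens, while the `keyword` sub-field is aggregated on the whole, unmodified value. The sketch below imitates this locally in Python. It is only an approximation: `analyze` is a hypothetical stand-in that lowercases and splits on whitespace, which is much simpler than Elasticsearch's standard analyzer, and the sample values are made up for illustration.

```python
from collections import Counter


def analyze(value):
    # Rough stand-in for the standard analyzer (assumption):
    # lowercase, then split on whitespace.
    return value.lower().split()


def terms_counts(values, analyzed, size=10):
    """Count documents per term, largest first, keeping at most `size` buckets."""
    counts = Counter()
    for value in values:
        # A terms aggregation counts documents, so each term is counted
        # at most once per document.
        terms = set(analyze(value)) if analyzed else {value}
        counts.update(terms)
    return counts.most_common(size)


# Made-up sample values, one per document.
searches = ["telecom", "Telecom industry", "telecom"]

text_buckets = terms_counts(searches, analyzed=True)      # like "querySearched" with fielddata
keyword_buckets = terms_counts(searches, analyzed=False)  # like "querySearched.keyword"
print(text_buckets)
print(keyword_buckets)
```

With these values the analyzed variant reports `telecom` in three documents, while the keyword variant keeps `Telecom industry` as its own bucket.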

Hope it helps

Assael Azran
  • Thanks a lot, it works, but the order doesn't work. About fielddata, I already did it ;-). – L01C Nov 06 '19 at 16:26
  • I edited your post to delete the order by; it orders by doc_count by default ;-). – L01C Nov 06 '19 at 16:40
  • Sure, it was just to show you how to order by aggs results. – Assael Azran Nov 06 '19 at 17:11
  • @Azran, there is a reason why `keyword` was introduced. While using `fielddata:true` may technically be right, I would strongly recommend letting people know about `keyword` instead, as fielddata can cause heap issues when the index grows and can have a significant performance impact. https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html#fielddata-disabled-text-fields. In fact, the very link you referred to in your answer mentions it. – Kamal Kunjapur Nov 06 '19 at 17:12
  • In general you are right, but it is not black or white. It depends on the results you want to get when using aggregations. fielddata:true was made for a reason. – Assael Azran Nov 06 '19 at 17:51
  • @Assael Azran It's all black when it comes to memory issues that users could avoid in the long run. Wouldn't you agree? My point is that it would be an inefficient solution in the long run and would make them revisit their model design and eventually change everything to `keyword` (usually the first thing would be to disable fielddata). Even Elastic team members/developers do not suggest using fielddata and instead recommend keyword. And you can get the results you want from keyword, you just need to model it the right way :). Plus you'd get more upvotes :) – Kamal Kunjapur Nov 06 '19 at 20:32
  • @Kamal I get your point, and I improved my answer to let future viewers know that `fielddata:true` might not be the best solution. – Assael Azran Nov 07 '19 at 07:58
  • I did it the good way, without fielddata true ;-). Thanks everybody. – L01C Nov 07 '19 at 15:08
0

What have you tried?

POST /searches/_search
{
  "size": 0,
  "aggs": {
    "byquerySearched": {
      "terms": {
        "field": "querySearched",
        "size": 10
      }
    }
  }
}
LeBigCat