Paginate through the documents in an aggregated bucket

Question

I am working with the GeoTile grid aggregation in Elasticsearch. After grouping the locations into buckets, I want to retrieve the documents in a given bucket with pagination (using `search_after`). Has anyone done this, and how can I achieve it? Thank you!

Here is the GeoTile aggregation I have used:

GET /index-name/_doc/_search
{
  "aggs": {
     "result": {
        "geotile_grid": {
          "field": "location",
          "precision": 12
        }
     }
   }
}

And the result looks like:

{
  "took" : 3,
  "hits" : {
    "total" : {
      "value" : 39,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ... ]
  },
  "aggregations" : {
    "result" : {
      "buckets" : [
        {
          "key" : "12/3519/1597",
          "doc_count" : 36
        },
        {
          "key" : "12/3520/1597",
          "doc_count" : 3
        }
      ]
    }
  }
}

For example, how can I get the 36 documents in the "12/3519/1597" bucket? Thank you!

I have already tried converting the GeoTile key "12/3519/1597" into a bounding box, either by following this article or by using GeoTileUtils from the Elasticsearch source code.

However, when I convert the key "12/3519/1597" from the example above into a bounding box and query all the documents in that box, the results span 2 buckets. The x=3520 bucket contains documents at lon=129.375, which lie exactly on the right edge of the box.
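The edge behaviour described above can be reproduced with the standard Web Mercator tile formula. The following is a minimal Python sketch, assuming that formula matches how the aggregation assigns points to tiles (Elasticsearch also quantizes stored coordinates, so results right at an edge may differ slightly). The function name `point_to_geotile` is made up for illustration.

```python
import math


def point_to_geotile(lat: float, lon: float, zoom: int) -> str:
    """Return the z/x/y key of the tile containing a point.

    Assumption: standard Web Mercator tiling, where tile indices are
    obtained by flooring, so a point exactly on a tile's right or bottom
    edge falls into the neighbouring tile.
    """
    tiles = 1 << zoom  # number of tiles per axis at this zoom level
    x = math.floor((lon + 180.0) / 360.0 * tiles)
    lat_rad = math.radians(lat)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * tiles)
    # Clamp so that lon=180 or extreme latitudes stay inside the grid.
    x = min(max(x, 0), tiles - 1)
    y = min(max(y, 0), tiles - 1)
    return f"{zoom}/{x}/{y}"


# A point on the right edge of tile 12/3519/1597 lands in x=3520.
print(point_to_geotile(36.77, 129.375, 12))
# A point strictly inside the tile stays in x=3519.
print(point_to_geotile(36.77, 129.3, 12))
```

This is why an inclusive bounding box built from the tile corners can return documents that the aggregation counted in the neighbouring bucket.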

Nathan Reese · Accepted Answer · 2023-07-12T15:18:11.450

You could nest a top hits aggregation to get documents per geotile bucket.

You could also use a geo grid query to filter documents per tile.

GET kibana_sample_data_logs/_search
{
  "size": 1,
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "geo_grid": {
            "geo.coordinates": {
              "geotile": "5/9/12"
            }
          }
        }
      ],
      "should": [],
      "must_not": []
    }
  }
}

Response

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 675,
      "relation": "eq"
    },
    "max_score": 0,
    "hits": [
      {
        "_index": ".ds-kibana_sample_data_logs-2023.07.12-000001",
        "_id": "NM-ISokB7DQkCI7yJZQ-",
        "_score": 0,
        "_source": {
          "agent": "Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1",
          "bytes": 8973,
          "clientip": "213.50.214.248",
          "extension": "rpm",
          "geo": {
            "srcdest": "US:VN",
            "src": "US",
            "dest": "VN",
            "coordinates": {
              "lat": 40.19349528,
              "lon": -76.76340361
            }
          },
          "host": "artifacts.elastic.co",
          "index": "kibana_sample_data_logs",
          "ip": "213.50.214.248",
          "machine": {
            "ram": 12884901888,
            "os": "win 8"
          },
          "memory": null,
          "message": "213.50.214.248 - - [2018-09-10T11:39:18.812Z] \"GET /beats/metricbeat/metricbeat-6.3.2-i686.rpm HTTP/1.1\" 200 8973 \"-\" \"Mozilla/5.0 (X11; Linux x86_64; rv:6.0a1) Gecko/20110421 Firefox/6.0a1\"",
          "phpmemory": null,
          "referer": "http://www.elastic-elastic-elastic.com/success/daniel-tani",
          "request": "/beats/metricbeat/metricbeat-6.3.2-i686.rpm",
          "response": 200,
          "tags": [
            "success",
            "info"
          ],
          "@timestamp": "2023-08-21T11:39:18.812Z",
          "url": "https://artifacts.elastic.co/downloads/beats/metricbeat/metricbeat-6.3.2-i686.rpm",
          "utc_time": "2023-08-21T11:39:18.812Z",
          "event": {
            "dataset": "sample_web_logs"
          },
          "bytes_gauge": 8973,
          "bytes_counter": 65621715
        }
      }
    ]
  }
}
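Since the question asks about `search_after`, here is a minimal Python sketch of how a tile filter like the one above might be paged through. It only builds request bodies and follows the `sort` values of the last hit; the `search` callable, the helper names, and the sort field `doc_id` are assumptions for illustration, and the sort must end in a field that is unique per document for paging to be stable.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Body = Dict[str, Any]


def build_page_request(filter_clause: Body, sort: List[Body], page_size: int,
                       search_after: Optional[List[Any]] = None) -> Body:
    """Build one search body; search_after is omitted on the first page."""
    body: Body = {
        "size": page_size,
        "query": {"bool": {"filter": [filter_clause]}},
        "sort": sort,
    }
    if search_after is not None:
        body["search_after"] = search_after
    return body


def iterate_hits(search: Callable[[Body], Body], filter_clause: Body,
                 sort: List[Body], page_size: int = 1000) -> Iterator[Body]:
    """Yield every hit matching filter_clause, one page at a time.

    `search` is any callable that sends a body to Elasticsearch and
    returns the parsed JSON response (hypothetical, supplied by the caller).
    """
    search_after: Optional[List[Any]] = None
    while True:
        response = search(build_page_request(filter_clause, sort, page_size, search_after))
        hits = response["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit are the cursor for the next page.
        search_after = hits[-1]["sort"]


# Example filter for the tile from the question (field name from the question).
tile_filter = {"geo_grid": {"location": {"geotile": "12/3519/1597"}}}
# Assumed sort: "doc_id" is a hypothetical unique keyword field.
sort_spec = [{"doc_id": "asc"}]
```

With the 7.x Python client, `search` could be something like `lambda body: es.search(index="index-name", body=body)`, where `es` is a client instance you have already created. The same loop works with any filter clause, including a `geo_bounding_box` filter on older versions.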

The top-hit is not enough, because I have more than 10,000 records. That is why I asked to use `search_after`. — Phúc Đỗ Vương, Jul 07 '23 at 06:45
But how can I construct the bounding box? As described above, I have tried to use geo grid query but the bounding box is not correct. — Phúc Đỗ Vương, Jul 10 '23 at 09:22
With geo grid query, there is no need for bounding box. Just pass in the key of the tile, like '12/3519/1597' and elasticsearch will return all documents within the tile. — Nathan Reese, Jul 10 '23 at 13:05
Can you give me an example of passing the geotile key? As far as I know, we can only pass a geohash, not a geotile. — Phúc Đỗ Vương, Jul 11 '23 at 03:40
Thanks a lot @Nathan Reese, I have checked that. However, I cannot use it because my ES version is 7.10 :( This feature was only released in 8.8 — Phúc Đỗ Vương, Jul 13 '23 at 02:48

score 0 · Answer 2 · answered Jul 13 '23 at 03:16

For newer ES versions (since 8.8), you can use @Nathan Reese's solution.

However, on an older version (mine is 7.10), I used GeoTileUtils from the Elasticsearch source code to convert the geotile key (z/x/y) into a bounding box.

But you must be aware of the edges of the bounding box. The geotile aggregation does not include points that lie on the right and bottom edges of a tile. To exclude the points on those edges, I used a Painless script as follows:

GET /index-name/_doc/_search
{
  "size": 3,
  "query": {
    "bool": {
      "filter": [
        {
          "geo_bounding_box": {
            "location": {
              "top_left": {
                "lat": 36.80928470205938, "lon": 129.287109375
              },
              "bottom_right": {
                "lat": 36.73888412439431, "lon": 129.37500
              }
            }
          }
        },
        {
          "script": {
            "script": {
              "source": "doc['location'].lon < params.maxLon && doc['location'].lat > params.minLat",
              "lang": "painless",
              "params": {
                "minLat": 36.73888412439431,
                "maxLon": 129.37500
              }
            }
          }
        }
      ]
    }
  }
}
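For readers without access to GeoTileUtils, the key-to-bounding-box conversion can be sketched in Python with the standard Web Mercator tile formula. This is an illustration under that assumption, not the Elasticsearch implementation, and `geotile_to_bbox` is a made-up name; the output shape mirrors the `geo_bounding_box` corners used in the query above.

```python
import math
from typing import Dict


def geotile_to_bbox(key: str) -> Dict[str, Dict[str, float]]:
    """Convert a "z/x/y" geotile key into top_left / bottom_right corners."""
    zoom, x, y = (int(part) for part in key.split("/"))
    tiles = 1 << zoom  # number of tiles per axis at this zoom level

    def tile_lon(tile_x: int) -> float:
        # Longitude of the left edge of column tile_x.
        return tile_x / tiles * 360.0 - 180.0

    def tile_lat(tile_y: int) -> float:
        # Latitude of the top edge of row tile_y (rows grow southwards).
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * tile_y / tiles))))

    return {
        "top_left": {"lat": tile_lat(y), "lon": tile_lon(x)},
        "bottom_right": {"lat": tile_lat(y + 1), "lon": tile_lon(x + 1)},
    }


bbox = geotile_to_bbox("12/3519/1597")
print(bbox)
```

The right and bottom edges of the returned box belong to the neighbouring tiles, so they still need to be excluded, for example with a script filter like the one above.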
