Is there any way to paginate over nested objects in Elasticsearch?
In the example data below, I search offices (nested objects) and need to load only the nested matches for the requested pagination page.
For example, for page 3 I need to load 4 items from offset 8 (office.3E, office.4A, office.4B, office.4C) plus the total count of matching nested items.
|-company.1
| |-name
| |-address
| |-offices               __ __
|   |-office.1A             |
|-company.2                 |
| |-name                    |
| |-address                 |  page 1, size=4, offset=0
| |-offices                 |
|   |-office.2A             |
|   |-office.2B             |
|   |-office.2C           __|__
|-company.3                 |
| |-name                    |
| |-address                 |
| |-offices                 |  page 2, size=4, offset=4
|   |-office.3A             |
|   |-office.3B             |
|   |-office.3C             |
|   |-office.3D           __|__
|   |-office.3E             |
|-company.4                 |
| |-name                    |
| |-address                 |  page 3, size=4, offset=8
| |-offices                 |
|   |-office.4A             |
|   |-office.4B             |
|   |-office.4C           __|__
|-company.5
...
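Counting the offices in the tree above from offset 0, the intended result is simply a slice of one flat list of nested items. The sketch below is only a reference model of the desired output, not something Elasticsearch does; COMPANIES and nested_page are hypothetical names, and company.5 onward is left out because its offices are not shown.

```python
# Hypothetical in-memory copy of the example tree: company -> its offices.
COMPANIES = {
    "company.1": ["office.1A"],
    "company.2": ["office.2A", "office.2B", "office.2C"],
    "company.3": ["office.3A", "office.3B", "office.3C", "office.3D", "office.3E"],
    "company.4": ["office.4A", "office.4B", "office.4C"],
}


def nested_page(companies, page, size):
    """Return (offices on the given 1-based page, total nested count)."""
    # Flatten all nested items in parent order (dicts keep insertion order).
    flat = [office for offices in companies.values() for office in offices]
    offset = (page - 1) * size
    return flat[offset:offset + size], len(flat)


items, total = nested_page(COMPANIES, page=3, size=4)
print(items, total)
# → ['office.3E', 'office.4A', 'office.4B', 'office.4C'] 12
```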
Mapping:
{
  "mappings": {
    "properties": {
      "name": { "type": "keyword" },
      "address": { "type": "keyword" },
      "offices": {
        "type": "nested",
        "properties": {
          "hash": { "type": "keyword" },
          "street": { "type": "keyword" },
          "city": { "type": "keyword" },
          "zip": { "type": "keyword" }
        }
      }
    }
  }
}
The problem is that loading page 2 (offset=4) should load the root document company.3 in the example above.
But Elasticsearch does not seem to support an offset for nested fields, only for parent documents, or within the nested objects of a single parent via inner_hits.
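If the number of matching offices per company were known up front, the global nested offset could in principle be mapped onto a set of parent documents plus an inner_hits from/size for each of them. The sketch below assumes such a counts list (in the parent sort order) is already available; how to obtain it cheaply is exactly the open part of the question. locate_nested_page is a hypothetical helper, not an Elasticsearch API.

```python
from typing import List, Tuple


def locate_nested_page(counts: List[int], offset: int, size: int) -> List[Tuple[int, int, int]]:
    """Map nested items [offset, offset + size) onto parent documents.

    counts[i] is the (assumed known) number of matching nested objects in
    the i-th parent. Returns (parent_index, inner_from, inner_size) triples.
    """
    plan: List[Tuple[int, int, int]] = []
    start = 0  # global index of the current parent's first nested match
    remaining = size
    for i, count in enumerate(counts):
        if remaining <= 0:
            break
        end = start + count
        if count > 0 and offset < end:
            # Skip into the first overlapping parent; later parents start at 0.
            inner_from = max(offset - start, 0)
            take = min(count - inner_from, remaining)
            plan.append((i, inner_from, take))
            remaining -= take
        start = end
    return plan


# Matching offices per company in the example tree (company.1 .. company.4).
print(locate_nested_page([1, 3, 5, 3], offset=8, size=4))
# → [(2, 4, 1), (3, 0, 3)]
```

Here (2, 4, 1) means one office from company.3 starting at its fifth match (office.3E), and (3, 0, 3) means the first three matches of company.4. For offset=4 the plan is [(2, 0, 4)], i.e. only company.3 is needed.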
I tried many queries and aggregations, but currently the only working solution is to load all matching nested objects along with their root/parent documents and loop over them, counting nested matches from 0 until I reach the nested offset.
This is, however, very inefficient, and even when allowing at most 500 pages (20 items per page, i.e. the default Elasticsearch limit of 10,000 results) I may still hit memory limits.
The database has up to 2 million root documents and 600k nested office records.
Currently I use the following query, for example:
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "offices",
            "inner_hits": { "size": 20 },
            "query": {
              "bool": {
                "must": [
                  { "wildcard": { "offices.city": "Bratislava*" } }
                ]
              }
            }
          }
        }
      ]
    }
  },
  "aggregations": {
    "cnt-total": {
      "nested": { "path": "offices" },
      "aggs": {
        "cnt-matching": {
          "filter": {
            "bool": {
              "must": [
                { "wildcard": { "offices.city": "Bratislava*" } }
              ]
            }
          }
        }
      }
    }
  }
}
This gives me the correct total number of matching nested documents. But then I must loop over all loaded parent documents and calculate the nested offset via a script. Is there a more efficient way to do this?
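The client-side loop described above can be sketched roughly as follows, assuming an Elasticsearch 7.x-style response in which each parent hit carries its nested matches under inner_hits.offices.hits.hits. The function page_from_inner_hits and the mocked EXAMPLE_RESPONSE (built from the example tree) are hypothetical.

```python
from typing import Any, Dict, List


def page_from_inner_hits(response: Dict[str, Any], offset: int, size: int,
                         path: str = "offices") -> List[Dict[str, Any]]:
    """Collect nested matches [offset, offset + size) from a search response.

    Walks parent hits in order and counts each parent's inner hits until the
    global nested offset is reached (hypothetical helper, not an ES API).
    """
    if size <= 0:
        return []
    page: List[Dict[str, Any]] = []
    seen = 0  # nested matches passed so far, across all parents
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        for nested in inner:
            if seen >= offset:
                page.append(nested["_source"])
                if len(page) == size:
                    return page
            seen += 1
    return page


def _mock_hit(company: str, offices: List[str]) -> Dict[str, Any]:
    # Mocked parent hit with ES 7.x-style inner_hits (hypothetical data).
    return {
        "_id": company,
        "inner_hits": {"offices": {"hits": {
            "total": {"value": len(offices), "relation": "eq"},
            "hits": [{"_source": {"id": o}} for o in offices],
        }}},
    }


EXAMPLE_RESPONSE = {"hits": {"hits": [
    _mock_hit("company.1", ["office.1A"]),
    _mock_hit("company.2", ["office.2A", "office.2B", "office.2C"]),
    _mock_hit("company.3", ["office.3A", "office.3B", "office.3C", "office.3D", "office.3E"]),
    _mock_hit("company.4", ["office.4A", "office.4B", "office.4C"]),
]}}

print([o["id"] for o in page_from_inner_hits(EXAMPLE_RESPONSE, offset=8, size=4)])
# → ['office.3E', 'office.4A', 'office.4B', 'office.4C']
```

Every inner hit before the offset still has to be fetched and walked, which is why the cost of this approach grows with the page number.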