
I have data from a provider: very large, deeply structured JSON. The relevant part of the mapping is:

  "mappings": {
    "properties": {
      "field_a": { .. },
      "field_b": { .. },
      "field_c": { .. },
      "field_d": {
        "properties": {
          "subfield_a": {...},
          "subfield_b": {...},
          "subfield_c": {...},
          "subfield_d": {...},
          "subfield_e": {
            "properties": {
              "myfield": {
                "type": "keyword"
              },
              "another_a": {...},
              "another_b": {...}
            }
          }
        }
      }
    }
  }

subfield_e is an array of objects. Each object contains several fields, including the one I'm interested in, "myfield".
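Because subfield_e has no "type": "nested" in this mapping, Elasticsearch flattens the object array: each inner field becomes one multi-valued field on the document, and the link between values that came from the same object is lost. The Python sketch below models that flattening on a plain dict. It is a simplified illustration, not Elasticsearch's indexing code, and the sample document is hypothetical (only its shape mirrors the mapping above).

```python
from __future__ import annotations

from typing import Any


def flatten_object_array(doc: dict[str, Any], path: str) -> dict[str, list[Any]]:
    """Model how an object array at `path` is indexed without a nested mapping.

    Every field of every object in the array is collected into one
    multi-valued field named "<path>.<field>", so values from different
    objects sit side by side with no record of which object they came from.
    """
    node: Any = doc
    for key in path.split("."):
        node = node[key]
    flattened: dict[str, list[Any]] = {}
    for obj in node:  # node is the list of objects
        for field, value in obj.items():
            flattened.setdefault(f"{path}.{field}", []).append(value)
    return flattened


# Hypothetical sample document.
doc = {
    "field_d": {
        "subfield_e": [
            {"myfield": "my string value", "another_a": 1},
            {"myfield": "unrelated", "another_a": 2},
        ]
    }
}

print(flatten_object_array(doc, "field_d.subfield_e"))
```

After flattening, "my string value" and "unrelated" are simply two values of the same field on the same document, which is why a document-level query cannot tell the aggregation which of them matched.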

I need an aggregation over only those "myfield" values that contain a certain string.

Currently I do it like this, which gives a wrong (but logical) result:

GET /index/_search
{
  "query": {
    "wildcard": {
      "field_d.subfield_e.myfield": "*string*"
    }
  },
  "aggs": {
    "interest": {
      "terms": {
        "field": "field_d.subfield_e.myfield",
        "size": 10
      }
    }
  },
  "size": 0
}

The problem with this query is that it selects every document whose subfield_e array contains at least one object with a matching myfield, and then aggregates over all of those documents. So in the end I get buckets for every "myfield" value in those documents, not only for the values containing the string.
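The effect can be reproduced with a small Python model: the query selects whole documents, and the terms aggregation then counts every myfield value in those documents. This is an illustration only. `fnmatchcase` stands in for the wildcard query (close to, but not identical to, Elasticsearch's wildcard semantics), and the documents are invented.

```python
from __future__ import annotations

from collections import Counter
from fnmatch import fnmatchcase

# Hypothetical documents, already flattened: each entry holds all myfield
# values from one document's subfield_e array.
docs = [
    ["my string a", "other x"],
    ["other y"],
    ["a string b", "other x"],
]


def terms_after_doc_query(docs: list[list[str]], pattern: str) -> Counter:
    """Query selects documents; the terms agg then counts ALL of their values."""
    counts: Counter = Counter()
    for values in docs:
        # Document-level match: one matching value is enough.
        if any(fnmatchcase(v, pattern) for v in values):
            # Every value in a matched document is counted, matching or not.
            counts.update(values)
    return counts


print(terms_after_doc_query(docs, "*string*"))
```

Here "other x" gets a bucket with a count of 2 even though it never contains "string"; only "other y" is excluded, because its whole document failed the query.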

I tried adding a bucket_selector aggregation after my main aggregation, but I got the error: "buckets_path must reference either a number value or a single value numeric metric aggregation, got: [String] at aggregation [_key]"

My code is inspired by "Filter Elasticsearch Aggregation by Bucket Key Value" and currently looks like this:

GET /index/_search
{
  "query": {
    "wildcard": {
      "field_d.subfield_e.myfield": "*string*"
    }
  },
  "aggs": {
    "interest": {
      "terms": {
        "field": "field_d.subfield_e.myfield",
        "size": 10
      },
      "aggs": {
        "buckets": {
          "bucket_selector": {
            "buckets_path": {
              "key": "_key"
            },
            "script": "params.key.contains('string')"
          }
        }
      }
    }
  },
  "size": 0
}

So, how can I filter aggregation buckets (terms aggregation) by their string key?
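The bucket_selector attempt fails because, as the error says, its buckets_path must resolve to a number, and a bucket key on a keyword field is a string. One option the question does not try (the answer below mentions it as the slower alternative in that case) is the terms aggregation's `include` parameter, which keeps only bucket keys matching a regular expression. The Python sketch below builds such a request body and models the key filtering locally with `re.fullmatch`, on the assumption that the `include` regex must match the whole term (hence `.*string.*`, not the wildcard form `*string*`). It also assumes the substring has no regex metacharacters; the local filter is a model, not Elasticsearch code.

```python
from __future__ import annotations

import json
import re
from collections import Counter


def search_body(field: str, substring: str, size: int = 10) -> dict:
    """Build a _search body whose terms agg only returns keys containing `substring`.

    Assumes `substring` contains no regex metacharacters.
    """
    return {
        # The wildcard query still narrows the matching documents.
        "query": {"wildcard": {field: f"*{substring}*"}},
        "aggs": {
            "interest": {
                # `include` drops bucket keys that do not match the regex.
                "terms": {"field": field, "size": size, "include": f".*{substring}.*"}
            }
        },
        "size": 0,
    }


def apply_include(buckets: Counter, include: str) -> Counter:
    """Local model of `include`: keep only bucket keys that fully match the regex."""
    return Counter({key: n for key, n in buckets.items() if re.fullmatch(include, key)})


body = search_body("field_d.subfield_e.myfield", "string")
print(json.dumps(body, indent=2))

# Hypothetical unfiltered buckets, as the original query might return them.
raw = Counter({"my string a": 1, "other x": 2, "a string b": 1})
print(apply_include(raw, body["aggs"]["interest"]["terms"]["include"]))
```

The printed JSON is meant as the body of `GET /index/_search`. Note that doc counts still come from whole documents, so with a non-nested mapping each count reflects documents containing that value, not individual objects.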

Gransy
  • I've just posted [an answer to the original question](https://stackoverflow.com/a/66695378/8160318). Let me know if you'd like me to adapt it to your particular use case! – Joe - GMapsBook.com Mar 18 '21 at 16:44
  • Thank you, I'm trying it, but it looks like a performance killer. I'm also reimporting the data into another index where I specify type="nested" for subfield_e. With the nested type, a nested + filter aggregation works exactly as I need. The open question is performance, so I'll let you know after the import. – Gransy Mar 18 '21 at 17:46
  • Right, OK. Check the link at the bottom of that answer too -- it deals with nested fields and partial matches -- it may be relevant for ya. In any event, let me know how it goes! – Joe - GMapsBook.com Mar 18 '21 at 17:51

1 Answer


I solved it by mapping subfield_e as a nested type instead of a plain array of objects, and reimporting all data into an index with this new mapping.

The current mapping looks like this:

  "mappings": {
    "properties": {
      "field_a": { .. },
      "field_b": { .. },
      "field_c": { .. },
      "field_d": {
        "properties": {
          "subfield_a": {...},
          "subfield_b": {...},
          "subfield_c": {...},
          "subfield_d": {...},
          "subfield_e": {
            "type": "nested",    <======= This line added
            "properties": {
              "myfield": {
                "type": "keyword"
              },
              "another_a": {...},
              "another_b": {...}
            }
          }
        }
      }
    }
  }
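With "type": "nested", each object in subfield_e is indexed as its own hidden document, so a query or filter in a nested context sees one object's fields at a time instead of a flattened list. The Python sketch below models that split conceptually; the real hidden documents live at the Lucene level and are not directly visible, and the sample data is hypothetical.

```python
from __future__ import annotations

from typing import Any


def split_nested(
    doc: dict[str, Any], path: str
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Return (root document without the array, one hidden doc per object at `path`)."""
    keys = path.split(".")
    # Copy the containers along the path so the input document is not mutated.
    root = dict(doc)
    parent = root
    for key in keys[:-1]:
        parent[key] = dict(parent[key])
        parent = parent[key]
    objects = parent.pop(keys[-1])
    # Each object keeps its own fields together in a separate "document".
    hidden = [{f"{path}.{f}": v for f, v in obj.items()} for obj in objects]
    return root, hidden


doc = {
    "field_a": "x",
    "field_d": {
        "subfield_e": [
            {"myfield": "my string value", "another_a": 1},
            {"myfield": "unrelated", "another_a": 2},
        ]
    },
}
root, hidden = split_nested(doc, "field_d.subfield_e")
print(root)
print(hidden)
```

Because "my string value" and "unrelated" now live in different hidden documents, a filter on one object's myfield no longer drags the other object's value along.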

And the final working query is:

GET /index/_search
{
  "query": {
    "nested": {
      "path": "field_d.subfield_e",
      "query": {
        "wildcard": {
          "field_d.subfield_e.myfield": {
            "value": "*string*"
          }
        }
      }
    }
  },
  "aggs": {
    "agg": {
      "nested": {
        "path": "field_d.subfield_e"
      },
      "aggs": {
        "inner": {
          "filter": {
            "wildcard": {
              "field_d.subfield_e.myfield": "*string*"
            }
          },
          "aggs": {
            "interest": {
              "terms": {
                "field": "field_d.subfield_e.myfield",
                "size": 10
              }
            }
          }
        }
      }
    }
  },
  "size": 0
}
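Conceptually, the nested aggregation steps into the per-object hidden documents, the filter aggregation keeps only objects whose own myfield matches the wildcard, and the terms aggregation counts just those. A rough Python model of that pipeline follows. `fnmatchcase` stands in for the wildcard query, the data is invented, and Elasticsearch's real bucket ordering and doc_count handling are more involved.

```python
from __future__ import annotations

from collections import Counter
from fnmatch import fnmatchcase

FIELD = "field_d.subfield_e.myfield"

# Hypothetical hidden nested documents (one dict per subfield_e object),
# grouped by the root document they belong to.
root_docs = [
    [{FIELD: "my string a"}, {FIELD: "other x"}],
    [{FIELD: "other y"}],
    [{FIELD: "a string b"}, {FIELD: "other x"}],
]


def nested_filter_terms(root_docs, pattern: str, size: int = 10) -> list[tuple[str, int]]:
    # "query" -> nested: keep root docs with at least one matching object.
    matched_roots = [
        objs for objs in root_docs if any(fnmatchcase(o[FIELD], pattern) for o in objs)
    ]
    counts: Counter = Counter()
    for objs in matched_roots:
        # "aggs" -> nested: iterate over the individual objects,
        for obj in objs:
            # -> filter: keep only objects whose own myfield matches,
            if fnmatchcase(obj[FIELD], pattern):
                # -> terms: count the surviving values.
                counts[obj[FIELD]] += 1
    return counts.most_common(size)


print(nested_filter_terms(root_docs, "*string*"))
```

Compared with the flattened model, "other x" disappears from the buckets: it belongs to a matched root document, but its own object fails the filter.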

In my case, this query is much faster than filtering the terms aggregation with include/exclude.

Gransy