
I'm running Elasticsearch 7.7 and using the official PHP client to interact with the server.

My issue was somewhat solved here: https://discuss.elastic.co/t/need-to-return-part-of-a-doc-from-a-search-query-filter-is-parent-child-the-way-to-go/64514/2

However, "Types are deprecated in APIs in 7.0+": https://www.elastic.co/guide/en/elasticsearch/reference/7.x/removal-of-types.html

Here is my document:

{
  "offering_id": "1190",
  "account_id": "362353",
  "service_id": "20087",
  "title": "Quick Brown Mammal",
  "slug": "Quick Brown Fox",
  "summary": "Quick Brown Fox",
  "header_thumb_path": "uploads/test/test.png",
  "duration": "30",
  "alter_ids": [
    "59151",
    "58796",
    "58613",
    "54286",
    "51812",
    "50052",
    "48387",
    "37927",
    "36685",
    "36554",
    "28807",
    "23154",
    "22356",
    "21480",
    "220",
    "1201",
    "1192"
  ],
  "premium": "f",
  "featured": "f",
  "events": [
    {
      "event_id": "9999",
      "start_date": "2020-07-01 14:00:00",
      "registration_count": "22",
      "description": "boo"
    },
    {
      "event_id": "9999",
      "start_date": "2020-07-01 14:00:00",
      "registration_count": "22",
      "description": "xyz"
    },
    {
      "event_id": "9999",
      "start_date": "2020-08-11 11:30:00",
      "registration_count": "41",
      "description": "test"
    }
  ]
}

Notice that the document may have one or many "events".

Searching based on event data is the most common use case.

For example:

  • Find events that start before 12pm
  • Find events with a description of "xyz"
  • Find events with a start date in the next 10 days.

I would like to NOT return any events that didn't match the query!


So, for example, to find events with a description of "xyz" for a given service:

{
  "query": {
    "bool": {
      "must": {
        "match": {
          "events.description": "xyz"
        }
      },
      "filter": {
        "bool": {
          "must": [
            {
              "term": {
                "service_id": 20087
              }
            }
          ]
        }
      }
    }
  }
}

I would want the result to look like this:

{
  "offering_id": "1190",
  "account_id": "362353",
  "service_id": "20087",
  "title": "Quick Brown Mammal",
  "slug": "Quick Brown Fox",
  "summary": "Quick Brown Fox",
  "header_thumb_path": "uploads/test/test.png",
  "duration": "30",
  "alter_ids": [
    "59151",
    "58796",
    "58613",
    "54286",
    "51812",
    "50052",
    "48387",
    "37927",
    "36685",
    "36554",
    "28807",
    "23154",
    "22356",
    "21480",
    "220",
    "1201",
    "1192"
  ],
  "premium": "f",
  "featured": "f",
  "events": [
    {
      "event_id": "9999",
      "start_date": "2020-07-01 14:00:00",
      "registration_count": "22",
      "description": "xyz"
    }
  ]
}

However, instead it just returns the ENTIRE document, with all events.

Is it even possible to return only a subset of the data? Maybe with Aggregations?

  • Right now, we're doing an "extra" set of filtering on the result set in the application (php in this case) to strip out event blocks that don't match the desired results.
  • It would be nice to just have elastic give directly what's needed instead of doing extra processing on the result to pull out the applicable event.
  • I thought about restructuring the data around "events" instead, but then I would be duplicating data, since every event would carry the parent offering data too.
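
For reference, the application-side filtering mentioned in the first bullet amounts to roughly the following. This is only a sketch, written in Python rather than PHP for brevity; `strip_unmatched_events` and its `keep` predicate are hypothetical names, and the exact-equality check is a stand-in for whatever matching the real query performs.

```python
from typing import Callable


def strip_unmatched_events(doc: dict, keep: Callable[[dict], bool]) -> dict:
    """Return a copy of doc whose "events" list holds only events accepted by keep."""
    trimmed = dict(doc)  # shallow copy so the original hit is left untouched
    trimmed["events"] = [event for event in doc.get("events", []) if keep(event)]
    return trimmed


# Abbreviated version of the document shown earlier.
offering = {
    "offering_id": "1190",
    "service_id": "20087",
    "events": [
        {"event_id": "9999", "start_date": "2020-07-01 14:00:00", "description": "boo"},
        {"event_id": "9999", "start_date": "2020-07-01 14:00:00", "description": "xyz"},
        {"event_id": "9999", "start_date": "2020-08-11 11:30:00", "description": "test"},
    ],
}

result = strip_unmatched_events(offering, lambda e: e["description"] == "xyz")
print([e["description"] for e in result["events"]])
```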

This used to be in SQL, where there was a relation instead of having the data nested like this.

emmdee
  • If this is your main use case and returning extra nested docs is a problem (from your example it is not clear; having a few extra documents is not an issue), then you can consider flattening the documents: store event fields at top level and produce one ES document per event. – khachik Jun 23 '20 at 19:37
  • Can you add the query which is giving you entire data instead one. – Gibbs Jun 23 '20 at 19:37
  • @Gibbs I've added an example of a search query. This is pretty much the "most simple" of a search this app would do. I've tried it in reverse (match the service, then filter the event, etc) they all just give me the whole object. – emmdee Jun 23 '20 at 20:22
  • @khachik Yep I think you're right! Unfortunately a future use-case will be a very large amount of data, and not nesting it will really bring up infrastructure costs. For this specific "event" scenario I think that's what we'll do but I must have the ability to efficiently work with nested data for future use cases in elasticsearch so I would really like to know the tactic to pull back only certain nested objects like with 6.x and below. – emmdee Jun 23 '20 at 20:24

1 Answer


A subset of the nested data can be returned using a nested aggregation together with a filter aggregation.

To learn more about these aggregations, refer to the official documentation:

Filter Aggregation

Nested Aggregation

Index Mapping:

{
  "mappings": {
    "properties": {
      "offering_id": {
        "type": "integer"
      },
      "account_id": {
        "type": "integer"
      },
      "service_id": {
        "type": "integer"
      },
      "title": {
        "type": "text"
      },
      "slug": {
        "type": "text"
      },
      "summary": {
        "type": "text"
      },
      "header_thumb_path": {
        "type": "keyword"
      },
      "duration": {
        "type": "integer"
      },
      "alter_ids": {
        "type": "integer"
      },
      "premium": {
        "type": "text"
      },
      "featured": {
        "type": "text"
      },
      "events": {
        "type": "nested",
        "properties": {
          "event_id": {
            "type": "integer"
          },
          "registration_count": {
            "type": "integer"
          },
          "description": {
            "type": "text"
          }
        }
      }
    }
  }
}

Search Query :

{
  "size": 0,
  "aggs": {
    "nested": {
      "nested": {
        "path": "events"
      },
      "aggs": {
        "filter": {
          "filter": {
            "match": { "events.description": "xyz" }
          },
          "aggs": {
            "total": {
              "top_hits": {
                "size": 10
              }
            }
          }
        }
      }
    }
  }
}

Search Result :

"hits": [
  {
    "_index": "foo21",
    "_type": "_doc",
    "_id": "1",
    "_nested": {
      "field": "events",
      "offset": 1
    },
    "_score": 1.0,
    "_source": {
      "event_id": "9999",
      "start_date": "2020-07-01 14:00:00",
      "registration_count": "22",
      "description": "xyz"
    }
  }
]
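
To pull the matching events out of that aggregation response on the client side, something along these lines should work. It is a sketch in Python rather than PHP; `events_from_aggregation` is a hypothetical helper, and `sample_response` is a hand-written fragment trimmed to the fields used here, not real server output.

```python
def events_from_aggregation(response: dict) -> list:
    """Collect the _source of every nested event returned by the top_hits aggregation."""
    # "nested", "filter" and "total" are the aggregation names chosen in the
    # search query above; they are not fixed Elasticsearch keywords.
    top_hits = response["aggregations"]["nested"]["filter"]["total"]
    return [hit["_source"] for hit in top_hits["hits"]["hits"]]


# Hand-written response fragment for illustration only.
sample_response = {
    "aggregations": {
        "nested": {
            "doc_count": 3,
            "filter": {
                "doc_count": 1,
                "total": {
                    "hits": {
                        "hits": [
                            {
                                "_id": "1",
                                "_nested": {"field": "events", "offset": 1},
                                "_source": {
                                    "event_id": "9999",
                                    "start_date": "2020-07-01 14:00:00",
                                    "registration_count": "22",
                                    "description": "xyz",
                                },
                            }
                        ]
                    }
                },
            },
        }
    }
}

events = events_from_aggregation(sample_response)
print(events)
```

Note that the aggregation query as written has no top-level "query" clause, so it considers events from every document in the index, and top_hits returns at most "size" events in total. Adding the service_id filter as a top-level query would narrow it to one service.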

Second Method (nested query with inner_hits):

{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "service_id": "20087"
          }
        },
        {
          "nested": {
            "path": "events",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "events.description": "xyz"
                    }
                  }
                ]
              }
            },
            "inner_hits": {}
          }
        }
      ]
    }
  }
}
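
With this method each hit still carries the full "_source", and the matching events arrive separately under "inner_hits", keyed by the nested path. A minimal Python sketch of stitching the two together into the shape asked for in the question follows; `with_matching_events` is a hypothetical helper and `sample_hit` is a hand-written, abbreviated hit, not real server output.

```python
def with_matching_events(hit: dict, path: str = "events") -> dict:
    """Rebuild a document from a search hit, keeping only the nested events that matched."""
    doc = dict(hit["_source"])  # shallow copy of the parent fields
    inner = hit["inner_hits"][path]["hits"]["hits"]
    doc[path] = [inner_hit["_source"] for inner_hit in inner]
    return doc


# Hand-written, abbreviated hit for illustration only.
sample_hit = {
    "_id": "1",
    "_source": {
        "offering_id": "1190",
        "service_id": "20087",
        "events": [
            {"event_id": "9999", "description": "boo"},
            {"event_id": "9999", "description": "xyz"},
            {"event_id": "9999", "description": "test"},
        ],
    },
    "inner_hits": {
        "events": {
            "hits": {
                "hits": [
                    {
                        "_nested": {"field": "events", "offset": 1},
                        "_source": {"event_id": "9999", "description": "xyz"},
                    }
                ]
            }
        }
    },
}

document = with_matching_events(sample_hit)
print(document["events"])
```

inner_hits returns up to three matches per document by default, so set "size" inside "inner_hits" if a document can have more matching events. Adding "_source": {"excludes": ["events"]} at the top level of the search body should also keep the full events array out of the response.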

You can also refer to these SO answers:

How to filter nested aggregation bucket?

Returning a partial nested document in ElasticSearch
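
One of the other use cases from the question, events starting in the next 10 days, could probably follow the same nested query plus inner_hits pattern. The sketch below only builds the request body as a Python dict. `upcoming_events_query` is a hypothetical helper, and it assumes "events.start_date" is mapped as a date with a format such as "yyyy-MM-dd HH:mm:ss", which is an assumption about the index rather than something confirmed above.

```python
import json


def upcoming_events_query(service_id: int, days: int = 10) -> dict:
    """Build a search body for events of one service starting within the next `days` days."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"service_id": service_id}},
                    {
                        "nested": {
                            "path": "events",
                            "query": {
                                "range": {
                                    # Assumes start_date is a date field; "now" and
                                    # "now+10d" are Elasticsearch date math expressions.
                                    "events.start_date": {
                                        "gte": "now",
                                        "lte": f"now+{days}d",
                                    }
                                }
                            },
                            "inner_hits": {},
                        }
                    },
                ]
            }
        }
    }


body = upcoming_events_query(20087)
print(json.dumps(body, indent=2))
```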

ESCoder
  • @emmdee did you get a chance to go through my answer, looking forward to get feedback from you – ESCoder Jun 24 '20 at 04:16
  • thanks! So these nested aggregations are not the same as the deprecated nested types? https://www.elastic.co/guide/en/elasticsearch/reference/7.x/removal-of-types.html – emmdee Jun 24 '20 at 04:36
  • @emmdee the above query runs perfectly with Elasticsearch version 7.8. And you can still use the nested datatype; refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/nested.html – ESCoder Jun 24 '20 at 04:46
  • @emmdee And for Filter and nested aggregation, you can refer to the links that I have mentioned in the answer. Please go through the answer, and let me know if your issue is resolved? – ESCoder Jun 24 '20 at 04:48
  • The problem was I ran into a big scary DEPRECATED message here: https://github.com/elastic/elasticsearch-php/blob/ce0384017de6e6cf80f7b3cf6b4c210445150ea0/src/Elasticsearch/Namespaces/IndicesNamespace.php#L681 and I went down the deprecation rabbit hole. I set the mapping type using non-deprecated style and nested are working as expected. – emmdee Jun 24 '20 at 06:05
  • @emmdee glad to hear that ur issue is resolved :) Thank u for accepting and upvoting the answer :) – ESCoder Jun 24 '20 at 06:13