I had to insert a large amount of data into Elasticsearch, and I did it in the following manner. I need to query this object, but I am unable to filter the "logData" array. Can someone help me out here? Is it even possible to filter an array in Elasticsearch?

"_source":{
"FileName": "fileName.log",
"logData": [
                        {
                            "LineNumber": 1,
                            "Data": "data1"
                        },
                        {
                            "LineNumber": 2,
                            "Data": "Data2"
                        },
                        {
                            "LineNumber": 3,
                            "Data": "Data3"
                        },
                        {
                            "LineNumber": 4,
                            "Data": "Data4"
                        },
                        {
                            "LineNumber": 5,
                            "Data": "Data5"
                        },
                        {
                            "LineNumber": 6,
                            "Data": "Data6"
                        }
]}

Is there a way to query so that I get only a few items from this array? For example:

"_source":{
"FileName": "fileName.log",
"logData": [
                        {
                            "LineNumber": 1,
                            "Data": "data1"
                        },
                        {
                            "LineNumber": 2,
                            "Data": "Data2"
                        },
                        {
                            "LineNumber": 3,
                            "Data": "Data3"
                        }
]
}
SSB

1 Answer

There's no dedicated array mapping type in ES.

That said, when you have an array of objects with shared keys, it's recommended that you use the nested field type to preserve the connections between each sub-object's attributes. If you don't use nested, the objects will be flattened into per-field lists of values, which may lead to seemingly wrong query results.
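
To make the flattening issue concrete, here is a toy Python model of it. This is only a sketch of the idea, not how Elasticsearch is implemented: it ignores text analysis entirely, and the helper names (`flatten_object_array`, `flattened_matches`, `nested_matches`) are made up for illustration.

```python
from collections import defaultdict


def flatten_object_array(field, objects):
    """Toy model of a non-nested object array: one value list per key."""
    flat = defaultdict(list)
    for obj in objects:
        for key, value in obj.items():
            flat[f"{field}.{key}"].append(value)
    return dict(flat)


def flattened_matches(flat, conditions):
    """Each condition only needs to find its value somewhere in the list."""
    return all(value in flat.get(path, []) for path, value in conditions.items())


def nested_matches(objects, conditions):
    """With nested, all conditions must hold inside the same sub-object."""
    return any(
        all(obj.get(path.split(".", 1)[1]) == value for path, value in conditions.items())
        for obj in objects
    )


log_data = [
    {"LineNumber": 1, "Data": "data1"},
    {"LineNumber": 2, "Data": "Data2"},
]
flat = flatten_object_array("logData", log_data)

# No single line has LineNumber 1 together with "Data2".
cross = {"logData.LineNumber": 1, "logData.Data": "Data2"}
print(flattened_matches(flat, cross))     # True: the link between the keys is lost
print(nested_matches(log_data, cross))    # False: each sub-object is checked on its own
```

The flattened form matches a combination that never occurs in any one log line, which is the "seemingly wrong" result mentioned above.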


As to the actual query -- assuming your mapping looks something like this:

PUT logs_index
{
  "mappings": {
    "properties": {
      "logData": {
        "type": "nested"
      }
    }
  }
}

you'll need to filter the logData sub-documents of interest, perhaps with a terms query. Only then can you extract just those array objects that matched the query (LineNumber: 1 or 2 or 3).

The technique for that is called inner_hits:

POST logs_index/_search
{
  "_source": ["FileName", "inner_hits.logData"],
  "query": {
    "nested": {
      "path": "logData",
      "query": {
        "terms": {
          "logData.LineNumber": [
            1,
            2,
            3
          ]
        }
      },
      "inner_hits": {}
    }
  }
}
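
The matched sub-documents come back under each hit's `inner_hits` key rather than inside `_source`. A minimal Python sketch of pulling them out of a parsed response follows; the `sample_response` below is hand-written and trimmed for illustration (not real output), and `extract_inner_hits` is a hypothetical helper name.

```python
def extract_inner_hits(response, path="logData"):
    """Collect the FileName and matched nested objects for every top-level hit."""
    results = []
    for hit in response["hits"]["hits"]:
        inner = hit.get("inner_hits", {}).get(path, {}).get("hits", {}).get("hits", [])
        results.append({
            "FileName": hit.get("_source", {}).get("FileName"),
            path: [inner_hit["_source"] for inner_hit in inner],
        })
    return results


# Hand-written, trimmed stand-in for a parsed search response (assumption).
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {"FileName": "fileName.log"},
                "inner_hits": {
                    "logData": {
                        "hits": {
                            "hits": [
                                {"_source": {"LineNumber": 1, "Data": "data1"}},
                                {"_source": {"LineNumber": 2, "Data": "Data2"}},
                                {"_source": {"LineNumber": 3, "Data": "Data3"}},
                            ]
                        }
                    }
                },
            }
        ]
    }
}

docs = extract_inner_hits(sample_response)
print(docs[0]["FileName"], [d["LineNumber"] for d in docs[0]["logData"]])
```

Note that `inner_hits` returns only the top three matches per hit by default, so if you expect more matching lines you will likely want to set `"inner_hits": {"size": ...}` explicitly.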

Check this thread for more info.

Joe - GMapsBook.com
  • Hey! Yes, this works. But I also have "FileName": "fileName.log", which is part of the filter too. Would it work if I have two queries in place? I mean, I need to filter on "FileName": "fileName.log" and also on logData.LineNumber. – SSB Jan 26 '21 at 11:16
  • No problem -- just use a [nested `must` query](https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-nested-query.html#nested-query-ex-query). – Joe - GMapsBook.com Jan 26 '21 at 11:40
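
For reference, one way the combined filter from these comments might look is sketched below as a Python function that builds the request body. It assumes `FileName` was dynamically mapped as text with a `.keyword` sub-field (the mapping in the answer does not define it), and `build_query` and its `inner_size` parameter are hypothetical names.

```python
import json


def build_query(file_name, line_numbers, inner_size=10):
    """Build a search body: exact FileName match AND nested LineNumber filter."""
    return {
        "_source": ["FileName"],
        "query": {
            "bool": {
                "must": [
                    # Assumes a `.keyword` sub-field exists for exact matching.
                    {"term": {"FileName.keyword": file_name}},
                    {
                        "nested": {
                            "path": "logData",
                            "query": {"terms": {"logData.LineNumber": line_numbers}},
                            # Raise the number of returned sub-documents per hit.
                            "inner_hits": {"size": inner_size},
                        }
                    },
                ]
            }
        },
    }


body = build_query("fileName.log", [1, 2, 3])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of `POST logs_index/_search`. Both clauses sit in the same `bool.must` array, so a document must satisfy the file name condition and contain at least one matching log line.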