ElasticSearch Accessing Nested Documents in Script - Null Pointer Exception

Question

Gist: I'm trying to write a custom filter on nested documents using Painless, and I want error checks for when there are no nested documents, to avoid a null_pointer_exception.

I have a mapping like this (simplified and obfuscated):

{
  "video_entry" : {
    "aliases" : { },
    "mappings" : {
      "properties" : {
        "captions_added" : {
          "type" : "boolean"
        },
        "category" : {
          "type" : "keyword"
        },
        "is_votable" : {
          "type" : "boolean"
        },
        "members" : {
          "type" : "nested",
          "properties" : {
            "country" : {
              "type" : "keyword"
            },
            "date_of_birth" : {
              "type" : "date"
            }
          }
        }
      }
    }
  }
}

Each video_entry document can have zero or more nested members documents.

Sample Document

{
   "captions_added": true,
   "category"      : "Mental Health",
   "is_votable"    : true,
   "members": [
        {"country": "Denmark", "date_of_birth": "1998-04-04T00:00:00"},
        {"country": "Denmark", "date_of_birth": "1999-05-05T00:00:00"}
   ]
}

If one or more nested documents exist, we want to run Painless scripts that check certain fields across all of them. My script works on an index with a few documents, but on a larger set of documents I get null pointer exceptions despite every null check I could think of. I've tried various access patterns and error-checking mechanisms, but I still get exceptions.

POST /video_entry/_search
{
  "query": {
   "script": {
     "script": {
       "source": """
          // various NULL checks that I already tried
          // also tried short circuiting on finding null values
          if (!params['_source'].empty && params['_source'].containsKey('members')) {


              def total = 0;
          
          
              for (item in params._source.members) {
                // custom logic here
                // if above logic holds true 
                // total += 1; 
              } 
          
              return total > 3;
         }
         
         return true;
          
       """,
       "lang": "painless"
     }
   }
  }
}

Other Statements That I've Tried

if (params._source == null) {
    return true;
}

if (params._source.members == null) {
    return true;
}

if (!ctx._source.contains('members')) {
    return true;
}

if (!params['_source'].empty && params['_source'].containsKey('members') && 
     params['_source'].members.value != null) {
    
    // logic here

}

if (doc.containsKey('members')) {
  for (mem in params._source.members) {
  }

}

Error Message

&& params._source.members",
                 ^---- HERE"

 "caused_by" : {
            "type" : "null_pointer_exception",
            "reason" : null
          }

I've looked into changing the structure (flattening the document) and using must_not as suggested in this answer. Neither suits our use case, as we need to incorporate more custom logic.

Different tutorials use ctx, doc, or params. To add to the confusion, Debug.explain(doc.members) and Debug.explain(params._source.members) return empty responses, so I'm having a hard time figuring out the types.


Any help is appreciated.

score 2 · Accepted Answer · answered Dec 07 '21 at 10:49

TL;DR

Elasticsearch flattens object fields. A document such as

{
  "group" : "fans",
  "user" : [ 
    {
      "first" : "John",
      "last" :  "Smith"
    },
    {
      "first" : "Alice",
      "last" :  "White"
    }
  ]
}

is indexed as:

{
  "group" :        "fans",
  "user.first" : [ "alice", "john" ],
  "user.last" :  [ "smith", "white" ]
}

To access the inner values of members, you need to reference them as doc['members.<field>'], because members does not exist as a field on its own.
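To make the flattening concrete, here is a rough Python model of how an array of plain objects ends up as per-path value lists. It is only an illustration under simplifying assumptions: flatten_object_fields is a hypothetical helper, not Elasticsearch code, and it ignores text analysis (the lowercasing in the example above comes from analyzed text fields) as well as doc-value ordering.

```python
from typing import Any


def flatten_object_fields(source: dict[str, Any]) -> dict[str, list[Any]]:
    """Hypothetical model: collect every leaf value under its dotted path.

    Arrays are walked transparently, so values from different array
    elements land in the same list and their pairing is lost.
    Analysis and doc-value sorting are deliberately not modelled.
    """
    fields: dict[str, list[Any]] = {}

    def walk(value: Any, path: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path)
        else:
            fields.setdefault(path, []).append(value)

    walk(source, "")
    return fields


doc = {
    "group": "fans",
    "user": [
        {"first": "John", "last": "Smith"},
        {"first": "Alice", "last": "White"},
    ],
}
print(flatten_object_fields(doc))
# {'group': ['fans'], 'user.first': ['John', 'Alice'], 'user.last': ['Smith', 'White']}

# An empty array contributes no path at all: neither "members"
# nor "members.country" appears in the result.
print(flatten_object_fields({"members": []}))
# {}
```

Note that there is never a bare "members" key, only "members.<field>" paths. With "type": "nested", Elasticsearch instead indexes each array element as a separate hidden document, so this model only covers the plain object mapping used in this answer.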

Details

As you may know, Elasticsearch handles inner objects in its own way [doc], so you need to reference them accordingly.

Here is what I did to make it work (I ran everything from the Kibana Dev Tools console):

PUT /so_test/

PUT /so_test/_mapping
{
  "properties" : {
    "captions_added" : {
      "type" : "boolean"
    },
    "category" : {
      "type" : "keyword"
    },
    "is_votable" : {
      "type" : "boolean"
    },
    "members" : {
      "properties" : {
        "country" : {
          "type" : "keyword"
        },
        "date_of_birth" : {
          "type" : "date"
        }
      }
    }
  }
}

POST /so_test/_doc/
{
   "captions_added": true,
   "category"      : "Mental Health",
   "is_votable"    : true,
   "members": [
        {"country": "Denmark", "date_of_birth": "1998-04-04T00:00:00"},
        {"country": "Denmark", "date_of_birth": "1999-05-05T00:00:00"}
   ]
}

POST /so_test/_doc/
{
   "captions_added": true,
   "category"      : "Mental breakdown",
   "is_votable"    : true,
   "members": []
}

POST /so_test/_doc/
{
   "captions_added": true,
   "category"      : "Mental success",
   "is_votable"    : true,
   "members": [
        {"country": "France", "date_of_birth": "1998-04-04T00:00:00"},
        {"country": "Japan", "date_of_birth": "1999-05-05T00:00:00"}
   ]
}

Then I ran this query. It is only a bool filter, but adapting it to your own use case should not be too difficult:

GET /so_test/_search
{
  "query":{
    "bool": {
      "filter": {
        "script": {
          "script": {
            "lang": "painless",
            "source": """
            def flag = false;

            // /!\ notice how the field is referenced /!\
            if (doc['members.country'].size() != 0) {
              for (item in doc['members.country']) {
                if (item == params.country) {
                  flag = true;
                }
              }
            }
            return flag;
            """,
            "params": {
              "country": "Japan"
            }
          }
        }
      }
    }
  }
}
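The logic of the Painless filter script above can be modelled in plain Python to show why the size() guard is enough: a field with no values behaves like an empty list, not like null. This is a hypothetical sketch; doc_values stands in for Painless's doc map and is not an Elasticsearch API, and the sample values are simplified (real keyword doc values are sorted and may be deduplicated).

```python
from typing import Any


def country_filter(doc_values: dict[str, list[Any]], params: dict[str, Any]) -> bool:
    """Python model of the Painless filter script.

    doc_values is a hypothetical stand-in for Painless's `doc` map:
    each field path maps to a (possibly empty) list of values.
    """
    flag = False
    # A missing entry is treated as an empty list, mirroring
    # doc['members.country'].size() == 0 for a document without members.
    countries = doc_values.get("members.country", [])
    if len(countries) != 0:
        for item in countries:
            if item == params["country"]:
                flag = True
    return flag


# Simplified doc values for the three sample documents indexed above.
docs = {
    "Mental Health": {"members.country": ["Denmark", "Denmark"]},
    "Mental breakdown": {},  # "members": [] leaves no values behind
    "Mental success": {"members.country": ["France", "Japan"]},
}
matches = [name for name, values in docs.items()
           if country_filter(values, {"country": "Japan"})]
print(matches)  # ['Mental success']
```

The same guard-then-loop shape carries over to a counting variant like the question's total > 3 check: increment a counter inside the loop instead of setting a flag.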

By the way, you mentioned being confused about Painless contexts. The documentation describes each context in detail [doc].

In this case the filter context is the one we want to look at.

Thanks a ton. Really appreciate it. Gave it a whirl yesterday and it works. Will accept your answer in a bit :) — Abhirath Mahipal, Dec 08 '21 at 00:27
One issue I'm facing: your example doesn't make `members` nested. Running `GET /so_test` returns "members" : { "properties" : { "country" : { "type" : "keyword" }, "date_of_birth" : { "type" : "date" } } }, which indicates it isn't a nested object. — Abhirath Mahipal, Dec 09 '21 at 03:43
Your answer works with the object fields I have. Many thanks for that though :) — Abhirath Mahipal, Dec 09 '21 at 03:48
The mapping of a document reflects how the document is ingested, but not how it is indexed internally. I'm not super clear about it myself, but that is what I understood. Painless interacts with the indexed version of the document, not the ingested one. Anyhow, glad it helped. Feel free to validate the answer. — Paulo, Dec 09 '21 at 07:09
Seems to be the case -> https://stackoverflow.com/a/42170005/5698202 — Abhirath Mahipal, Dec 09 '21 at 07:55
