
I am attempting to use nested values in a script score, but I cannot make it work because I am unable to iterate over the field when accessing it through doc. Also, when I query it in Kibana with _type:images AND _exists_:colors, it does not match any documents, even though the field is clearly present in all my docs when I view them individually. I am, however, able to access it using params._source, but I have read that this can be slow and is not really recommended.

I know that this issue is all due to the way we have created this nested field, so if I cannot come up with something better than this, I will have to reindex our 2m+ documents and see if I can find another way around the problem, but I would like to avoid that, and also just get a better understanding of how Elastic works behind the scenes, and why it acts the way it does here.

The example I will provide here is not my real-life issue, but it describes the issue just as well. Imagine we have a document that describes an image. This document has a field that contains values for how much red, green, and blue exists in the image.

Requests to create index and documents with nested field that contains arrays of colors with a 100 point split between them:

PUT images
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_doc": {
      "properties": {
        "id" : { "type" : "integer" },
        "title" : { "type" : "text" },
        "description" : { "type" : "text" },
        "colors": {
          "type": "nested",
          "properties": {
            "red": {
              "type": "double"
            },
            "green": {
              "type": "double"
            },
            "blue": {
              "type": "double"
            }
          }
        }
      }
    }
  }
}

PUT images/_doc/1
{
    "id" : 1,
    "title" : "Red Image",
    "description" : "Description of Red Image",
    "colors": [
      {
        "red": 100
      },
      {
        "green": 0
      },
      {
        "blue": 0
      }
    ]
}

PUT images/_doc/2
{
    "id" : 2,
    "title" : "Green Image",
    "description" : "Description of Green Image",
    "colors": [
      {
        "red": 0
      },
      {
        "green": 100
      },
      {
        "blue": 0
      }
    ]
}

PUT images/_doc/3
{
    "id" : 3,
    "title" : "Blue Image",
    "description" : "Description of Blue Image",
    "colors": [
      {
        "red": 0
      },
      {
        "green": 0
      },
      {
        "blue": 100
      }
    ]
}

Now, if I run this query, using doc:

GET images/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                boolean debug = true;
                for(color in doc["colors"]) {
                  if (debug === true) {
                    throw new Exception(color["red"].toString());
                  }
                }
              """
            }
          }
        }
      ]
    }
  }
}

I get the exception "No field found for [colors] in mapping with types []", but if I use params._source instead, like so:

GET images/_search
{
  "query": {
    "function_score": {
      "functions": [
        {
          "script_score": {
            "script": {
              "source": """
                boolean debug = true;
                for(color in params._source["colors"]) {
                  if (debug === true) {
                    throw new Exception(color["red"].toString());
                  }
                }
              """
            }
          }
        }
      ]
    }
  }
}

I am able to output "caused_by": {"type": "exception", "reason": "100"}, so I know that it worked since the first document is red and has a value of 100.

I am not even sure that this can classify as a question, but more a cry for help. If someone can explain why this is behaving the way it is, and give an idea of the best way to get around the issue, I would really appreciate it.

(Also, some tips for debugging in Painless would also be lovely!!!)

Severin

2 Answers


Don't worry about the slowness of params._source. It's your only choice here, because a script running in the doc's nested context can only access a single nested color at a time, so it cannot iterate over all of them.

Try this:

GET images/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "image"
          }
        },
        {
          "function_score": {
            "functions": [
              {
                "script_score": {
                  "script": {
                    "source": """
                        def score = 0;
                        for (color in params._source["colors"]) {
                          // Debug.explain(color);
                          if (color.containsKey('red')) {
                            score += color['red'] ;
                          }
                        }
                        return score;
                    """
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

The Painless score context is described in the Painless documentation.

Secondly, you were pretty close with throwing an exception manually, but there's a cleaner way to do it: uncomment Debug.explain(color); and you're good to go.

One more thing, I purposefully added a match query to increase the scores but, more importantly, to illustrate how a query is built in the background -- when you rerun the above under GET images/_validate/query?explain, you'll see for yourself.
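
A possible alternative to reading params._source is to move the script inside a nested query, so that it runs once per nested color object, where doc values are available. The following is only a sketch that has not been run against the index above. The "sum" score_mode and the exists filter on colors.red are assumptions about what "score by red" should mean:

GET images/_search
{
  "query": {
    "nested": {
      "path": "colors",
      "score_mode": "sum",
      "query": {
        "function_score": {
          "query": { "exists": { "field": "colors.red" } },
          "script_score": {
            "script": {
              "source": "doc['colors.red'].value"
            }
          },
          "boost_mode": "replace"
        }
      }
    }
  }
}

Each nested object is indexed as its own hidden document, which is why doc["colors"] is not visible from a top-level script. It is probably also why _exists_:colors matches nothing in Kibana: that clause is evaluated against the root document, so an exists check on a nested field has to sit inside a nested query as it does here.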

Joe - GMapsBook.com

In an Elasticsearch scoring script, "script_score": {"script": {"source": "..." }}, you can access nested values through the params._source object.

For example, if you have a documents index containing documents like this one:

{
  "title": "Yankees Potential Free Agent Target: Max Scherzer",
  "body": "...",
  "labels": {
    "genres": "news",
    "topics": ["sports", "celebrities"],
    "publisher": "CNN"
  }
}

the following query will return 100 documents in randomized order, giving preference to documents with the sports topic:

GET documents/_search
{
  "size": 100,
  "sort": [
    "_score"
  ],
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "random_score": {}
        },
        {
          "script_score": {
            "script": {
              "source": """
                double boost = 1.0;
                if (params._source['labels'] != null && params._source['labels']['topics'] != null && params._source['labels']['topics'].contains('sports')) {
                    boost += 2.0;
                }
                return boost;
              """
            }
          }
        }
      ],
      "score_mode": "multiply",
      "boost_mode": "replace"
    }
  }
}
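
If labels is a plain object field rather than a nested one, the same check can probably be done on doc values instead of _source. The sketch below assumes that labels.topics has a keyword sub-field named labels.topics.keyword, which is what default dynamic mapping produces for strings. That mapping is an assumption and is not shown in the example above:

GET documents/_search
{
  "size": 100,
  "query": {
    "function_score": {
      "query": { "match_all": {} },
      "functions": [
        {
          "random_score": {}
        },
        {
          "script_score": {
            "script": {
              "source": """
                double boost = 1.0;
                // labels.topics.keyword is an assumed field name; adjust it to the real mapping
                if (doc.containsKey('labels.topics.keyword') && doc['labels.topics.keyword'].contains('sports')) {
                    boost += 2.0;
                }
                return boost;
              """
            }
          }
        }
      ],
      "score_mode": "multiply",
      "boost_mode": "replace"
    }
  }
}

This only works for object fields, whose sub-fields are indexed on the root document. For a field mapped as nested, doc in a top-level script will not see the values.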
denpost