I have a Painless script in Elasticsearch, like this:

POST _scripts/painless/calculate-price
{
  "script": "Map currencyMap = ['USD': 6.8, 'RUB': 0.122]; return doc['price'].value * currencyMap[doc['currency'].value];"
}

I use this script to sort data, and the size of currencyMap has a huge impact on query time.

Is there a way in Painless to implement something like a singleton, so that currencyMap is initialized only once and reused many times?

Any help would be appreciated.

PS:

The following are my test scripts. The test index contains 5738306 documents.

On average, sorting took about 2000 ms with calculate-score and about 600 ms with calculate-score2.

POST _scripts/calculate-score
{
  "script": {
    "lang": "painless",
    "source": "Map currencyMap = ['A': 6.8, 'B': 0.122, 'BC': 0.122, 'C': 0.122, 'D': 0.122, 'E': 0.122, 'F': 0.122, 'G': 0.122, 'H': 0.122, 'I': 0.122, 'J': 0.122, 'K': 0.122, 'L': 0.122, 'M': 0.122, 'N': 0.122, 'O': 0.122, 'P': 0.122, 'Q': 0.122, 'R': 0.122, 'S': 0.122, 'T': 0.122, 'U': 0.122, 'V': 0.122, 'W': 0.122, 'X': 0.122, 'BY': 0.122, 'BZ': 0.122, 'AB': 0.122, 'BB': 0.122, 'CB': 0.122]; def price = doc['price'].getValue(); def currency = doc['currency']; if(doc['currency'] == null) {doc['currency'] = 'A';} def c =  currencyMap[currency]; if(c == null) {c = 0.11;}return price * c;"
  }
}

POST _scripts/calculate-score2
{
  "script": {
    "lang": "painless",
    "source": "Map currencyMap = ['A': 6.8]; def price = doc['price'].getValue(); def currency = doc['currency']; if(doc['currency'] == null) {doc['currency'] = 'A';} def c =  currencyMap[currency]; if(c == null) {c = 0.11;}return price * c;"
  }
}

GET /test/_search
{
  "_source": ["price", "currency"], 
  "sort" : {
        "_script" : {
            "type" : "number",
            "script" : {
                "id": "calculate-score"
            },
            "order" : "desc"
        }
    }
}

GET /test/_search
{
  "_source": ["price", "currency"], 
  "sort" : {
        "_script" : {
            "type" : "number",
            "script" : {
                "id": "calculate-score2"
            },
            "order" : "desc"
        }
    }
}
mr.dot
  • Could you explain why you think the size of the Map has a huge impact on time cost? According to [this doc](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-using.html#_script_parameters), the script is compiled once and cached, so only one Map will actually be created. Scripted sorting is heavy in itself (since ES cannot use indexes or fielddata for it). Could you show how you arrived at the conclusion that the Map is the cause of the poor performance? How slow is it, by the way? – Nikolay Vasiliev Aug 24 '18 at 12:45
  • Hi Nikolay, I think I know why. Although the script is compiled only once, currencyMap is initialized for every document, which is why it takes so long. So a singleton is necessary. – mr.dot Sep 11 '18 at 15:32
  • So `calculate-score` takes 2000ms and `calculate-score2` takes 600ms? I doubt there is a way to make a singleton in a Painless script. It runs in a kind of sandbox, so I doubt it will let you create a reusable object. Also, using `script` is costly. With Elasticsearch it is better to [denormalize data](https://www.elastic.co/guide/en/elasticsearch/guide/current/denormalization.html), for example by storing the price already converted to a common currency, or by storing it in all currencies. – Nikolay Vasiliev Sep 11 '18 at 21:05
  • Exchange rates change frequently, and we have many documents (9 million+), which is why I chose to use a script rather than denormalize the data. – mr.dot Sep 17 '18 at 04:00
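
One way to avoid building the map for every document, short of a true singleton, is to pass it in through the script's `params` instead of declaring it in the script source. The sketch below is only an illustration under some assumptions: `calculate-score3` is a hypothetical script id, `currency` is assumed to be a keyword field with a value on every document, and the rates are placeholders taken from the test scripts above. Whether this closes the gap between 2000 ms and 600 ms on this index has not been measured here.

```
POST _scripts/calculate-score3
{
  "script": {
    "lang": "painless",
    "source": "def rate = params.currencyMap[doc['currency'].value]; if (rate == null) { rate = 0.11; } return doc['price'].value * rate;"
  }
}

GET /test/_search
{
  "_source": ["price", "currency"],
  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "id": "calculate-score3",
        "params": {
          "currencyMap": { "A": 6.8, "B": 0.122, "C": 0.122 }
        }
      },
      "order": "desc"
    }
  }
}
```

Because the rates arrive with the search request, they can be updated on every query without storing a new script, which fits the case where exchange rates change frequently.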
