1

ElasticSearch Version: 5.6

I have imported MySQL data in ElasticSearch and I have added mapping to the elastic search as required. Following is one mapping for the column application_status.

Mappings:

{
"settings": {
    "analysis": {
        "analyzer": {
            "case_insensitive": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase"]
            }
        }
    }
},
"mappings": {
    "lead": {
        "properties": {
            "application_status": {
                "type": "string",
                "analyzer": "case_insensitive",
                "fields": {
                    "keyword": {
                        "type": "keyword"
                    }
                }
            }
        }
    }
}}

On the above mapping, I am able to do simple sorting (asc or desc) using following query:

{
"size": 50,
"from": 0,
"sort": [{
    "application_status.keyword": {
        "order": "asc"
    }
}]}

which is MySql equivalent of

select * from <table_name> order by application_status asc limit 50;

Need help on following problem: I have MySQL query which sorts based on application_status:

select * from vLoan_application_grid order by CASE WHEN application_status = "IP_QUAL_REASSI" THEN application_status END desc, CASE WHEN application_status = "IP_COMPLE" THEN application_status END desc, CASE WHEN application_status LIKE "IP_FRESH%" THEN application_status END desc, CASE WHEN application_status LIKE "IP_%" THEN application_status END desc

Please help me write the same query in ElasticSearch. I am not able to find order by value equivalent for strings in ElasticSearch. Searching online, I understood that, I should use sorting scripts but not able to find any proper documentation.

I have following query which just does simple sort.

{
"size": 500,
"from": 0,
"query" : {
    "match_all": {}
},
"sort": {
    "_script": {
        "type": "string",
        "script": {
            "source": "doc['application_status.keyword'].value",
            "params": {
                "factor": ["IP_QUAL_REASS", "IP_COMPLE"]
            }
        },
        "order": "desc"
    }
}}

In the above query, I am not using params section as I am not aware how to use it for type: string

I believe I am asking too much. Please help or any relevant documentation links would be greatly appreciated. Hope question is clear. I'll provide more details if necessary.

Andrei Stefan
  • 51,654
  • 6
  • 98
  • 89
Sagar Chilukuri
  • 1,430
  • 2
  • 17
  • 29
  • I am no expert in MySQL, but maybe I can help with Elasticsearch. Can you explain what that sorting statement you have in MySQL does? – Andrei Stefan Nov 02 '17 at 06:20
  • `application_status` column can have multiple values. I have to sort by application by its status value. Suppose 10 applications have `application_status = 'IP_QUAL_REASS'`, and 5 have `application_status = 'IP_COMPLE'`, then applications which are having status `IP_QUAL_REASS` should come at top while second condition's applications should come below that. – Sagar Chilukuri Nov 02 '17 at 06:42
  • In simpler terms, please help me to sort the column based on column values. – Sagar Chilukuri Nov 02 '17 at 06:52
  • `"sort": [{ "application_status.keyword": { "order": "asc" } }]}` is doing exactly that. – Andrei Stefan Nov 02 '17 at 06:57
  • No Sir. Its sorting whole column irrespective of values. In MySQL, their is difference between `sort` and `sort by value`. `sort by value` value orders rows on their field values first. Please check https://www.electrictoolbox.com/mysql-order-specific-field-values/. – Sagar Chilukuri Nov 02 '17 at 07:05

1 Answers1

2

You have two options:

  • the most performant one is to index at indexing time another field that should be a number. This number (your choice) will be the numerical representation of that status. Then at search time, you simply sort by that number and not by the status
  • at search time use a script that will do almost the same thing as the first option, but dynamically, and less performant (but still quite fast)

Below you have the second option:

  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "source": "if (params.factor[0].containsKey(doc['application_status.keyword'].value)) return params.factor[0].get(doc['application_status.keyword'].value); else return 1000;",
        "params": {
          "factor": [{
            "IP_QUAL_REASS":1,
            "IP_COMPLE":2,
            "whatever":3
          }
          ]
        }
      },
      "order": "asc"
    }
  }

If you also want things like LIKE WHATEVER%, my suggestion is to consider an indexing time change, rather than search time because the script gets more complex. But, this is the one for wildcard matches as well:

  "sort": {
    "_script": {
      "type": "number",
      "script": {
        "source": "if (params.factor[0].containsKey(doc['application_status.keyword'].value)) return params.factor[0].get(doc['application_status.keyword'].value); else { params.wildcard_factors[0].entrySet().stream().filter(kv -> doc['application_status.keyword'].value.startsWith(kv.getKey())).map(Map.Entry::getValue).findFirst().orElse(1000)}",
        "params": {
          "factor": [
            {
              "IP_QUAL_REASS": 1,
              "IP_COMPLE": 2,
              "whatever": 3
            }
          ],
          "wildcard_factors": [
            {
              "REJ_": 66
            }
          ]
        }
      },
      "order": "asc"
    }
  }
Andrei Stefan
  • 51,654
  • 6
  • 98
  • 89
  • Wow. It handles exact string match perfectly. How do I handle wildcard strings as mentioned in question case 3. For example: `application_status LIKE "REJ_%" THEN application_status END desc`. It describes that, if any `application_status` starts with `REJ%`, then order the value. – Sagar Chilukuri Nov 02 '17 at 07:55
  • Are there that many values for that status that you couldn't just list them in the script parameter? – Andrei Stefan Nov 02 '17 at 08:11
  • There are 74 possible statuses. I can list them in script params but then that would mean hardcoding the statuses. In future, if any new status is added, I have to update elastic search too. Just want to implement in dynamic way; if possible – Sagar Chilukuri Nov 02 '17 at 08:14
  • I added a script for wildcard matches as well. – Andrei Stefan Nov 02 '17 at 08:50