I am trying to trim and lowercase all the values of a document that is being indexed into Elasticsearch.

The available processors all require the field key, which means a processor can only be applied to one field at a time.

Is there a way to run a processor on all the fields of a document?

gd vigneshwar

1 Answer

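There sure is. Use a script processor, but beware of reserved metadata keys like _type and _id. The script below only handles top-level string values; nested objects and arrays pass through unchanged: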

PUT _ingest/pipeline/my_string_trimmer
{
  "description": "Trims and lowercases all string values",
  "processors": [
    {
      "script": {
        "source": """
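          // Metadata keys exposed on ctx that must be left untouched
          def forbidden_keys = [
            '_type',
            '_id',
            '_version_type',
            '_index',
            '_version'
          ];

          // Collect the cleaned top-level fields in a separate map
          def corrected_source = [:];

          for (pair in ctx.entrySet()) {
            def key = pair.getKey();
            if (forbidden_keys.contains(key)) {
              continue;
            }
            def value = pair.getValue();

            // Only strings are trimmed and lowercased; other types pass through
            if (value instanceof String) {
              corrected_source[key] = value.trim().toLowerCase();
            } else {
              corrected_source[key] = value;
            }
          }

          // Overwrite the original fields with the cleaned values
          ctx.putAll(corrected_source);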
        """
      }
    }
  ]
}
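
Before indexing anything, the pipeline can be dry-run with the simulate API. This is a minimal sketch that reuses the pipeline name defined above; the sample document is only an illustration:

```console
POST _ingest/pipeline/my_string_trimmer/_simulate
{
  "docs": [
    {
      "_source": {
        "abc": " DEF ",
        "def": 123,
        "xyz": false
      }
    }
  ]
}
```

If the script behaves as intended, the returned document should show abc as "def", with def and xyz unchanged.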

Test with a sample doc:

POST my-index/_doc?pipeline=my_string_trimmer
{
  "abc": " DEF ",
  "def": 123,
  "xyz": false
}
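
To avoid passing ?pipeline=... on every request, the pipeline can probably be set as the index default instead. This sketch assumes that my-index already exists and that your Elasticsearch version supports the index.default_pipeline setting:

```console
PUT my-index/_settings
{
  "index.default_pipeline": "my_string_trimmer"
}
```

With this in place, a plain POST my-index/_doc should run the document through my_string_trimmer unless the request names a different pipeline.
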
Vy Do
Joe - GMapsBook.com