
How can I reindex while converting a string field, e.g. "field2": "123.2" (in the old index documents), into a float/double number, e.g. "field2": 123.2 (intended for the new index)? This post is the closest I could get, but I do not know which function to use to cast/convert a string to a number. I am using Elasticsearch version 2.3.3. Thank you very much for any advice!

leeway

2 Answers


Use an Elasticsearch index template to specify the mapping for the new index and declare the field as a double type.

The easiest way to build the template is to start from the existing mapping:

GET oldindex/_mapping
POST _template/templatename
{
  "template" : "newindex", // this can be a wildcard pattern to match indexes
  "mappings": { // this is copied from the response of the previous call
    "mytype": {
      "properties": {
        "field2": {
          "type": "double" // change the type
        }
      }
    }
  }
}
POST newindex
GET newindex/_mapping
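
For reference (not part of the original answer), if the template matched when newindex was created, the mapping response should show the field as a double, roughly like this (other fields copied from the old mapping are omitted):

{
  "newindex": {
    "mappings": {
      "mytype": {
        "properties": {
          "field2": {
            "type": "double"
          }
        }
      }
    }
  }
}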

Then use the Elasticsearch _reindex API to move the data from the old index to the new index and parse the field as a double using an inline script (you may need to enable inline scripting):

POST _reindex
{
  "source": {
    "index": "oldindex"
  },
  "dest": {
    "index": "newindex"
  },
  "script": {
    "inline": "ctx._source.field2 = ctx._source.field2.toDouble()"
  }
}
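
As a quick sanity check (this query is not part of the original answer, and the threshold 100 is an arbitrary example value), a numeric range query against the new index should now work directly on field2, which would not behave correctly while the field was mapped as a string:

GET newindex/_search
{
  "query": {
    "range": {
      "field2": {
        "gte": 100
      }
    }
  }
}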

Edit: Updated to use the _reindex endpoint

troymass
  • Thank you very much troymass. My problem is exactly the point you described as "transform the json during the scan and scroll". I have found through googling that it should be possible with the _reindex functionality of Elasticsearch by creating a scripted field in the _reindex request, but the exact syntax or functions are basically nowhere to be found (in my experience so far). Just a note: I am *not* using Logstash. – leeway Mar 07 '17 at 12:31
  • Would something like this [1] work if I found a conversion function for the inline script which would transform the value of the field from string->float? [1] http://stackoverflow.com/questions/42423899/renaming-fields-to-new-index-in-elasticsearch/42425103#42425103 – leeway Mar 07 '17 at 12:43
  • the post closest to the string-> float conversion which I mentioned in the original question is here (sorry to be unspecific before): http://stackoverflow.com/questions/30706361/convert-strings-to-floats-at-aggregation-time – leeway Mar 07 '17 at 12:57
  • My experience has been on ES 1.X so I'm not familiar enough with the _reindex feature of ES. If you do the scan and scroll manually via a script, you could apply the transform there. – troymass Mar 07 '17 at 16:42
  • Can you apply the transform with a painless or groovy script on the _reindex endpoint like: `doc['field2'].toDouble()` – troymass Mar 07 '17 at 16:51
  • I was able to get this working: `POST _reindex { "source": { "index": "oldindex" }, "dest": { "index": "newindex" }, "script": { "inline": "ctx._source.field2 = ctx._source.field2.toDouble()" } }` – troymass Mar 07 '17 at 17:48
  • I updated the answer with the full steps. I tested this on ES 2.3. On ES 5.x you may need to specify the scripting language as groovy – troymass Mar 07 '17 at 17:57
  • Thank you so much! Now I just have to ask the administrator of our cluster why they disabled inline scripting in ES ;/ But I am sure this will work, and I find the exact syntax extremely helpful! Thank you again! I tried to create a scripted field in Kibana, but it did not work, because scripted fields there require the input field to be numeric if the output is numeric. – leeway Mar 07 '17 at 19:26
  • Scripting is turned off by default as a security precaution https://www.elastic.co/blog/scripting-security . It looks like with Elastic 5.x they created their own language called Painless, which is sandboxed and thus secure enough to run inline, so it is enabled by default. But on 2.x you'll have to use Groovy and enable it (see the sketch below this comment thread). You can also install the script on the server, but either way requires your cluster admin. – troymass Mar 07 '17 at 19:51
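
To expand on the last two comments, here is a minimal sketch of both options (not part of the original answer). The elasticsearch.yml setting is for an ES 2.x cluster and requires a restart of every node; check the scripting security docs for your exact version:

# elasticsearch.yml (ES 2.x, every node, restart required)
script.inline: true

And an assumed ES 5.x variant of the same reindex request written in Painless, since Groovy's toDouble() is not available there:

POST _reindex
{
  "source": { "index": "oldindex" },
  "dest": { "index": "newindex" },
  "script": {
    "lang": "painless",
    "inline": "ctx._source.field2 = Double.parseDouble(ctx._source.field2)"
  }
}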

You could use Logstash to reindex your data and convert the field. Something like the following:

input {
  elasticsearch {
    hosts => "es.server.url"
    index => "old_index"
    query => '{ "query": { "match_all": {} } }'
    size => 500
    scroll => "5m"
    docinfo => true   # keeps _index, _type and _id in [@metadata]
  }
}

filter {
  mutate {
    # cast the string field to a float
    convert => { "field2" => "float" }
  }
}

output {
  elasticsearch {
    hosts => "es.server.url"
    index => "new_index"
    document_type => "%{[@metadata][_type]}"
    document_id => "%{[@metadata][_id]}"
  }
}
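
For completeness (not part of the original answer), assuming the pipeline above is saved to a file named reindex.conf (a hypothetical name), it can be run once with:

bin/logstash -f reindex.conf
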
Christian Häckh