This happens because the set processor only operates on the document you are sending, not on the one already stored (if any). Hence, override has no effect here: the document you send contains neither indexed_at nor updated_at, which is why both fields are set on each call.
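To make this concrete, here is a minimal Python model of what happens on each PUT. The pipeline definition is not shown here, so this assumes the timestamps pipeline has two set processors using the ingest timestamp: one for indexed_at with override set to false, and one for updated_at with override left at its default of true. The helper names (set_processor, run_timestamps_pipeline, put) are hypothetical and only mimic the behavior; they are not Elasticsearch APIs.

```python
from typing import Any, Dict

Doc = Dict[str, Any]


def set_processor(source: Doc, field: str, value: Any, override: bool = True) -> None:
    """Simplified `set` processor: it only ever sees the incoming document."""
    if override or field not in source:
        source[field] = value


def run_timestamps_pipeline(incoming: Doc, now: str) -> Doc:
    """Hypothetical `timestamps` pipeline (assumed definition, see lead-in)."""
    source = dict(incoming)  # the request body; the stored version is never consulted
    set_processor(source, "indexed_at", now, override=False)
    set_processor(source, "updated_at", now)  # override defaults to true
    return source


# In-memory stand-in for the index: document id -> stored _source
index: Dict[str, Doc] = {}


def put(doc_id: str, body: Doc, now: str) -> None:
    """A PUT replaces the stored document wholesale with the pipeline output."""
    index[doc_id] = run_timestamps_pipeline(body, now)


put("1", {"foo": "bar"}, "2018-07-20T05:08:52.293Z")
put("1", {"foo": "bor"}, "2018-07-20T05:08:53.345Z")
print(index["1"])
# {'foo': 'bor', 'indexed_at': '2018-07-20T05:08:53.345Z', 'updated_at': '2018-07-20T05:08:53.345Z'}
```

Because the second body carries no indexed_at, the override check has nothing to protect, so indexed_at is reset to the time of the second PUT.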
When you PUT your document a second time, you're not updating it; you're re-indexing it from scratch (i.e. you're overwriting the first version you sent). Ingest pipelines do not work with update operations. For instance, the following update call will fail:
POST test_pipelines/doc/1/_update?pipeline=timestamps
{
  "doc": {
    "foo": "bor"
  }
}
If you want to stick with your ingest pipeline, the only way to make it work is to GET the document first, change the field(s) you want, and send the whole document back. For instance:
# 1. Index the document the first time
PUT test_pipelines/doc/1?pipeline=timestamps
{
  "foo": "bar"
}
# 2. GET the indexed document
GET test_pipelines/doc/1
# 3. Change the foo field and index the whole document again, including the indexed_at and updated_at values returned by the GET
PUT test_pipelines/doc/1?pipeline=timestamps
{
  "indexed_at": "2018-07-20T05:08:52.293Z",
  "updated_at": "2018-07-20T05:08:52.293Z",
  "foo": "bor"
}
# 4. When you GET the document the second time, you'll see your pipeline worked
GET test_pipelines/doc/1
The _source of the returned document will be:
{
  "indexed_at": "2018-07-20T05:08:52.293Z",
  "updated_at": "2018-07-20T05:08:53.345Z",
  "foo": "bor"
}
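The GET-then-PUT workaround can be modeled the same way. This sketch again assumes the timestamps pipeline sets indexed_at with override set to false and updated_at with override set to true; get_then_put and the other names are hypothetical helpers standing in for the real requests.

```python
from typing import Any, Dict

Doc = Dict[str, Any]

# In-memory stand-in for the index: document id -> stored _source
index: Dict[str, Doc] = {}


def run_timestamps_pipeline(incoming: Doc, now: str) -> Doc:
    """Assumed `timestamps` pipeline: indexed_at (override=false), updated_at (override=true)."""
    source = dict(incoming)
    if "indexed_at" not in source:  # set processor, override: false
        source["indexed_at"] = now
    source["updated_at"] = now  # set processor, override: true
    return source


def put(doc_id: str, body: Doc, now: str) -> None:
    """PUT through the pipeline: the stored document is replaced wholesale."""
    index[doc_id] = run_timestamps_pipeline(body, now)


def get_then_put(doc_id: str, changes: Doc, now: str) -> None:
    """Steps 2 and 3: GET the stored _source, apply the changes, PUT the whole document back."""
    current = dict(index[doc_id])  # step 2: GET
    current.update(changes)        # change only the fields you want
    put(doc_id, current, now)      # step 3: indexed_at is now part of the body, so it is kept


put("1", {"foo": "bar"}, "2018-07-20T05:08:52.293Z")
get_then_put("1", {"foo": "bor"}, "2018-07-20T05:08:53.345Z")
print(index["1"])
# {'foo': 'bor', 'indexed_at': '2018-07-20T05:08:52.293Z', 'updated_at': '2018-07-20T05:08:53.345Z'}
```

Note that the GET and the PUT are two separate requests, so another client could modify the document in between. The scripted upsert approach discussed next avoids this by doing both steps in one operation.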
I agree this is troublesome, but the link I gave above lists the reasons why pipelines are not supported on update operations.
Another way to get the behavior you want (without pipelines) is a scripted upsert. It works like steps 2 and 3 above, i.e. it GETs and re-indexes the document, but in a single atomic operation, and it also works with your bulk calls. First, store a script that you will call for both your indexing and update operations:
POST _scripts/update-doc
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.foo = params.foo; ctx._source.updated_at = new Date(); if (ctx._source.indexed_at == null) ctx._source.indexed_at = ctx._source.updated_at;"
  }
}
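For readers less familiar with Painless, the stored script's logic reads like the Python function below. This is a translation for illustration only: ctx._source becomes a plain dict, params becomes a dict, and new Date() is replaced by a hypothetical now argument so the result is deterministic. Applying it twice to an initially empty document mirrors what the scripted upsert does on the first and second call.

```python
from typing import Any, Dict

Doc = Dict[str, Any]


def update_doc(source: Doc, params: Doc, now: str) -> None:
    """Python rendering of the `update-doc` Painless script; mutates `source` like ctx._source."""
    source["foo"] = params["foo"]         # ctx._source.foo = params.foo
    source["updated_at"] = now            # ctx._source.updated_at = new Date()
    if source.get("indexed_at") is None:  # a missing key reads as null, so this runs only once
        source["indexed_at"] = source["updated_at"]


doc: Doc = {}  # starts empty, like the empty "upsert" body
update_doc(doc, {"foo": "bar"}, "2018-07-20T05:57:40.510Z")
update_doc(doc, {"foo": "bor"}, "2018-07-20T05:58:42.825Z")
print(doc)
# {'foo': 'bor', 'updated_at': '2018-07-20T05:58:42.825Z', 'indexed_at': '2018-07-20T05:57:40.510Z'}
```

Unlike the set processor, the script runs against the stored document, which is why the null check on indexed_at can see the value written by the first call.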
Then, you can index your document the first time like this:
POST test_pipelines/doc/1/_update
{
  "script": {
    "id": "update-doc",
    "params": {
      "foo": "bar"
    }
  },
  "scripted_upsert": true,
  "upsert": {}
}
The indexed document will look like this:
{
  "updated_at": "2018-07-20T05:57:40.510Z",
  "indexed_at": "2018-07-20T05:57:40.510Z",
  "foo": "bar"
}
You can use the exact same call to update the document; only the foo parameter changes:
POST test_pipelines/doc/1/_update
{
  "script": {
    "id": "update-doc",
    "params": {
      "foo": "bor"
    }
  },
  "scripted_upsert": true,
  "upsert": {}
}
The updated document looks like this, which is exactly what you wanted: indexed_at keeps its original value while updated_at and foo change.
{
  "updated_at": "2018-07-20T05:58:42.825Z",
  "indexed_at": "2018-07-20T05:57:40.510Z",
  "foo": "bor"
}
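Since the scripted upsert also works with bulk calls, here is a sketch of how the same request could be batched. The Python below only builds the newline-delimited body for a _bulk request, with one update action per document that reuses the stored update-doc script; it does not send anything. The test_pipelines index and doc type come from the examples above, while build_bulk_body is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def build_bulk_body(index: str, doc_type: str, docs: Iterable[Tuple[str, Dict[str, Any]]]) -> str:
    """Build an NDJSON _bulk body with one scripted-upsert `update` action per (id, params) pair."""
    lines = []
    for doc_id, params in docs:
        # Action/metadata line
        lines.append(json.dumps({"update": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        # Same body as the single-document _update call
        lines.append(json.dumps({
            "script": {"id": "update-doc", "params": params},
            "scripted_upsert": True,
            "upsert": {},
        }))
    # The bulk API expects the body to end with a newline
    return "\n".join(lines) + "\n"


body = build_bulk_body("test_pipelines", "doc", [("1", {"foo": "bor"}), ("2", {"foo": "baz"})])
print(body, end="")
```

The resulting string would be sent as the body of POST _bulk with the application/x-ndjson content type. New and existing documents go through the same action, just as with the single-document call.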