I have to update some fields in my ES documents.
I have an integer 'objectID' field, which is a unique id of the object concerned by the document.
I have a String 'objectType' field, which is the type of object concerned by the document.
All documents describe an action on the object, and the objectType and objectID are always present in all documents.
Unfortunately, some documents with the objectType "post_image" have been indexed as "post". The objectID is still unique and valid, and only a single type of document has the wrong objectType. Therefore, every object has at least one other document with the right objectType and the same unique objectID.
I want to use an update_by_query to update the value of the objectType to "post_image" on all documents where the objectType is "post" and the objectID is in any other document where the objectType is "post_image".
Here's my pseudo-code script:
{
  "query": {
    "match" : { "objectType" : "post" } //all documents with objectType post
  },
  "script": {
    "lang": "painless",
    "source": "
      //subquery selecting all objectIDs from documents with objectType "post_image"
      subQueryResults = "query": {
        "match" : { "objectType" : "post_image" }
        //I don't know how to filter results to retrieve the objectID field only
        //no need for help here, I'll figure it out myself
      }
      if (/*ctx._source['objectID'] in subQueryResults*/){
        ctx._source['objectType'] = "post_image"
      }
    "
  }
}
I'm new to Painless scripting and I have no idea how to put another query inside my script to get a list of all "post_image" ids. I know I can pass parameters to a script, but I don't know if or how I can use a query result there either.
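One way the parameter route mentioned above could look is sketched below. It is a minimal sketch, assuming the "post_image" objectIDs have already been collected by a separate search and are available as a PHP array. The helper name `buildUpdateBody` and the sample IDs are hypothetical; the field names and the "source" script key are the ones used in this question.

```php
<?php
// Hypothetical helper: builds one _update_by_query body for a batch of IDs.
// The IDs are assumed to come from a separate search for "post_image" documents.
function buildUpdateBody(array $objectIds, string $fromType, string $toType): string
{
    $body = [
        'conflicts' => 'proceed',
        'query' => [
            'bool' => [
                // Only documents that currently carry the wrong type...
                'must' => [
                    'match' => ['objectType' => $fromType],
                ],
                // ...and whose objectID is in the collected list.
                'filter' => [
                    'terms' => ['objectID' => array_values($objectIds)],
                ],
            ],
        ],
        'script' => [
            'lang'   => 'painless',
            // The new value is passed as a script parameter instead of being
            // concatenated into the script source.
            'source' => 'ctx._source.objectType = params.newType',
            'params' => ['newType' => $toType],
        ],
    ];

    // Encode the array once; the result is the request body as a JSON string.
    return json_encode($body);
}

// Sample IDs are made up for illustration.
echo buildUpdateBody([101, 102, 103], 'post', 'post_image'), "\n";
```

Passing the whole list in one `terms` filter means a single update_by_query call per batch rather than one call per objectID.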
Thanks!
EDIT:
I've solved part of my problem by exporting a CSV list of the affected objectIDs with Kibana's raw export, and I wrote a PHP script that reads each objectID and puts it into the query of an update_by_query request, which simply finds ALL documents with a matching objectID and replaces the objectType field value with "post_image".
I'm using PHP curl to make these calls, and I get version conflict errors despite using "conflicts" : "proceed" in my request. I've tested the very same query in the Kibana dev console and it works perfectly, and I couldn't find any explanation for why it doesn't update my documents when run from PHP.
Here's the script:
<?php
$query = "";
$csvFile = file($argv[1]);
try {
    //$data = array();
    $query = "";
    $i = 0;
    $csv_headers = array();
    $uri = "http://ip/index/type/_update_by_query";
    $conn = curl_init();
    curl_setopt($conn, CURLOPT_URL, $uri);
    curl_setopt($conn, CURLOPT_TIMEOUT, 5);
    curl_setopt($conn, CURLOPT_SSL_VERIFYPEER, FALSE);
    curl_setopt($conn, CURLOPT_SSL_VERIFYHOST, FALSE);
    curl_setopt($conn, CURLOPT_RETURNTRANSFER, TRUE);
    curl_setopt($conn, CURLOPT_FAILONERROR, FALSE);
    curl_setopt($conn, CURLOPT_CUSTOMREQUEST, strtoupper('POST'));
    curl_setopt($conn, CURLOPT_FORBID_REUSE, 0);
    foreach ($csvFile as $line) {
        try {
            //WARNING: the separator passed to str_getcsv must match the CSV format in use, otherwise parsing may go wrong.
            //skip the header row of the CSV
            if ($i > 0) {
                $data = str_getcsv($line, ',');
                //$data = explode(",", $line);
                $id = $data[0];
                echo $id.", ";
                //old query, wasn't working
                // $query = "{
                //     \"conflicts\": \"proceed\",
                //     \"query\": {
                //         \"match\" : { \"objectID\" : ".$id."
                //         }
                //     },
                //     \"script\": {
                //         \"lang\": \"painless\",
                //         \"source\": \"ctx._source['objectType'] = '".$argv[2]."'\"
                //     }
                // }";
                $query = "{
                    \"conflicts\": \"proceed\",
                    \"query\": {
                        \"bool\": {
                            \"must\": {
                                \"match\": {
                                    \"objectType\": \"Post\"
                                }
                            },
                            \"filter\": {
                                \"terms\": {
                                    \"objectID\": [
                                        ".$id."
                                    ]
                                }
                            }
                        }
                    },
                    \"script\": {
                        \"lang\": \"painless\",
                        \"source\": \"ctx._source['objectType'] = 'Post_image'\"
                    }
                }";
                curl_setopt($conn, CURLOPT_HTTPHEADER, array(
                    'Content-Type: application/json',
                    'Content-Length: ' . strlen($query))
                );
                curl_setopt($conn, CURLOPT_POSTFIELDS, json_encode($query));
                $response = curl_exec($conn);
                //sleep(1);
                echo $response;
            }
            $i++;
        } catch (Exception $e) {
            echo $e->getMessage();
            //continue;
        }
    }
} catch (Exception $e) {
    echo $e->getMessage();
}
echo $query;
echo "\nCompleted.\n\n";
?>
Example response:
{
  "index": "index",
  "type": "type",
  "id": "AWB0YFcjAFB9uQAwMSKx",
  "cause": {
    "type": "version_conflict_engine_exception",
    "reason": "[type][AWB0YFcjAFB9uQAwMSKx]: version conflict, current version [27] is different than the one provided [26]",
    "index_uuid": "yOD9SBy0RMmDZGK_N5o8qw",
    "shard": "2",
    "index": "index"
  },
  "status": 409
}
It is pretty weird since I'm not giving any document version in my request. Perhaps it has something to do with some automatic internal behaviour of the update_by_query API.
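One detail of the PHP script that differs from the Kibana console call: the Content-Length header is computed from `strlen($query)`, while the body actually sent is `json_encode($query)`, i.e. a JSON string that has been encoded a second time. Whether this is related to the version conflicts is not established here. The sketch below only illustrates what the second encoding does to the payload and how the body and its headers could be derived from a single encoding step. The helper name `encodeRequest` and the sample objectID are hypothetical.

```php
<?php
// Hypothetical helper: encodes the request body exactly once and derives
// the headers from that same string, so Content-Length matches what is sent.
function encodeRequest(array $body): array
{
    $payload = json_encode($body);

    return [
        'payload' => $payload,
        'headers' => [
            'Content-Type: application/json',
            'Content-Length: ' . strlen($payload),
        ],
    ];
}

// Sample body; the objectID value is made up for illustration.
$body = [
    'conflicts' => 'proceed',
    'query'     => ['terms' => ['objectID' => [101]]],
    'script'    => [
        'lang'   => 'painless',
        'source' => "ctx._source['objectType'] = 'post_image'",
    ],
];

$request = encodeRequest($body);

// Encoding an already-encoded JSON string wraps it in quotes and escapes it,
// so it decodes back to a plain string rather than to an object.
$twice = json_encode($request['payload']);

var_dump(is_array(json_decode($request['payload'], true))); // bool(true)
var_dump(is_string(json_decode($twice, true)));             // bool(true)
var_dump(strlen($twice) > strlen($request['payload']));     // bool(true)
```

With the existing curl handle, the result would be passed as `curl_setopt($conn, CURLOPT_HTTPHEADER, $request['headers'])` and `curl_setopt($conn, CURLOPT_POSTFIELDS, $request['payload'])`.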