I have a test Elasticsearch 6.0 index populated with millions of records, likely to reach the billions in production. I need to search for a subset of these records, then save that subset into a secondary index for later searching. I have proven this out by querying ES from Kibana with the request below; the challenge is to find appropriate APIs in Java 8 using my Jest client (searchbox.io, version 5.3.3) to do the same. The Elasticsearch cluster is on AWS, so using a transport client is out.

POST _reindex?slices=10&wait_for_completion=false
{ "conflicts": "proceed",
  "source":{
    "index": "my_source_idx",
    "size": 5000,
    "query": { "bool": {
      "filter": { "bool" : { "must" : [
        { "nested": { "path": "test", "query": { "bool": { "must":[
           { "terms" : { "test.RowKey": ["abc"]} },
           { "range" : { "test.dates" : { "lte": "2018-01-01", "gte": "2010-08-01"} } },
           { "range" : { "test.DatesCount" : { "gte": 2} } },
           { "script" : { "script" : { "id": "my_painless_script", 
              "params" : {"min_occurs" : 1, "dateField": "test.dates", "RowKey": ["abc"], "fromDate": "2010-08-01", "toDate": "2018-01-01"}}}}
        ]}}}}
      ]}}
    }}
  },
  "dest": {
    "index": "my_dest_idx"
  },
  "script": {
    "source": <My painless script>
  } }

I am aware I can perform a search on the source index, then create and bulk load the response records into the new index, but I want to do this all in one shot, because I have a painless script that gleans some information pertinent to the queries that will search the secondary index. Performance is a concern, as the application will be chaining subsequent queries together against the destination index. Does anyone know how to accomplish this using Jest?

BPS

1 Answer

It appears that this particular functionality is not yet supported in Jest. The Jest API has a way to pass in a script (not a query) as a parameter, but I was having problems even with that.

EDIT:

After some hacking with a coworker, we found a way around this...

Step 1) Extend the GenericResultAbstractAction class, with edits to how the script is handled:

import com.google.common.collect.ImmutableMap;
import com.google.gson.Gson;
import io.searchbox.action.GenericResultAbstractAction;
import org.elasticsearch.script.Script;

import java.util.HashMap;
import java.util.Map;

public class GenericResultReindexActionHack extends GenericResultAbstractAction {
    GenericResultReindexActionHack(GenericResultReindexActionHack.Builder builder) {
        super(builder);

        Map<String, Object> payload = new HashMap<>();
        payload.put("source", builder.source);
        payload.put("dest", builder.dest);
        if (builder.conflicts != null) {
            payload.put("conflicts", builder.conflicts);
        }
        if (builder.size != null) {
            payload.put("size", builder.size);
        }
        if (builder.script != null) {
            Script script = (Script) builder.script;

            // Note the script parameter needs to be formatted differently to conform to the ES _reindex API:
            // it is pre-serialized here as {"id": ..., "params": ...}, and getData() strips the extra quoting.
            payload.put("script", new Gson().toJson(ImmutableMap.of("id", script.getIdOrCode(), "params", script.getParams())));
        }
        this.payload = ImmutableMap.copyOf(payload);

        setURI(buildURI());
    }

    @Override
    protected String buildURI() {
        return super.buildURI() + "/_reindex";
    }

    @Override
    public String getRestMethodName() {
        return "POST";
    }

    @Override
    public String getData(Gson gson) {
        if (payload == null) {
            return null;
        } else if (payload instanceof String) {
            return (String) payload;
        } else {
            // We need to remove the incorrect formatting for the query, dest, and script fields:
            // TODO: Need to consider spaces in the JSON
            return gson.toJson(payload).replaceAll("\\\\n", "")
                    .replace("\\", "")
                    .replace("query\":\"", "query\":")
                    .replace("\"},\"dest\"", "},\"dest\"")
                    .replaceAll("\"script\":\"", "\"script\":")
                    .replaceAll("\"}", "}")
                    .replaceAll("},\"script\"", "\"},\"script\"");
        }
    }

    public static class Builder extends GenericResultAbstractAction.Builder<GenericResultReindexActionHack, GenericResultReindexActionHack.Builder> {

        private Object source;
        private Object dest;
        private String conflicts;
        private Long size;
        private Object script;

        public Builder(Object source, Object dest) {
            this.source = source;
            this.dest = dest;
        }

        public GenericResultReindexActionHack.Builder conflicts(String conflicts) {
            this.conflicts = conflicts;
            return this;
        }

        public GenericResultReindexActionHack.Builder size(Long size) {
            this.size = size;
            return this;
        }

        public GenericResultReindexActionHack.Builder script(Object script) {
            this.script = script;
            return this;
        }

        public GenericResultReindexActionHack.Builder waitForCompletion(boolean waitForCompletion) {
            return setParameter("wait_for_completion", waitForCompletion);
        }

        public GenericResultReindexActionHack.Builder waitForActiveShards(int waitForActiveShards) {
            return setParameter("wait_for_active_shards", waitForActiveShards);
        }

        public GenericResultReindexActionHack.Builder timeout(long timeout) {
            return setParameter("timeout", timeout);
        }

        public GenericResultReindexActionHack.Builder requestsPerSecond(double requestsPerSecond) {
            return setParameter("requests_per_second", requestsPerSecond);
        }

        public GenericResultReindexActionHack build() {
            return new GenericResultReindexActionHack(this);
        }
    }

}
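
To make the effect of the replace chain in getData concrete, here is a small standalone sketch that applies the same chain to a hand-written payload. The class name ReplaceChainDemo, the helper names, and the sample payloads are all assumptions for illustration: the strings only imitate what gson.toJson(payload) might produce (including the key order), they are not captured from a real run. With a numeric script parameter the result is clean JSON; with a string parameter in the last position, the closing quote of the value is lost.

```java
public class ReplaceChainDemo {

    // Builds an assumed example of gson.toJson(payload) output. The query and the
    // script are JSON *strings*, so they arrive quoted and with escaped inner quotes.
    static String payload(String escapedParams) {
        return "{\"source\":{\"index\":\"src\",\"query\":\"{\\\"match_all\\\":{}}\"},"
                + "\"dest\":{\"index\":\"dst\"},"
                + "\"script\":\"{\\\"id\\\":\\\"s1\\\",\\\"params\\\":" + escapedParams + "}\"}";
    }

    // Script params {"n":1}: a numeric value.
    static final String NUMERIC_PARAM = payload("{\\\"n\\\":1}");

    // Script params {"k":"v"}: a string value directly before the closing brace.
    static final String STRING_PARAM = payload("{\\\"k\\\":\\\"v\\\"}");

    // The same replace chain as getData, applied to an already-serialized payload.
    static String strip(String json) {
        return json.replaceAll("\\\\n", "")
                .replace("\\", "")
                .replace("query\":\"", "query\":")
                .replace("\"},\"dest\"", "},\"dest\"")
                .replaceAll("\"script\":\"", "\"script\":")
                .replaceAll("\"}", "}")
                .replaceAll("},\"script\"", "\"},\"script\"");
    }

    public static void main(String[] args) {
        // Numeric parameter: query and script end up as nested JSON objects.
        System.out.println(strip(NUMERIC_PARAM));
        // String parameter: the "} rule also removes the quote that closes "v".
        System.out.println(strip(STRING_PARAM));
    }
}
```

The second case fails because the rule that turns "} into } cannot tell the quote that closes the wrapped script string from a quote that closes an ordinary string value; the last rule repairs this only for the value in front of "script".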

Step 2) To use this class with a query, pass the query in as part of the source, then remove the '\n' characters:

ImmutableMap<String, Object> sourceMap = ImmutableMap.of("index", sourceIndex, "query", qb.toString().replaceAll("\\\\n", ""));
ImmutableMap<String, Object> destMap = ImmutableMap.of("index", destIndex);

// Note: size(...) here sets the top-level "size", which caps the total number of
// documents reindexed. The scroll batch size from the original request is "size"
// inside "source", so add it to sourceMap if that is what you want.
GenericResultReindexActionHack reindex = new GenericResultReindexActionHack.Builder(sourceMap, destMap)
        .waitForCompletion(false)
        .conflicts("proceed")
        .size(5000L)
        .script(reindexScript)
        .setParameter("slices", 10)
        .build();

JestResult result = handleResult(reindex);
String task = result.getJsonString();
return task;

Note the reindexScript parameter is of type Script, from the org.elasticsearch.script package.

This is a messy, hacky way of getting around the limitations of Jest, but it seems to work. Because getData rewrites the serialized JSON with plain string replacements, there are limits on what input it accepts: every backslash is removed, and any "} sequence inside the query or the script parameters is altered.
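
One way to sidestep the string replacements would be to embed the query as raw JSON while the body is being written, so nothing has to be unquoted afterwards. The following is only a sketch using the JDK alone; ReindexBodySketch, RawJson, toJson, quote, and reindexBody are hypothetical names, the tiny serializer covers just maps, iterables, numbers, booleans, and simple strings, and the batch size of 5000 is taken from the request in the question. In a real project a JSON library would be the better tool; with Gson, parsing the query string into a JsonElement before putting it into the payload map should have the same effect, though I have not verified that against Jest 5.3.3.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ReindexBodySketch {

    /** Marks a string that is already valid JSON and must be embedded as-is. */
    static final class RawJson {
        final String json;
        RawJson(String json) { this.json = json; }
    }

    /** Minimal serializer: maps, iterables, numbers, booleans, raw JSON, strings. */
    static String toJson(Object value) {
        if (value == null) return "null";
        if (value instanceof RawJson) return ((RawJson) value).json;
        if (value instanceof Map) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (sb.length() > 1) sb.append(',');
                sb.append(quote(String.valueOf(e.getKey()))).append(':').append(toJson(e.getValue()));
            }
            return sb.append('}').toString();
        }
        if (value instanceof Iterable) {
            StringBuilder sb = new StringBuilder("[");
            for (Object item : (Iterable<?>) value) {
                if (sb.length() > 1) sb.append(',');
                sb.append(toJson(item));
            }
            return sb.append(']').toString();
        }
        if (value instanceof Number || value instanceof Boolean) return String.valueOf(value);
        return quote(String.valueOf(value));
    }

    /** Escapes only backslashes and quotes; enough for index names and ids in this sketch. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds a _reindex body that references a stored script by id. */
    static String reindexBody(String sourceIndex, String destIndex, String queryJson,
                              String scriptId, Map<String, Object> scriptParams) {
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("index", sourceIndex);
        source.put("size", 5000);                    // scroll batch size, as in the question
        source.put("query", new RawJson(queryJson)); // embedded verbatim, never quoted

        Map<String, Object> dest = new LinkedHashMap<>();
        dest.put("index", destIndex);

        Map<String, Object> script = new LinkedHashMap<>();
        script.put("id", scriptId);
        script.put("params", scriptParams);

        Map<String, Object> body = new LinkedHashMap<>();
        body.put("conflicts", "proceed");
        body.put("source", source);
        body.put("dest", dest);
        body.put("script", script);
        return toJson(body);
    }

    public static void main(String[] args) {
        Map<String, Object> params = new LinkedHashMap<>();
        params.put("k", "v");
        System.out.println(reindexBody("src", "dst", "{\"match_all\":{}}", "s1", params));
    }
}
```

Since getData above returns a String payload unchanged, a body built along these lines could presumably be assigned to payload directly, which would skip the replace chain altogether.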

BPS
  • I'm getting the following error with JestClient when I try to use reindex with wait_for_completion=false. I get the task ID, but when I fetch the task info, the response gives an authentication failure as the reason: action [indices:data/write/reindex] requires authentication. I've set Elastic basic authentication in the Jest HTTP client config, and I get this error only for this reindex call. I'm not sure how to debug the Jest client request headers. Does anyone know how to solve this? – ARods Apr 18 '20 at 05:58