I have a performance issue that I am trying to solve. I am reindexing on the fly from a source index in AWS managed Elasticsearch 6.2 to a destination index. The source index is currently hundreds of GB in size and is likely to be larger in production, so the reindex will take some time to complete. Business requirements call for keeping that time as short as possible. I have read that the following can speed up a reindex:
1) Use a sensible number of slices relative to the number of shards for parallelism (e.g. with 10 shards, ideally run no more than 10 slices; anything beyond that is wasted and adds potential overhead)
2) Do not have replica shards on the destination index if you don't need them, since each replica adds work when writing data to the cluster
3) Use the correct EC2 instance types in the cluster to accomplish this task
4) Only copy the information you need from the source index into the destination index.
Point #4 above is where I need guidance. I am using the Jest API (v.5.3.3) in Java 8. Is there a way to perform a _reindex request that copies only one or two fields from the _source, so that the data I am writing to the destination index is only a fraction of the size of the source?
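To make point #4 concrete, this is roughly the request body I am hoping to send. It is only a sketch: it assumes the `_source` filter inside `source` behaves for _reindex in 6.2 the way the reindex documentation describes, the index and field names are placeholders, and `ReindexBodySketch` / `buildReindexBody` are hypothetical names of my own. I have not worked out how to pass this through Jest, so the code just builds the JSON in plain Java 8.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ReindexBodySketch {

    // Builds a _reindex request body that asks for only the listed _source fields.
    // Assumption: names contain no characters that need JSON escaping.
    static String buildReindexBody(String sourceIndex, String destIndex, List<String> fields) {
        // Render the field list as a JSON array, e.g. ["field_a","field_b"]
        String fieldArray = fields.stream()
                .map(f -> "\"" + f + "\"")
                .collect(Collectors.joining(",", "[", "]"));
        return "{"
                + "\"source\":{\"index\":\"" + sourceIndex + "\",\"_source\":" + fieldArray + "},"
                + "\"dest\":{\"index\":\"" + destIndex + "\"}"
                + "}";
    }

    public static void main(String[] args) {
        // Placeholder index and field names, not my real ones.
        System.out.println(buildReindexBody(
                "source_index", "dest_index", Arrays.asList("field_a", "field_b")));
    }
}
```

My understanding is that this body would be sent as a POST to `_reindex`, with the slice count from point #1 passed as the `slices` URL parameter, but I would like confirmation that this is the right approach and how to express it with Jest.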