I am using a Kinesis data stream as the source and Elasticsearch as the sink.

A Flink job does some light processing on the data and then writes it to Elasticsearch.

In the production environment, the Kinesis data stream can generate 50,000 events per second, but processing is very slow: it takes around 50 minutes to process 500,000 events.
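
To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It is only a sketch: the three input figures are the approximate ones quoted above, and the variable names are illustrative.

```python
# Figures taken from the description above (all approximate).
incoming_rate = 50_000        # events per second the Kinesis stream can produce
processed_events = 500_000    # events in the observed run
elapsed_seconds = 50 * 60     # roughly 50 minutes

# Throughput the Flink -> Elasticsearch pipeline actually achieved.
observed_rate = processed_events / elapsed_seconds

# How long the same events would take if the sink kept up with the source.
ideal_seconds = processed_events / incoming_rate

# How far the pipeline falls behind the source rate.
slowdown = incoming_rate / observed_rate

print(f"observed throughput: {observed_rate:.0f} events/s")
print(f"time at source rate: {ideal_seconds:.0f} s")
print(f"pipeline is about {slowdown:.0f}x slower than the source")
```

So the pipeline indexes on the order of a couple of hundred events per second, while the stream can deliver the same 500,000 events in about 10 seconds.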

Elasticsearch version 7.7 running on SSD-based storage.

Elasticsearch nodes: 2

Shards: 5

Replicas: 1 per shard

Refresh interval: 1 sec (default)

We are using the managed Elasticsearch offering on AWS (Amazon OpenSearch Service).
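
For reference, the index configuration listed above corresponds roughly to the settings body below. This is a sketch under stated assumptions: the shard, replica, and refresh values come from the list above, while the assumption that both nodes are data nodes with evenly balanced shards is mine and may not match the actual cluster.

```python
import json

# Index settings as described above, in the shape Elasticsearch accepts
# in the body of a create-index request.
index_settings = {
    "settings": {
        "index": {
            "number_of_shards": 5,      # primary shards
            "number_of_replicas": 1,    # replica copies per primary
            "refresh_interval": "1s",   # default refresh interval
        }
    }
}

# Assumption: both nodes hold data and shards are spread evenly.
data_nodes = 2

idx = index_settings["settings"]["index"]
# Every primary has one replica, so each document is indexed twice.
total_shard_copies = idx["number_of_shards"] * (1 + idx["number_of_replicas"])
copies_per_node = total_shard_copies / data_nodes

print(json.dumps(index_settings, indent=2))
print(f"total shard copies: {total_shard_copies}")
print(f"shard copies per node: {copies_per_node:.0f}")
```

With one replica per shard, every document is written to two shard copies, and with two nodes each node ends up holding a copy of every shard.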

Can someone please suggest what causes this delay?

Rohit
  • You will need to update your question with more information about your Elasticsearch cluster and how you are transferring data. What version are you using? What are the specs of the cluster? Are you using SSD disks or SSD-based storage? – leandrojmp Nov 30 '21 at 13:09
  • The specs of your cluster and the configuration of your index can impact indexing performance. You didn't share how many nodes you have, what the configuration of each node is in terms of CPU and memory, how many shards and replicas your index has, what the refresh interval of the index is, etc., so there is not much we can do to help. I would suggest that you read this part of the [documentation](https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html) about tuning a cluster for indexing speed. – leandrojmp Nov 30 '21 at 13:49

0 Answers