I'm working on something related to Amazon Elasticsearch Service. For that, I need to get data from Amazon Redshift. The data to be transferred is huge, i.e. about 100 GB. Is there any way to get it directly from Redshift, or is it a two-step process like Redshift -> S3 -> Elasticsearch?
3 Answers
I see, at least in theory, two possible approaches for transferring data from Redshift to Elasticsearch:
- Logstash, using the JDBC input plugin (a config sketch follows below)
- elasticsearch-jdbc
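
For the Logstash route, a minimal pipeline sketch might look like the one below. This is only a sketch: the driver jar path, connection string, credentials, query, and index name are placeholders, and the driver class shown is the one from Amazon's JDBC 4.2 driver. Paging keeps Logstash from pulling the whole 100 GB result set in one query:

    input {
      jdbc {
        jdbc_driver_library => "/path/to/RedshiftJDBC42.jar"   # placeholder path
        jdbc_driver_class => "com.amazon.redshift.jdbc42.Driver"
        jdbc_connection_string => "jdbc:redshift://my-cluster.example.us-east-1.redshift.amazonaws.com:5439/mydb"
        jdbc_user => "my_user"
        jdbc_password => "my_password"
        statement => "SELECT * FROM my_table"
        # Page through the result set instead of loading it all at once.
        jdbc_paging_enabled => true
        jdbc_page_size => 50000
      }
    }
    output {
      elasticsearch {
        hosts => ["https://my-es-domain.us-east-1.es.amazonaws.com:443"]
        index => "my_index"
      }
    }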

Javier Alba
Are there any practical limitations for the above approaches? – AV94 Nov 17 '15 at 18:05
- Don't gzip the unloaded data.
- Use the bulk API on Elasticsearch.
- Use a large number of records per bulk request (>5000): fewer large bulk requests are better than many smaller ones (see the bulk-load sketch below).
- When working with AWS Elasticsearch there is a risk of hitting the limits of the bulk queue size.
- Process a single file in the Lambda and then recursively call the Lambda function with an event (see the re-invocation sketch below).
- Before recursing, wait for a couple of seconds (setTimeout). When waiting, make sure that you aren't idle for 30 seconds, because your Lambda will stop.
- Don't use S3 object creation to trigger your Lambda: you'll end up with multiple Lambda functions being called at the same time.
- Don't bother trying to put Kinesis in the middle: unloading your data into Kinesis is almost certain to hit load limits in Kinesis.
- Monitor your Elasticsearch bulk queue size with something like this:
curl https://%ES-SERVER:PORT%/_nodes/stats/thread_pool | jq '.nodes | to_entries[].value.thread_pool.bulk'
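
To make the bulk-load tips concrete, here is a minimal Python sketch (not from the original answer) that streams one unloaded file from S3 and indexes it in large bulk requests via the elasticsearch-py helpers. The bucket, key, index, and endpoint are placeholders, and it assumes the file was unloaded as one JSON document per line:

    import json

    import boto3
    from elasticsearch import Elasticsearch, helpers

    s3 = boto3.client("s3")
    # Placeholder endpoint for the AWS Elasticsearch domain.
    es = Elasticsearch("https://my-es-domain.us-east-1.es.amazonaws.com:443")

    def actions(bucket, key, index):
        # The file was unloaded without GZIP, assumed here to hold one
        # JSON document per line.
        body = s3.get_object(Bucket=bucket, Key=key)["Body"]
        for line in body.iter_lines():
            yield {"_index": index, "_source": json.loads(line)}

    # Fewer, larger bulk requests: send 5000 documents per request.
    helpers.bulk(es, actions("my-bucket", "unload/part_0000", "my-index"),
                 chunk_size=5000)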
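The process-one-file-then-recurse pattern could look like the Python sketch below (the original post uses Node.js, hence setTimeout; time.sleep is the rough Python equivalent). The event shape and the process_one_file helper are hypothetical:

    import json
    import time

    import boto3

    lambda_client = boto3.client("lambda")

    def handler(event, context):
        pending = event["pending_keys"]
        process_one_file(pending[0])  # hypothetical helper: bulk-load one file

        remaining = pending[1:]
        if remaining:
            time.sleep(2)  # brief pause before recursing, per the tip above
            lambda_client.invoke(
                FunctionName=context.function_name,
                InvocationType="Event",  # async, so this invocation can exit
                Payload=json.dumps({"pending_keys": remaining}),
            )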

olekb
This answer is copied from this blog post: http://www.rojotek.com/blog/2016/11/22/9-things-i-learnt-while-moving-data-from-redshift-into-aws-elastic-search-with-aws-lambda/ – Reign of Error Feb 08 '21 at 19:05
It looks like there is no direct data transfer pipeline for pushing data into Elasticsearch from Redshift. One alternative approach is to first dump the data into S3 with Redshift's UNLOAD command and then push it into Elasticsearch; a sketch of the first step follows.
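
For the first step, a minimal Python sketch of the UNLOAD might look like this (the cluster address, credentials, bucket, and IAM role are placeholders); each resulting part file in S3 can then be bulk-indexed into Elasticsearch:

    import psycopg2

    conn = psycopg2.connect(
        host="my-cluster.example.us-east-1.redshift.amazonaws.com",
        port=5439, dbname="mydb", user="my_user", password="my_password",
    )
    with conn, conn.cursor() as cur:
        # UNLOAD writes the query result to S3 as parallel part files.
        cur.execute("""
            UNLOAD ('SELECT * FROM my_table')
            TO 's3://my-bucket/unload/part_'
            IAM_ROLE 'arn:aws:iam::123456789012:role/my-unload-role'
        """)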

AV94