
I am hosting an Elasticsearch cluster in EKS and I'd like to stream all CloudWatch log groups to this Elasticsearch cluster via Kinesis Firehose. But AWS Kinesis Firehose doesn't support streaming data to any Elasticsearch cluster other than AWS-hosted ES.

What is the best way to stream data to a self-hosted ES cluster?

Joey Yi Zhao

1 Answer


I think the best way is by means of a Lambda function for Firehose. For this to work, you have to choose a supported destination, e.g. S3. The function is normally used to transform the records, but you can program whatever logic you want, including uploading the records to a custom ES.

If you use Python, the function can use an elasticsearch layer to connect to your custom cluster and inject records into it. elasticsearch is the Python interface to ES, and it will work with any ES cluster.
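
As a rough sketch of how that could look, here is a minimal Python transformation function; the cluster URL, credentials, and index name are hypothetical, and the gzip step assumes the records come from a CloudWatch Logs subscription (which compresses its payloads):

```python
import base64
import gzip
import json

from elasticsearch import Elasticsearch, helpers  # shipped as a Lambda layer

# Hypothetical endpoint and credentials -- replace with your cluster's.
es = Elasticsearch("https://my-es.example.com:9200", http_auth=("user", "pass"))

def handler(event, context):
    actions, results = [], []
    for record in event["records"]:
        # CloudWatch Logs subscription payloads are gzipped before base64 encoding.
        doc = json.loads(gzip.decompress(base64.b64decode(record["data"])))
        # One payload carries several logEvents; index each as its own document.
        for log_event in doc.get("logEvents", []):
            actions.append({"_index": "cloudwatch-logs", "_source": log_event})
        # "Dropped" tells Firehose not to also deliver the record to the S3 destination.
        results.append({"recordId": record["recordId"], "result": "Dropped"})
    helpers.bulk(es, actions)
    return {"records": results}
```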

An alternative is to use an HTTP endpoint as your destination. In this scenario, you could run a small instance or container on EC2 that receives the records from Firehose and then pushes them to ES. Just as before, the elasticsearch library could be used with Python.
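
To illustrate the receiving side, here is a minimal sketch using Flask and the elasticsearch client; the in-cluster URL and index name are placeholders, and the request/response shape follows the Firehose HTTP delivery spec (the reply must echo Firehose's requestId):

```python
import base64
import gzip
import json
import time

from elasticsearch import Elasticsearch, helpers
from flask import Flask, jsonify, request

app = Flask(__name__)
es = Elasticsearch("http://elasticsearch:9200")  # hypothetical in-cluster URL

@app.route("/", methods=["POST"])
def receive():
    body = request.get_json(force=True)
    actions = []
    for record in body["records"]:
        # Again assuming gzipped CloudWatch Logs payloads inside the base64 data.
        doc = json.loads(gzip.decompress(base64.b64decode(record["data"])))
        for log_event in doc.get("logEvents", []):
            actions.append({"_index": "cloudwatch-logs", "_source": log_event})
    helpers.bulk(es, actions)
    # Firehose expects an HTTP 200 with a JSON body echoing its requestId.
    return jsonify({"requestId": body["requestId"],
                    "timestamp": int(time.time() * 1000)})
```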

Marcin
  • The problem I foresee with Lambda is increasing costs as data quantity grows, plus maintenance for the Lambda. The problem with using an HTTP endpoint in Kinesis Firehose to point directly at ES is that Kinesis uses a PUT rather than a POST, and ES needs a POST if you want to upload docs without an _id parameter. Then there is auth to think about. How about using Nginx to convert the PUT into a POST, also acting as a reverse proxy for auth (see the sketch after these comments)? – Moulick Oct 21 '21 at 13:38
  • @Moulick There are no other options in my view. Since you have to program the injection of records yourself, you do exactly what you would do without using AWS. It's up to you how you process messages from Firehose once you obtain them from the HTTP endpoint. – Marcin Oct 21 '21 at 21:20
  • @Moulick How is it going? Is it still unclear what the options are for streaming from Firehose to a self-hosted ES cluster? – Marcin Oct 23 '21 at 23:13
  • The above idea of using Nginx to convert the HTTP verbs works, and auth was solved as well, i.e. giving the access key to Kinesis, which Nginx converts into ElasticSearch basic auth. Currently implementing an ingest pipeline in ES to split the incoming record into multiple documents. The thing with the data coming from Kinesis Firehose is that the JSON document it sends contains several records in one, so I need a way to split that into separate documents. I also need to base64-decode the data, which a Painless script in the ingest pipeline will probably handle. – Moulick Oct 24 '21 at 07:03
  • Well darn, ElasticSearch really cannot split one document into multiple documents: https://github.com/elastic/elasticsearch/issues/56769 – Moulick Oct 24 '21 at 07:12
  • @Moulick You need to process the records; Lambda is best for that. I see you have a much more specific issue than this question, which is generic. I would suggest asking a new question with the details of your setup and difficulties. – Marcin Oct 24 '21 at 10:33
  • @Moulick How did it go? Is it still unclear what the options are for streaming data from Firehose to a self-hosted ES cluster? – Marcin Oct 27 '21 at 08:23
  • The options outlined in this answer are basically the way to go. As stated, the transformation Lambda is really not intended for shipping events to a destination, but it's technically possible, of course. I personally prefer using the HTTP endpoint destination. The receiving side could also be API Gateway with a Lambda function, which then handles the communication between Kinesis and Elasticsearch [according to the HTTP specification](https://docs.aws.amazon.com/firehose/latest/dev/httpdeliveryrequestresponse.html). – ba0708 Jul 14 '22 at 08:09
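
For reference, a minimal sketch of the Nginx trick Moulick describes in the comments above: force the proxied request to be a POST and swap Firehose's access key for Elasticsearch basic auth. The upstream URL, the access key, and the credentials are placeholders.

```nginx
server {
    listen 443 ssl;
    # ... ssl_certificate / ssl_certificate_key ...

    location / {
        # Reject requests that don't carry the access key configured in Firehose.
        if ($http_x_amz_firehose_access_key != "my-firehose-access-key") {
            return 401;
        }
        proxy_method POST;  # force POST toward ES regardless of the incoming verb
        proxy_set_header Authorization "Basic dXNlcjpwYXNzd29yZA==";  # "user:password"
        proxy_pass http://elasticsearch:9200/cloudwatch-logs/_doc;
    }
}
```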