I am following this guide to load real-time web traffic data into S3 and configure Lambda to load it into an ES domain index. Currently, for each record I create a new JSON file in the S3 bucket, named {GUID}.json, that contains only one row. For example:

{"email":"example@test.com","firstname":"Hello","lastname":"World"}

So when this goes live, it will upload millions of JSON files to the S3 bucket, which then get pushed into ES via the Lambda function. Is this the correct approach for loading streaming data? Or should I develop a scheduled process that aggregates multiple records every hour (for example, 10k records per JSON file) and then uploads them to the S3 bucket? I feel that would not technically be "real-time streaming".

Any suggestions?

Bo Hu

1 Answer

Have you tried using AWS Kinesis Firehose to load the streaming data into Elasticsearch?

Reference: https://aws.amazon.com/kinesis/firehose/firehose-to-elasticsearch-service/

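Firehose buffers incoming records and delivers them to the ES domain in batches, so it would take away most of the effort on your end: you would not need to write one S3 object per record or maintain the Lambda loader yourself.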
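As a rough illustration of the producer side, here is a minimal sketch in Python. It assumes a Firehose delivery stream already exists with your ES domain configured as its destination; the function names `chunked` and `send_records` are made up for this example. The only AWS call it relies on is the Firehose client's `put_record_batch`, which accepts at most 500 records per request. Retrying failed records is left out, so treat this as a starting point rather than a finished implementation.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List

# PutRecordBatch accepts at most 500 records per call.
MAX_BATCH = 500


def chunked(items: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield successive lists of at most `size` items."""
    batch: List[Dict[str, Any]] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def send_records(client: Any, stream_name: str,
                 records: Iterable[Dict[str, Any]],
                 batch_size: int = MAX_BATCH) -> int:
    """Send records to a Firehose delivery stream in batches.

    `client` is expected to behave like boto3.client("firehose").
    Returns the number of records Firehose reported as failed;
    this sketch does not retry them.
    """
    failed = 0
    for batch in chunked(records, batch_size):
        response = client.put_record_batch(
            DeliveryStreamName=stream_name,
            # Each record is one single-line JSON document.
            Records=[{"Data": json.dumps(r).encode("utf-8")} for r in batch],
        )
        failed += response.get("FailedPutCount", 0)
    return failed
```

With boto3 installed and credentials configured, you would call it as `send_records(boto3.client("firehose"), "web-traffic-to-es", records)`, where `web-traffic-to-es` is a hypothetical stream name. Your application sends each record (or small groups of records) as it arrives, and Firehose handles the buffering and bulk indexing, so you keep near-real-time delivery without millions of tiny S3 objects.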

jwpfox
  • Greetings. Please check out [Take a tour](http://stackoverflow.com/tour) and [Your answer is in another castle: When is an answer not an answer](http://meta.stackexchange.com/questions/225370) to understand why link-only answers are not answers. Your answer could be improved with minor edits. Thx. – Drew Nov 14 '16 at 16:16