I am following this guide to load real-time web traffic data into S3 and configure a Lambda function to load it into an Elasticsearch domain index. Currently, for each record I create a new JSON file in the S3 bucket, named {GUID}.json, that contains only one row. For example:
{"email":"example@test.com","firstname":"Hello","lastname":"World"}
So when this goes live, it will upload millions of JSON files to the S3 bucket, which then get pushed into ES via the Lambda function. Is this the correct approach for loading streaming data, or should I develop a scheduled process that aggregates multiple records every hour (for example, 10k records per JSON file) and then uploads that to the S3 bucket, as sketched below? My concern is that batching like that is not technically "real-time streaming".
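To make the alternative concrete, here is a rough sketch of what I mean by the batching approach. The names and the 10k threshold are illustrative, and I am assuming the aggregated file would be newline-delimited JSON so each line stays one record:

```python
import json
import uuid

import boto3

s3 = boto3.client("s3")
BUCKET = "my-traffic-bucket"   # placeholder bucket name
BATCH_SIZE = 10_000            # flush after this many records

_buffer: list[dict] = []


def add_record(record: dict) -> None:
    """Buffer records in memory and flush once the batch threshold is hit."""
    _buffer.append(record)
    if len(_buffer) >= BATCH_SIZE:
        flush()


def flush() -> None:
    """Upload the buffered records as one newline-delimited JSON object."""
    if not _buffer:
        return
    body = "\n".join(json.dumps(r) for r in _buffer)
    s3.put_object(
        Bucket=BUCKET,
        Key=f"batches/{uuid.uuid4()}.json",
        Body=body.encode("utf-8"),
        ContentType="application/json",
    )
    _buffer.clear()
```

A scheduled (e.g. hourly) job would also call flush() so that partial batches don't sit in memory indefinitely, but that means records can be up to an hour late reaching ES.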
Any suggestions?