I need to process, at peak, hundreds of records per second. The records are simple JSON bodies that should be collected, then processed/transformed and loaded into a database.
A few questions ...
1) Is Kinesis right for this? Or is SQS better suited?
2) When using Kinesis, do I want to use the Python examples shown here: https://aws.amazon.com/blogs/big-data/snakes-in-the-stream-feeding-and-eating-amazon-kinesis-streams-with-python/ or should I be implementing my producer and consumer with the KCL (Kinesis Client Library)? What's the difference?
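For context on question 2: the blog's examples put records one at a time, and at hundreds of records per second I assume I'd want to batch them with `put_records` instead, which as I understand it accepts at most 500 records per call. Here's a rough sketch of the batching I have in mind (the record shape, the `id` partition key, and the stream name are all made up):

```python
import json

# Kinesis PutRecords accepts at most 500 records per call.
MAX_BATCH = 500

def to_entry(record: dict) -> dict:
    """Shape one JSON record into a PutRecords entry.
    The partition key choice is a guess -- here the record's 'id' field."""
    return {
        "Data": json.dumps(record).encode("utf-8"),
        "PartitionKey": str(record["id"]),
    }

def batches(records, size=MAX_BATCH):
    """Chunk records into PutRecords-sized batches."""
    for i in range(0, len(records), size):
        yield [to_entry(r) for r in records[i:i + size]]

# With boto3 the send loop would then be roughly:
#   client = boto3.client("kinesis")
#   for batch in batches(all_records):
#       client.put_records(StreamName="my-stream", Records=batch)
```

Is that the right direction for a plain-Python producer, or does the KCL side handle this batching for me?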
3) Does Kinesis offer anything to help manage the consumers, or do I just run them on EC2 instances and manage them myself?
4) What is the correct pattern for accessing data? I can't afford to miss any records, so I assume I should fetch from "TRIM_HORIZON" rather than "LATEST". If so, how do I manage duplicates? In other words, how do my consumers track their position in the stream, survive a consumer going down, and still be sure they've fetched every record?
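To make question 4 concrete, my current thinking is: with at-least-once delivery I'd make processing idempotent by deduplicating on each record's sequence number, and checkpoint the last processed position so a restarted consumer resumes from there instead of re-reading from TRIM_HORIZON. A stand-in sketch with an in-memory store (in real life the checkpoint and seen-set would live in DynamoDB or the target database itself; all names here are made up):

```python
class Consumer:
    """Idempotent consumer sketch: dedupe on sequence number and
    checkpoint the last processed position."""

    def __init__(self):
        self.checkpoint = None  # last sequence number processed
        self.seen = set()       # processed sequence numbers (dedupe)
        self.db = []            # stand-in for the target database

    def process(self, records):
        """records: iterable of (sequence_number, payload) pairs,
        in the order a GetRecords response would deliver them."""
        for seq, payload in records:
            if seq in self.seen:
                continue             # duplicate redelivery: skip it
            self.db.append(payload)  # "load into the database"
            self.seen.add(seq)
            self.checkpoint = seq    # would be persisted in real life

    def resume_position(self):
        """Where a restarted consumer should begin: just after the
        checkpoint if one exists, else TRIM_HORIZON (the oldest
        record still retained in the stream)."""
        if self.checkpoint is not None:
            return ("AFTER_SEQUENCE_NUMBER", self.checkpoint)
        return ("TRIM_HORIZON", None)
```

Is this roughly the pattern, and is this checkpoint/lease bookkeeping what the KCL would otherwise do for me?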
Thanks!