OK, I'll start with an elaborated use-case and will explain my question:
- I use a 3rd-party web analytics platform which uses AWS Kinesis streams to pass data from the client to the final destination, which is itself a Kinesis stream;
- The web analytics platform uses 2 streams:
  - A data collector stream (single-shard stream);
  - A second stream to enrich the raw data from the collector stream (single-shard stream). Most importantly, this stream consumes the raw data from the first stream using the `TRIM_HORIZON` iterator type;
- I consume the data from the stream using the AWS Java SDK, specifically the `GetShardIteratorRequest` class;
- I'm currently developing the extraction class, so this is done synchronously, meaning I consume data only when I compile and run my class;
- The class surprisingly works, although there are some things I fail to understand, specifically how the data is consumed from the stream and what each of the iterator types means.
My problem is that the data I retrieve is inconsistent and follows no chronological order:
- When I use `AT_SEQUENCE_NUMBER` and provide the first sequence number from the shard with `.getSequenceNumberRange().getStartingSequenceNumber();` ... as the ``, I'm not getting all records. The same happens with `AFTER_SEQUENCE_NUMBER`;
- When I use `LATEST`, I'm getting zero results;
- When I use `TRIM_HORIZON`, which should make sense to use, it doesn't seem to work properly. It used to return data, but after I added new "events" (records to the final stream) I received zero records. Mystery.
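To make explicit what I expect from each iterator type, here is a toy in-memory model of a single shard as I understand it from the documentation. It is plain Java and does not call the AWS SDK: `ToyShard`, `put`, `startPosition` and `read` are hypothetical names I made up for illustration, sequence numbers are simplified to list positions (real ones are opaque strings), and nothing ever expires in this toy, so `TRIM_HORIZON` is simply position 0. It is not my actual extraction class.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of one shard. Hypothetical, NOT the AWS SDK API. */
final class ToyShard {
    enum IteratorType { TRIM_HORIZON, LATEST, AT_SEQUENCE_NUMBER, AFTER_SEQUENCE_NUMBER }

    // Records in arrival order; the list index stands in for the sequence number.
    private final List<String> records = new ArrayList<>();

    /** Appends a record and returns its toy sequence number. */
    long put(String data) {
        records.add(data);
        return records.size() - 1;
    }

    /** Position an iterator of the given type would start reading from. */
    long startPosition(IteratorType type, long sequenceNumber) {
        switch (type) {
            case TRIM_HORIZON:          return 0;                  // oldest retained record
            case LATEST:                return records.size();     // just past the newest record
            case AT_SEQUENCE_NUMBER:    return sequenceNumber;     // the record itself
            case AFTER_SEQUENCE_NUMBER: return sequenceNumber + 1; // the record right after it
            default: throw new IllegalArgumentException("unknown type: " + type);
        }
    }

    /** Reads up to {@code limit} records starting at {@code position}. */
    List<String> read(long position, int limit) {
        List<String> out = new ArrayList<>();
        for (long seq = position; seq < records.size() && out.size() < limit; seq++) {
            out.add(records.get((int) seq));
        }
        return out;
    }

    public static void main(String[] args) {
        ToyShard shard = new ToyShard();
        shard.put("a");
        shard.put("b");
        shard.put("c");

        long horizon = shard.startPosition(IteratorType.TRIM_HORIZON, -1);
        long latest = shard.startPosition(IteratorType.LATEST, -1);

        System.out.println(shard.read(horizon, 10)); // [a, b, c]
        System.out.println(shard.read(latest, 10));  // [] (nothing arrived yet)

        shard.put("d"); // a record written after the LATEST position was taken
        System.out.println(shard.read(latest, 10));  // [d]

        System.out.println(shard.read(shard.startPosition(IteratorType.AT_SEQUENCE_NUMBER, 1), 10));    // [b, c, d]
        System.out.println(shard.read(shard.startPosition(IteratorType.AFTER_SEQUENCE_NUMBER, 1), 10)); // [c, d]
    }
}
```

Under this model, `LATEST` returning nothing is expected unless records arrive after the iterator was obtained, while `TRIM_HORIZON` and `AT_SEQUENCE_NUMBER` with the starting sequence number should both yield every retained record. That is the behavior I expected and am not seeing.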
My questions are:
- How can I safely consume data from the stream, without having to worry about missed records?
- Is there an alternative to the `ShardIteratorRequest`?
- If there is, how can I just "browse" the stream and see what's inside it for debugging purposes?
- What am I missing with the `TRIM_HORIZON` method?
Thanks in advance, I'd really love to learn a bit more about data consumption from a Kinesis stream.