KCL does not provide this sort of built-in redrive mechanism - once processRecords exits (whether it returned normally or threw an exception), KCL considers those records processed and moves on, even if your processing failed internally.
If you want to reprocess some records at a later point, you need to capture those records and store them somewhere else for a later reprocessing attempt (with the obvious caveat that they won't be processed in order relative to the rest of the stream).
The simplest solution is to have your record processor logic identify the failed records (before returning to KCL) and send them to an SQS queue. That way the records aren't lost, and they're available for processing at your leisure (or by another process consuming the SQS queue, possibly with a dead-letter queue (DLQ) for handling repeated failures / give-up scenarios).
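As a rough illustration of that capture-and-forward idea, here is a minimal sketch in plain Java. It deliberately avoids the real KCL and SQS types: `RedriveQueue`, `RetryableException` and `processBatch` are hypothetical names invented for this example, records are plain strings, and the queue is just an interface that a real implementation would back with an SQS send call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class RedriveSketch {

    /** Hypothetical stand-in for an SQS-backed queue; not a real AWS type. */
    interface RedriveQueue {
        void send(String recordData);
    }

    /** Hypothetical marker for failures that are worth retrying later. */
    static class RetryableException extends RuntimeException {
        RetryableException(String message) {
            super(message);
        }
    }

    /**
     * Roughly what the body of processRecords could do: handle each record,
     * and divert retryable failures to the queue before returning to KCL.
     * Returns the number of records that were diverted.
     */
    static int processBatch(List<String> records,
                            Consumer<String> handler,
                            RedriveQueue queue) {
        int diverted = 0;
        for (String record : records) {
            try {
                handler.accept(record);
            } catch (RetryableException e) {
                // Capture the failed record so it is not lost once KCL moves on.
                queue.send(record);
                diverted++;
            }
        }
        return diverted;
    }

    public static void main(String[] args) {
        // In-memory list standing in for the queue, for demonstration only.
        List<String> queued = new ArrayList<>();
        Consumer<String> handler = r -> {
            if (r.startsWith("bad")) {
                throw new RetryableException("cannot handle " + r);
            }
        };
        int diverted = processBatch(List.of("a", "bad-1", "b"), handler, queued::add);
        System.out.println(diverted + " diverted: " + queued);
    }
}
```

Catching only `RetryableException` keeps the redrive logic narrow; widening the catch to `Exception` would divert every failure instead, which is the trade-off to decide on for your own processor.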
To answer your specific questions:
- Nope. Checkpointing just says "I've got this far, don't look at things before the checkpoint".
- Think of checkpointing like a global state. Once it's set, it encompasses everything that came before it. You also don't need to checkpoint on every call to processRecords - you might do it every X seconds, or every Y records, etc.
- Not at KCL level - you could use a special exception type internally, and catch it at the outer level of processRecords just before you return to KCL. Or you could just catch all exceptions - it's up to you and how specific you want to be with your redrive logic.
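To make the "every X seconds or every Y records" idea concrete, here is a small sketch of a throttle that decides when a checkpoint is due. `CheckpointPolicy` and `shouldCheckpoint` are hypothetical names made up for this example, not part of KCL, and the clock is passed in explicitly so the behaviour is easy to follow.

```java
import java.time.Duration;

class CheckpointPolicy {
    private final int maxRecords;
    private final long maxIntervalMillis;
    private int recordsSinceCheckpoint = 0;
    private long lastCheckpointMillis;

    CheckpointPolicy(int maxRecords, Duration maxInterval, long nowMillis) {
        this.maxRecords = maxRecords;
        this.maxIntervalMillis = maxInterval.toMillis();
        this.lastCheckpointMillis = nowMillis;
    }

    /**
     * Call once per processRecords batch. Returns true when either enough
     * records or enough time has accumulated since the last checkpoint,
     * and resets both counters in that case.
     */
    boolean shouldCheckpoint(int batchSize, long nowMillis) {
        recordsSinceCheckpoint += batchSize;
        boolean due = recordsSinceCheckpoint >= maxRecords
                || nowMillis - lastCheckpointMillis >= maxIntervalMillis;
        if (due) {
            recordsSinceCheckpoint = 0;
            lastCheckpointMillis = nowMillis;
        }
        return due;
    }

    public static void main(String[] args) {
        // Checkpoint every 100 records or every 60 seconds, whichever comes first.
        CheckpointPolicy policy = new CheckpointPolicy(100, Duration.ofSeconds(60), 0L);
        System.out.println(policy.shouldCheckpoint(40, 1_000L));  // false: 40 records, 1 s elapsed
        System.out.println(policy.shouldCheckpoint(70, 2_000L));  // true: 110 records reached
        System.out.println(policy.shouldCheckpoint(10, 3_000L));  // false: counters were reset
        System.out.println(policy.shouldCheckpoint(5, 70_000L));  // true: 68 s since last checkpoint
    }
}
```

Inside processRecords you would call something like `shouldCheckpoint(records.size(), System.currentTimeMillis())` and, when it returns true, invoke the checkpointer that KCL hands you. Records you have diverted for redrive should already be safely stored elsewhere by then, since the checkpoint covers everything before it.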