We are using the Flink Kinesis connector to consume records from and produce records into Kinesis. Since it uses the KCL, we expected it to create entries in DynamoDB with the offsets (sequence numbers) for the Kinesis streams it consumes. However, we do not see any table with the application name in DynamoDB. Is this the expected behavior?
Flink connector version: 1.8
Flink version: 1.8.0
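
For context, a minimal sketch of the kind of consumer setup in question, assuming the standard `FlinkKinesisConsumer` API; the stream name, region, and job name are placeholders, not values from the original post:

```java
import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer;
import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class KinesisReadJob {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Connector configuration; region and initial position are placeholders.
        Properties consumerConfig = new Properties();
        consumerConfig.put(AWSConfigConstants.AWS_REGION, "us-east-1");
        consumerConfig.put(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "LATEST");

        // Consume from a Kinesis stream; the stream name is a placeholder.
        DataStream<String> records = env.addSource(
                new FlinkKinesisConsumer<>("my-input-stream", new SimpleStringSchema(), consumerConfig));

        records.print();
        env.execute("kinesis-read-job");
    }
}
```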

- Are you seeing data in the respective Kinesis outputs? – Arvid Heise Jan 28 '20 at 11:01
- @ArvidHeise, I'm able to see the input and output in the respective streams. I can also see the records being processed in the Flink UI. – justlikethat Jan 28 '20 at 11:33
- Sorry for the slow response, I misunderstood your question and was running in the wrong direction. Your question is actually a [duplicate](https://stackoverflow.com/q/54825364/10299342). – Arvid Heise Jan 29 '20 at 19:22
- @ArvidHeise, thanks for the response. But is this the expected behavior? Does Flink not store the sequence number? – justlikethat Jan 30 '20 at 04:36
- I'm going to add a response to the original question to clarify things. – Arvid Heise Jan 30 '20 at 08:21
- Does this answer your question? [Flink Kinesis Consumer not storing last successfully processed sequence nos](https://stackoverflow.com/questions/54825364/flink-kinesis-consumer-not-storing-last-successfully-processed-sequence-nos) – Arvid Heise Jan 30 '20 at 08:30
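
For reference, the linked answer boils down to the consumer keeping the last processed sequence numbers in Flink's own checkpoint/savepoint state rather than in a DynamoDB table, so they are only persisted when checkpointing is enabled. A minimal sketch, with a placeholder interval and state backend path:

```java
import org.apache.flink.runtime.state.filesystem.FsStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSetup {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot operator state (including the Kinesis sequence numbers) every 60 s.
        env.enableCheckpointing(60_000L);

        // Placeholder path; any durable filesystem (S3, HDFS, ...) works.
        env.setStateBackend(new FsStateBackend("s3://my-bucket/flink/checkpoints"));

        // ... build the job as usual and call env.execute(...)
    }
}
```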
- @ArvidHeise, thanks for the response. The answer makes a lot of sense. There is one situation where I still see a need for storing sequence IDs: DynamoDB comes with point-in-time recovery, so if I wanted to start processing (in this case, reprocessing) records from some previous point in time, I would be able to do it. With a savepoint, I can only resume from the last processed record. – justlikethat Jan 31 '20 at 06:49
- As I explained in the other thread, you can jump back to any savepoint in the past. That may not be as accurate as DynamoDB, but it works with any kind of input. Btw, even if we supported sequence numbers, you would not get the desired result by just rewinding the source: any stateful operator would contain the wrong state. Think of the counter example from the other thread: if I go back in time, the counter is still bound to the current checkpoint, so it would count everything twice. – Arvid Heise Jan 31 '20 at 10:59
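
To make the counter example concrete, here is a hypothetical keyed count (the class name `CountPerKey` and the state name are made up for illustration); its value is snapshotted in the same checkpoint as the Kinesis sequence numbers, so rewinding only the source positions would replay old records against a newer count and count them twice:

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// Hypothetical counter: the count lives in Flink keyed state and is checkpointed
// together with the source positions, not alongside any external sequence IDs.
public class CountPerKey extends RichFlatMapFunction<String, Tuple2<String, Long>> {

    private transient ValueState<Long> count;

    @Override
    public void open(Configuration parameters) {
        count = getRuntimeContext().getState(
                new ValueStateDescriptor<>("count", Long.class));
    }

    @Override
    public void flatMap(String key, Collector<Tuple2<String, Long>> out) throws Exception {
        Long current = count.value();
        long updated = (current == null ? 0L : current) + 1;
        count.update(updated);
        out.collect(Tuple2.of(key, updated));
    }
}
```

It would be attached to a keyed stream, e.g. `records.keyBy(r -> r).flatMap(new CountPerKey())`.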
- @ArvidHeise, you are absolutely right, it would make sense to use savepoints. But consider a case where reading/writing checkpoints fails because of an in-flight change to the file system permissions and the streaming job is held up as a result. Checkpoints and savepoints would no longer be generated, and the one thing that could save us in a production environment is restarting the job from the sequence ID at the time we suspect the error started occurring. Also, for taking regular savepoints, do we have an option other than something external like cron? – justlikethat Jan 31 '20 at 11:20
- I'm not sure that I can follow that example (it would only work on stateless jobs, so no joins, windows, or aggregations), but you are free to submit a feature request on [jira](https://issues.apache.org/jira/projects/FLINK/issues/) or the [mailing list](https://flink.apache.org/community.html#mailing-lists), where it's probably easier to outline and discuss than in the comments on SO. If it's an emergency-like situation, I'd assume that just [specifying the start position](https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/kinesis.html#configuring-starting-position) should be enough. – Arvid Heise Jan 31 '20 at 12:55
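
For the emergency-like case above, specifying the starting position could look roughly like the following; the region, timestamp value, and helper method name are placeholders, based on the connector's documented `ConsumerConfigConstants` keys:

```java
import java.util.Properties;

import org.apache.flink.streaming.connectors.kinesis.config.AWSConfigConstants;
import org.apache.flink.streaming.connectors.kinesis.config.ConsumerConfigConstants;

public class StartPositionConfig {

    // Build consumer properties that start reading at a point in time instead of
    // LATEST/TRIM_HORIZON, e.g. the time from which reprocessing is suspected to be needed.
    public static Properties startAtTimestamp() {
        Properties consumerConfig = new Properties();
        consumerConfig.put(AWSConfigConstants.AWS_REGION, "us-east-1"); // placeholder region

        consumerConfig.put(ConsumerConfigConstants.STREAM_INITIAL_POSITION, "AT_TIMESTAMP");
        consumerConfig.put(ConsumerConfigConstants.STREAM_INITIAL_TIMESTAMP,
                "2020-01-30T10:00:00.000-00:00"); // placeholder timestamp

        return consumerConfig;
    }
}
```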
- @ArvidHeise, sure will do. Thanks a lot! – justlikethat Feb 03 '20 at 13:42