I have a Flink batch job that reads from Kafka and writes to S3. The job is meant to read a bounded range of records, from a start timestamp to an end timestamp, so my Kafka source is built as follows:
KafkaSource<T> source = KafkaSource.<T>builder()
        .setBootstrapServers(resolvedBootstrapBroker)
        .setTopics(List.of("TOPIC_0"))
        .setGroupId(consumerGroupId)
        .setStartingOffsets(OffsetsInitializer.timestamp(startTimeStamp))
        .setValueOnlyDeserializer(deserializationSchema)
        .setBounded(OffsetsInitializer.timestamp(endTimeStamp))
        .setProperties(additionalProperties)
        .build();
The start and end timestamps are calculated as follows (from 10 days ago to 10 hours ago):
long startTimeStamp = Instant.now().minus(10, ChronoUnit.DAYS).toEpochMilli();
long endTimeStamp = Instant.now().minus(10, ChronoUnit.HOURS).toEpochMilli();
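For context, here is a simplified, self-contained sketch of how the whole job is wired up. It uses String records and hard-coded placeholder values for the broker, group id, and S3 path, and a plain row-format FileSink; my real job uses its own deserialization schema, sink configuration, and the variables shown above.

import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.List;

import org.apache.flink.api.common.RuntimeExecutionMode;
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringEncoder;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.file.sink.FileSink;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.core.fs.Path;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaToS3Job {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // The source is bounded, so the job runs in batch mode.
        env.setRuntimeMode(RuntimeExecutionMode.BATCH);

        long startTimeStamp = Instant.now().minus(10, ChronoUnit.DAYS).toEpochMilli();
        long endTimeStamp = Instant.now().minus(10, ChronoUnit.HOURS).toEpochMilli();

        KafkaSource<String> source = KafkaSource.<String>builder()
                .setBootstrapServers("localhost:9092")   // placeholder for resolvedBootstrapBroker
                .setTopics(List.of("TOPIC_0"))
                .setGroupId("my-consumer-group")         // placeholder for consumerGroupId
                .setStartingOffsets(OffsetsInitializer.timestamp(startTimeStamp))
                .setBounded(OffsetsInitializer.timestamp(endTimeStamp))
                .setValueOnlyDeserializer(new SimpleStringSchema())
                .build();

        DataStream<String> records =
                env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source");

        // Row-encoded FileSink writing to S3; the bucket path is a placeholder.
        FileSink<String> sink = FileSink
                .forRowFormat(new Path("s3://my-bucket/output/"),
                              new SimpleStringEncoder<String>("UTF-8"))
                .build();

        records.sinkTo(sink);
        env.execute("kafka-to-s3-batch");
    }
}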
However, no records are written to S3. If I switch the bounded offsets initializer to:
.setBounded(OffsetsInitializer.latest())
the job works and writes to S3. Any idea what I might be doing wrong?
EDIT:
I have since learned that the job does write an in-progress (partial) part file, but it never finalizes it into a finished file. Any idea why that might be happening?
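For what it's worth, my understanding from the FileSink documentation is that in streaming mode in-progress part files are only finalized on successful checkpoints, while in batch mode they are finalized when the job finishes. In case it is relevant, this is the kind of checkpointing setup I would try; the 60-second interval is an arbitrary value, not something from my current configuration:

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointingSketch {
    public static StreamExecutionEnvironment createEnv() {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // In streaming mode, FileSink promotes in-progress part files to finished
        // files only on successful checkpoints; 60s here is an arbitrary interval.
        env.enableCheckpointing(60_000, CheckpointingMode.EXACTLY_ONCE);
        return env;
    }
}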