  • Nebula version: v3.5.0
  • Deployment method: distributed
  • Installation method: Docker
  • Production environment: Y
  • Hardware information
    • Disk: SATA
    • CPU/Memory: 64C 128G

We submit a Spark Streaming program via spark-submit to consume data from a Kafka topic. Suppose the program stops suddenly after it has consumed up to offset 100, and while it is stopped another 100 messages are produced to the topic. When the Spark Streaming job is restarted, from which offset does Exchange resume consuming Kafka? According to our tracking, consumption resumes from offset 200, which means the data at offsets 100-200 is lost.

How should I configure it so that consumption resumes from offset 100?
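I'm not sure which option Exchange exposes for this, but the behavior you describe (jumping to the latest offset on restart) is what a Kafka consumer does when it has no committed offset to resume from. With the plain spark-streaming-kafka-0-10 API, the usual fix is a stable `group.id`, `enable.auto.commit=false`, and committing offsets manually only after each batch succeeds. A minimal sketch follows; the broker address, group id, and topic name are placeholders, and the NebulaGraph write step is elided:

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010._
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

object ResumeFromCommittedOffsets {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("resume-from-committed-offsets")
    val ssc  = new StreamingContext(conf, Seconds(10))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafka-broker:9092",   // placeholder broker address
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      // A stable group.id is what lets the consumer find its committed
      // offsets again after a restart.
      "group.id" -> "nebula-exchange-consumer",     // placeholder group id
      // Only applies when the group has no committed offset yet; it does
      // not override committed offsets.
      "auto.offset.reset" -> "earliest",
      // Disable auto-commit so offsets are committed only after the batch
      // has actually been processed.
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent,
      Subscribe[String, String](Seq("my-topic"), kafkaParams)) // placeholder topic

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      // ... write this batch to NebulaGraph here ...
      // Commit only after the write succeeds, so a crash before this point
      // replays the batch on restart instead of skipping it.
      stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```

This gives at-least-once semantics: if the job dies between the write and the commit, the batch is reprocessed rather than lost.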
