I have a Parquet file created with PySpark that contains a time column with values from 0 to x seconds. I want to stream each row to a Kafka topic when its time comes.
For example, when the stream starts I want to send all rows with time 0; each subsequent row is sent when the elapsed time since the start of the stream equals the value in that row's time cell, and so on until the last row is streamed.
I can think of a way to do this in PySpark, but it's inefficient in several ways; for example, it can end up sending multiple rows at once even though their times are not equal.
Is there another framework or library better suited to this task?
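To make the timing behavior concrete, here is a minimal plain-Python sketch of the replay logic I have in mind (not my PySpark version). The `replay` function itself is self-contained; the Kafka wiring in the comments is hypothetical (broker address, topic, and column names are placeholders, assuming a pandas/kafka-python stack):

```python
import time
from typing import Callable, Iterable, Tuple, Any

def replay(rows: Iterable[Tuple[float, Any]],
           send: Callable[[Any], None],
           sleep: Callable[[float], None] = time.sleep,
           clock: Callable[[], float] = time.monotonic) -> None:
    """Replay (time_offset, payload) rows in offset order, invoking
    `send` on each payload once the elapsed time since the start of
    the replay reaches that row's offset. Rows sharing an offset are
    sent back to back with no sleep in between."""
    start = clock()
    for offset, payload in sorted(rows, key=lambda r: r[0]):
        # Sleep only for the remaining gap; if we're already past the
        # offset (e.g. a slow send), emit immediately without sleeping.
        delay = offset - (clock() - start)
        if delay > 0:
            sleep(delay)
        send(payload)

# Hypothetical wiring to Parquet and Kafka (placeholder names):
# import pandas as pd
# from kafka import KafkaProducer
# df = pd.read_parquet("events.parquet")
# producer = KafkaProducer(bootstrap_servers="localhost:9092")
# replay(
#     ((row.time, row._asdict()) for row in df.itertuples(index=False)),
#     lambda payload: producer.send("my-topic", str(payload).encode()),
# )
```

Injecting `sleep` and `clock` keeps the scheduling logic testable without a broker or real waiting; in production the defaults (`time.sleep`, `time.monotonic`) are used.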