I'm looking to fill in the missing dates in a PySpark DataFrame and populate their missing values. Existing PySpark DataFrame:
ID | Date | Qty |
---|---|---|
100 | 2023-02-01 | 5 |
100 | 2023-02-03 | 3 |
100 | 2023-02-04 | 3 |
100 | 2023-02-05 | 3 |
100 | 2023-02-08 | 3 |
100 | 2023-02-09 | 11 |
100 | 2023-02-10 | 11 |
100 | 2023-02-11 | 10 |
100 | 2023-02-13 | 0 |
Expected PySpark DataFrame (each missing date is added as a new row, with Qty set to the lower of the two adjacent existing values):
ID | Date | Qty |
---|---|---|
100 | 2023-02-01 | 5 |
100 | 2023-02-02 | **3** (added row: lowest adjacent Qty) |
100 | 2023-02-03 | 3 |
100 | 2023-02-04 | 3 |
100 | 2023-02-05 | 3 |
100 | 2023-02-06 | **3** (added row: lowest adjacent Qty) |
100 | 2023-02-07 | **3** (added row: lowest adjacent Qty) |
100 | 2023-02-08 | 3 |
100 | 2023-02-09 | 11 |
100 | 2023-02-10 | 11 |
100 | 2023-02-11 | 10 |
100 | 2023-02-12 | **0** (added row: lowest adjacent Qty) |
100 | 2023-02-13 | 0 |
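To pin down the rule the expected output follows, here is a plain-Python sketch (no Spark involved). It assumes one row per (ID, Date), daily granularity, and that only gaps between an ID's first and last date are filled. `fill_lowest_adjacent` is a hypothetical helper name used only for illustration.

```python
from datetime import date, timedelta

# Sample rows from the question: (ID, Date, Qty)
rows = [
    (100, date(2023, 2, 1), 5),
    (100, date(2023, 2, 3), 3),
    (100, date(2023, 2, 4), 3),
    (100, date(2023, 2, 5), 3),
    (100, date(2023, 2, 8), 3),
    (100, date(2023, 2, 9), 11),
    (100, date(2023, 2, 10), 11),
    (100, date(2023, 2, 11), 10),
    (100, date(2023, 2, 13), 0),
]


def fill_lowest_adjacent(rows):
    """Hypothetical helper: add every missing day per ID, using
    min(previous known Qty, next known Qty) for the new rows."""
    by_id = {}
    for id_, d, q in rows:
        by_id.setdefault(id_, {})[d] = q

    out = []
    for id_, known in sorted(by_id.items()):
        dates = sorted(known)
        # Walk consecutive pairs of known dates and fill the gap between them.
        for prev_d, next_d in zip(dates, dates[1:]):
            out.append((id_, prev_d, known[prev_d]))
            fill = min(known[prev_d], known[next_d])
            d = prev_d + timedelta(days=1)
            while d < next_d:
                out.append((id_, d, fill))
                d += timedelta(days=1)
        # The last known date has no following pair, so append it explicitly.
        out.append((id_, dates[-1], known[dates[-1]]))
    return out


for id_, d, q in fill_lowest_adjacent(rows):
    print(id_, d.isoformat(), q)
```

For example, 2023-02-02 sits between 5 (2023-02-01) and 3 (2023-02-03), so it gets `min(5, 3) = 3`, whereas a forward fill would give 5.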
I did refer to an existing answer (PySpark generate missing dates and fill data with previous value), but it doesn't fulfill my requirement of filling in the lowest adjacent value.