I have a dataframe which I need to fix some parcel dates based on previous date values. Here's an example:
+-------------------+---------------------+-----------------------+----------+
|account |contract |contract_parcel | date |
+-------------------+---------------------+-----------------------+----------+
| 92397| 1| 1|2020-12-07|
| 92397| 1| 2| null|
| 92397| 2| 1|2020-12-07|
| 92397| 2| 2| null|
| 92397| 2| 3| null|
| 92397| 2| 4| null|
| 92397| 2| 5| null|
| 92397| 2| 6| null|
| 92397| 3| 1|2021-01-04|
| 92397| 3| 2|2021-02-01|
+-------------------+---------------------+-----------------------+----------+
For each account there are multiple contracts with multiple parcels. For those where the date column is null
I need to replicate the previous parcel value but adding a month to it and so on for all parcels.
I tried using Window
with lag
and last
functions but I cannot manage to update the date based on the previous value. I only managed to copy it.
I need an output like this one below:
+-------------------+---------------------+-----------------------+----------+
|account |contract |contract_parcel | date |
+-------------------+---------------------+-----------------------+----------+
| 92397| 1| 1|2020-12-07|
| 92397| 1| 2|2021-01-07|
| 92397| 2| 1|2020-12-07|
| 92397| 2| 2|2021-01-07|
| 92397| 2| 3|2021-02-07|
| 92397| 2| 4|2021-03-07|
| 92397| 2| 5|2021-04-07|
| 92397| 2| 6|2021-05-07|
| 92397| 3| 1|2021-01-04|
| 92397| 3| 2|2021-02-01|
+-------------------+---------------------+-----------------------+----------+
I also tried by iterating through the dataframe but the performance was very poor.