I want to loop over the row numbers within each partition of a DataFrame, check some conditions, and create extra columns depending on the result for the current row number.
I have a DataFrame of contacts that associates a master id with raw ids (a master can have multiple raws). I partitioned this DataFrame with a window function and computed a row_number() over each master's partition, ordered from the most recently created raw to the oldest:
from pyspark.sql import Window
from pyspark.sql import functions as F

# Window specification: one partition per master, ordered from the
# most recently created raw to the oldest
window_spec = Window \
    .partitionBy(raw_df.master_id) \
    .orderBy(raw_df.created.desc())

# Add a row number within each partition so that raws can be processed,
# grouped by master, from the most recent to the oldest.
# `fields` (defined elsewhere) is the list of columns kept from raw_df.
partionned_df = raw_df \
    .select(
        fields + [
            F.row_number().over(window_spec).alias('row_nb')
        ]
    ) \
    .orderBy(['master_id', 'row_nb'])
This is my DataFrame:
partionned_df.show()
+------+---+---------------+---------+------------+-------------+------+
|master|raw|        created|last_name|mobile_phone|norm_civility|row_nb|
+------+---+---------------+---------+------------+-------------+------+
|     1|  3|02-05 11:22:...|     null|   641203047|         null|     1|
|     1|  5|02-05 11:22:...|     null|        null|     monsieur|     2|
|     1|  6|02-05 11:22:...|     null|        null|         null|     3|
|     1|  1|02-05 10:29:...|    name1|        null|           mr|     4|
|     3| 11|02-05 11:26:...|     null|        null|     monsieur|     1|
|     3|  4|02-05 10:30:...|     null|   630254175|         null|     2|
|     5| 15|02-05 11:35:...|     null|        null|         null|     1|
|     5| 13|02-05 11:35:...|     null|        null|         null|     2|
|     5| 14|02-05 11:33:...|    name3|        null|     monsieur|     3|
|     5|  7|02-05 11:22:...|     null|        null|         null|     4|
+------+---+---------------+---------+------------+-------------+------+
What I want is a new DataFrame with one row per distinct master, with new columns computed by iterating over the row numbers of each partition and evaluating some conditions (for example, whether first_name is not null). If the conditions are not met on the first row_nb of the partition, the next one should be checked, and so on until every row number of the partition has been examined.
The final DataFrame I want looks something like this:
+------+---------+------------+-------------+
|master|last_name|mobile_phone|norm_civility|
+------+---------+------------+-------------+
|     1|    name1|   641203047|     monsieur|
|     3|     null|   630254175|     monsieur|
|     5|    name3|        null|     monsieur|
+------+---------+------------+-------------+
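To make the rule behind this expected result explicit, here is a minimal plain-Python sketch of the logic only (it is not Spark code). It assumes the condition for every column is simply "take the first non-null value when walking the rows by increasing row_nb". The function name `first_non_null_per_master`, the `COLUMNS` list, and the dict-based rows are hypothetical names introduced for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Columns to merge (hypothetical list, taken from the sample shown above)
COLUMNS = ["last_name", "mobile_phone", "norm_civility"]


def first_non_null_per_master(rows, columns):
    """For each master, walk its rows by increasing row_nb and keep,
    for every column, the first value that is not None."""
    ordered = sorted(rows, key=itemgetter("master", "row_nb"))
    result = []
    for master, group in groupby(ordered, key=itemgetter("master")):
        merged = {"master": master, **{c: None for c in columns}}
        for row in group:
            for c in columns:
                # Only fill a column that is still empty
                if merged[c] is None and row[c] is not None:
                    merged[c] = row[c]
        result.append(merged)
    return result


# Sample data: (master, row_nb, last_name, mobile_phone, norm_civility)
sample = [
    (1, 1, None, 641203047, None),
    (1, 2, None, None, "monsieur"),
    (1, 3, None, None, None),
    (1, 4, "name1", None, "mr"),
    (3, 1, None, None, "monsieur"),
    (3, 2, None, 630254175, None),
    (5, 1, None, None, None),
    (5, 2, None, None, None),
    (5, 3, "name3", None, "monsieur"),
    (5, 4, None, None, None),
]
rows = [dict(zip(["master", "row_nb"] + COLUMNS, r)) for r in sample]

merged_rows = first_non_null_per_master(rows, COLUMNS)
for r in merged_rows:
    print(r)
```

On the sample data this produces one row per master with the same values as the expected table above. In PySpark, the same rule might be expressible with `F.first(col, ignorenulls=True)` over the window, but that is an assumption I have not verified on my data.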
Thank you very much in advance for your help.