I want to add a sequential unique ID column to a DataFrame that will keep growing: after I generate the IDs today, more rows will be added tomorrow. When new rows arrive, I want to look up the existing ID column and continue incrementing from the highest ID for the newly added data.
+-------+---------+---------+
|deal_id|deal_name|Unique_id|
+-------+---------+---------+
| 613760|ABCDEFGHI|        1|
| 613740|  TEST123|        2|
| 598946|      OMG|        3|
+-------+---------+---------+
Say I get more data tomorrow: I want to append it to this DataFrame, and the unique ID should continue from 4 onward.
+-------+---------+---------+
|deal_id|deal_name|Unique_id|
+-------+---------+---------+
| 613760|ABCDEFGHI|        1|
| 613740|  TEST123|        2|
| 598946|      OMG|        3|
| 591234|     OM21|        4|
| 988217|    Otres|        5|
.
.
.
Code snippet:
from pyspark.sql import functions as F
deals_df_final = deals_df.withColumn("Unique_id", F.monotonically_increasing_id())
But this did not give sequential IDs.
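The gaps appear to come from how the function builds its values: per the Spark documentation, the current implementation puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits, so the IDs are unique and increasing but not consecutive. The plain-Python sketch below only imitates that layout to show the effect; `simulated_monotonic_ids` and the partition sizes are made up for illustration and are not part of Spark.

```python
def simulated_monotonic_ids(rows_per_partition):
    """Imitate the documented layout: partition index in the upper bits,
    row offset within the partition in the lower 33 bits.
    Hypothetical helper for illustration, not a Spark API."""
    ids = []
    for partition_index, row_count in enumerate(rows_per_partition):
        base = partition_index << 33  # first id handed out in this partition
        ids.extend(base + offset for offset in range(row_count))
    return ids


# Three rows spread over two partitions (2 rows, then 1 row).
ids = simulated_monotonic_ids([2, 1])
print(ids)  # [0, 1, 8589934592]
```

With a single partition the values would happen to be 0, 1, 2, ..., but that is a side effect of the partitioning, not something to rely on.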
I could try row_number or RDD zipWithIndex, but it looks like the DataFrame is immutable.
Any help, please? I want to be able to generate the IDs and also keep incrementing them as and when data is added.