I have a PySpark dataframe with, say, three columns, and I need to generate an ID for each row in it. Unfortunately I can't just use the uuid module, which would make this easy, because the ID has to be 6 characters long.
I found a solution for generating the ID itself here (fixed length ID), which covers that part.
The issue I'm facing is that I don't know how to iterate through the rows of the dataframe to create the new column. I've been trying a for loop, but when it reaches the end it fails with errors such as the dataframe having no append, etc.
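To make the goal concrete, here is a minimal sketch of what I think the end result should look like. Spark dataframes are immutable, so instead of looping and appending, the usual route is to derive a new column with `withColumn`. The names `make_id`, `add_id_column` and `ALPHABET` are hypothetical placeholders of mine, and the character set is an assumption; this is not the code from the linked solution.

```python
import secrets
import string

# Assumed character set: uppercase letters and digits (36 symbols).
ALPHABET = string.ascii_uppercase + string.digits


def make_id(length=6):
    """Return a random ID of `length` characters drawn from ALPHABET."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def add_id_column(df, column_name="id"):
    """Return a new DataFrame with a random ID column added to every row."""
    # Imported lazily so make_id stays usable without PySpark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    # Mark the UDF as nondeterministic so Spark does not assume that
    # repeated calls return the same value.
    make_id_udf = udf(make_id, StringType()).asNondeterministic()

    # withColumn returns a new DataFrame; the original is left unchanged.
    return df.withColumn(column_name, make_id_udf())
```

Usage would be something like `df_with_id = add_id_column(df)`. Note that random IDs like this are not guaranteed to be unique: with 36 symbols and 6 characters there are about 2.2 billion combinations, so collisions become likely on large dataframes and may need a separate check.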
I'm still fairly new to both Python and PySpark, and I'm not sure how to progress from here. I'd appreciate any pointers on what to research to get moving again.
Thanks in advance.