I have a PySpark DataFrame as shown in the picture:
That is, I have four columns: year, word, count, and frequency. The years range from 2000 to 2015.
I would like to apply some operation to the (PySpark) DataFrame so that I get a result in the format shown in the following picture:
The new DataFrame's columns should be: word, frequency_2000, frequency_2001, frequency_2002, ..., frequency_2015,
with the frequency of each word in each year taken from the original DataFrame.
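To make the target shape concrete, here is a plain-Python sketch of the reshape on made-up rows. The sample values and the helper name `pivot_frequency` are only illustrative, not my real data or code. My guess is that the PySpark equivalent is something like `df.groupBy("word").pivot("year").agg(F.first("frequency"))` followed by renaming the columns, but I am not sure that is the efficient way.

```python
# Made-up sample rows in the long format: (year, word, count, frequency).
rows = [
    (2000, "apple", 10, 0.10),
    (2001, "apple", 20, 0.20),
    (2000, "pear", 5, 0.05),
]


def pivot_frequency(rows, years):
    """Return one dict per word with a frequency_<year> key for every year."""
    wide = {}
    for year, word, _count, frequency in rows:
        # Start each word with None for every year so missing years stay visible.
        record = wide.setdefault(
            word,
            {"word": word, **{f"frequency_{y}": None for y in years}},
        )
        record[f"frequency_{year}"] = frequency
    return list(wide.values())


# One output row per word, one frequency column per year from 2000 to 2015.
result = pivot_frequency(rows, range(2000, 2016))
```

A word that has no row for a given year ends up with `None` in that column, which is what I would expect a pivot to produce as null.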
Any advice on how I could write efficient code for this?
Also, please rename the title if you can come up with something more informative.