I have the following PySpark DataFrame (sample data below). I need to create a column Value2 from column Value1: for all rows that share the same value of Time, Value2 should be the first Value1 value that appears for that time. For example, for all rows where the time is 16:07, Value2 needs to be 100. I know I need some form of conditional logic, but I could not work out how to do it, so I was wondering if someone could point me in the right direction.
import pandas as pd

data = {
    'Name': ['John', 'Kate', 'William', 'Andrew', 'Nicole', 'Nicola', 'Gavin', 'Gabriel',
             'John', 'Kate', 'William', 'Andrew', 'Nicole', 'Nicola', 'Gavin', 'Gabriel'],
    'Value1': [10, 20, 50, 60, 100, 200, 250, 300, 450, 520, 570, 600, 630, 670, 690, 700],
    'Time': ['15/06/2020 16:05:00', '15/06/2020 16:05:00', '15/06/2020 16:05:00',
             '15/06/2020 16:06:00', '15/06/2020 16:07:00', '15/06/2020 16:07:00',
             '15/06/2020 16:08:00', '15/06/2020 16:09:00', '15/06/2020 16:10:00',
             '15/06/2020 17:20:00', '15/06/2020 17:21:00', '15/06/2020 17:22:00',
             '15/06/2020 17:22:00', '15/06/2020 17:22:00', '15/06/2020 17:22:00',
             '15/06/2020 17:25:00']
}
df = pd.DataFrame(data)
df_spark = spark.createDataFrame(df)
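One possible direction, sketched here on the pandas frame before it is converted to Spark: "first Value1 per Time" is a grouped transform. The PySpark analogue (shown in comments, untested here) would use a window partitioned by Time; note that Spark rows have no inherent order within a partition, so the choice of an `orderBy` column is an assumption you would need to adapt to your data.

```python
import pandas as pd

# A small subset of the question's data, enough to show the 16:07 case.
data = {
    'Name': ['John', 'Kate', 'William', 'Andrew', 'Nicole', 'Nicola'],
    'Value1': [10, 20, 50, 60, 100, 200],
    'Time': ['15/06/2020 16:05:00', '15/06/2020 16:05:00', '15/06/2020 16:05:00',
             '15/06/2020 16:06:00', '15/06/2020 16:07:00', '15/06/2020 16:07:00'],
}
df = pd.DataFrame(data)

# For each Time group, broadcast the group's first Value1 to every row.
df['Value2'] = df.groupby('Time')['Value1'].transform('first')
print(df)

# Rough PySpark equivalent (a sketch, not verified against your session):
#   from pyspark.sql import Window
#   import pyspark.sql.functions as F
#   w = Window.partitionBy('Time').orderBy('Value1')  # ordering column is an assumption
#   df_spark = df_spark.withColumn('Value2', F.first('Value1').over(w))
```

With this subset, both 16:07 rows receive Value2 = 100, matching the behaviour described in the question.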