
Below is the T-SQL code, followed by my attempt to convert it to PySpark using window functions.

case
    when eventaction = 'IN' and lead(eventaction, 1) over (PARTITION BY barcode order by barcode, eventdate, transactionid) in ('IN', 'OUT')
        then lead(eventaction, 1) over (PARTITION BY barcode order by barcode, eventdate, transactionid)
    else ''
end as next_action

The PySpark code below, which uses the lead window function, raises an error:

Tgt_df = Tgt_df.withColumn((('Lead', lead('eventaction').over(Window.partitionBy("barcode").orderBy("barcode","transactionid", "eventdate")) == 'IN' )|
                    ('1', lead('eventaction').over(Window.partitionBy("barcode").orderBy("barcode","transactionid", "eventdate")) == 'OUT')
                     , (lead('eventaction').over(Window.partitionBy("barcode").orderBy("barcode","transactionid", "eventdate"))).otherwise('').alias("next_action")))

But it's not working. What should I do?

Katelyn Raphael

1 Answer


The withColumn method should be called as df.withColumn('name_of_col', value_of_column); that is why you get an error.
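For example, this generic call (a minimal sketch assuming a DataFrame df that already has a numeric column 'x') adds a new column named 'x_plus_one':

import pyspark.sql.functions as F

# first argument: the name of the new column; second argument: a Column expression
df = df.withColumn('x_plus_one', F.col('x') + 1)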

From your T-SQL query, the corresponding PySpark code should be:

import pyspark.sql.functions as F
from pyspark.sql.window import Window

# order events within each barcode the same way the T-SQL does
w = Window.partitionBy("barcode").orderBy("barcode", "eventdate", "transactionid")

Tgt_df = Tgt_df.withColumn('next_action',
                           # only 'IN' rows whose next event is 'IN' or 'OUT' get a value
                           F.when((F.col('eventaction') == 'IN') & (F.lead('eventaction', 1).over(w).isin(['IN', 'OUT'])),
                                  F.lead('eventaction', 1).over(w)
                                  ).otherwise('')  # everything else gets an empty string, as in the T-SQL
                           )
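To sanity-check the logic on a toy DataFrame (a minimal sketch; the barcode, dates and ids are made up, but the column names match the question), the first 'IN' row should pick up the following 'OUT' and every other row should end up with an empty string:

import pyspark.sql.functions as F
from pyspark.sql import SparkSession
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

# toy data: one barcode with an IN -> OUT -> IN sequence
sample = spark.createDataFrame(
    [("B1", 1, "2021-01-01", "IN"),
     ("B1", 2, "2021-01-02", "OUT"),
     ("B1", 3, "2021-01-03", "IN")],
    ["barcode", "transactionid", "eventdate", "eventaction"],
)

w = Window.partitionBy("barcode").orderBy("barcode", "eventdate", "transactionid")

sample.withColumn(
    "next_action",
    F.when(
        (F.col("eventaction") == "IN")
        & (F.lead("eventaction", 1).over(w).isin(["IN", "OUT"])),
        F.lead("eventaction", 1).over(w),
    ).otherwise(""),
).show()
# expected: next_action = 'OUT' for the first row, '' for the other two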
Xavier Canton