I have a DataFrame in which I'm trying to create a new column based on the values of an existing column:
from pyspark.sql import functions as F

dfg = dfg.withColumn(
    "min_time",
    F.when(dfg['list'].isin(["A", "B"]), dfg['b_time'])
     .when(dfg['list'] == "C", dfg['b_time'] + 2)
     .when(dfg['list'] == "D",
           F.when(dfg['b_time'] == 0, F.lit(10)).otherwise(2 * dfg['b_time']))
     .when(dfg['list'].isin(["E", "F"]), dfg['b_time'])
     .when(~dfg['list'].isin(["A", "B", "C", "D", "E", "F"]), F.lit('unknown category'))
     .otherwise(F.lit('unknown')))
What I want to achieve in the last .when condition is: if a value in dfg['list'] does not belong to the list ["A", "B", "C", "D", "E", "F"], I want to raise a RuntimeError with a message. I'm unsure how to do this in PySpark. Also, if I'm creating columns with conditional expressions (.when and .otherwise), how do I use try/except blocks around them?
I'm using PySpark 1.6. Any help is much appreciated.
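For context on what I've tried: my understanding is that .when() builds a column expression evaluated row by row on the executors, so a Python raise or try/except can't fire from inside it. The only direction I can think of is validating on the driver before the withColumn, e.g. filtering for rows outside the allowed categories and counting them. A minimal plain-Python sketch of that validation idea (no Spark session needed; the allowed set and helper name are made up for illustration):

```python
# Hypothetical driver-side check: collect the offending categories,
# and raise RuntimeError before building the new column.
ALLOWED = {"A", "B", "C", "D", "E", "F"}

def check_categories(values, allowed=ALLOWED):
    """Raise RuntimeError if any value falls outside the allowed set."""
    unknown = set(values) - allowed
    if unknown:
        raise RuntimeError("unknown categories: %s" % sorted(unknown))

# In Spark this would correspond to something like (untested sketch):
#   bad = dfg.filter(~dfg['list'].isin(list(ALLOWED))).count()
#   if bad > 0:
#       raise RuntimeError("found %d rows with unknown categories" % bad)
try:
    check_categories(["A", "C", "Z"])
except RuntimeError as exc:
    print(exc)
```

Is this pre-validation pass the right approach, or is there a way to fail from inside the .when chain itself?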