I have a DataFrame in which I'm trying to create a new column based on the values of an existing column:
from pyspark.sql import functions as F

dfg = dfg.withColumn(
    "min_time",
    F.when(dfg['list'].isin(["A", "B"]), dfg['b_time'])
     .when(dfg['list'] == "C", dfg['b_time'] + 2)
     .when(dfg['list'] == "D",
           F.when(dfg['b_time'] == 0, F.lit(10)).otherwise(2 * dfg['b_time']))
     .when(dfg['list'].isin(["E", "F"]), dfg['b_time'])
     .when(~dfg['list'].isin(["A", "B", "C", "D", "E", "F"]), F.lit('unknown category'))
     .otherwise(F.lit('unknown')))
What I want to achieve in the last .when condition is: if a value in dfg['list'] does not belong to the list ["A", "B", "C", "D", "E", "F"], I want to raise a RuntimeError with a message. I'm unsure how to do this in PySpark. Also, if I'm creating columns with conditional expressions (.when and .otherwise), how do I use try/except blocks around them?
I'm using PySpark 1.6. Any help is much appreciated.
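For context on what I've tried: my understanding is that .when() builds a column expression evaluated row by row on the executors, so a Python raise or try/except can't fire from inside it. The only direction I can think of is validating on the driver before the withColumn, e.g. filtering for rows outside the allowed categories and counting them. A minimal plain-Python sketch of that validation idea (no Spark session needed; the allowed set and helper name are made up for illustration):

```python
# Hypothetical driver-side check: collect the offending categories,
# and raise RuntimeError before building the new column.
ALLOWED = {"A", "B", "C", "D", "E", "F"}

def check_categories(values, allowed=ALLOWED):
    """Raise RuntimeError if any value falls outside the allowed set."""
    unknown = set(values) - allowed
    if unknown:
        raise RuntimeError("unknown categories: %s" % sorted(unknown))

# In Spark this would correspond to something like (untested sketch):
#   bad = dfg.filter(~dfg['list'].isin(list(ALLOWED))).count()
#   if bad > 0:
#       raise RuntimeError("found %d rows with unknown categories" % bad)
try:
    check_categories(["A", "C", "Z"])
except RuntimeError as exc:
    print(exc)
```

Is this pre-validation pass the right approach, or is there a way to fail from inside the .when chain itself?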