I'm using Spark v2.4.0 and I've run into a strange phenomenon:
I have a rather simple dataframe function and apply it to a dataframe called "new":
from product_mapping import product_mapping
new2 = product_mapping(new)
new2.show()
product_mapping (it lives in a separate Python script because of the length of the statement):
import pyspark.sql.functions as F

def product_mapping(df):
    df = df.withColumn('PRODUCT',
        F.when((df.var1 == "301") & (df.var2 == 0) & (df.var3 == 30), F.lit('101'))
         .when((df.var1 == "301") & (df.var2 == 1) & (df.var3 == 30), F.lit('102'))
         .when((df.var1 == "302") & (df.var2 == 0) & (df.var3 == 31), F.lit('103'))
         .when((df.var1 == "302") & (df.var2 == 1) & (df.var3 == 31), F.lit('104'))
         .when((df.var1 == "303") & (df.var2 == 0) & (df.var3 == 61), F.lit('105'))
         .when((df.var1 == "303") & (df.var2 == 0) & (df.var3 == 32), F.lit('106'))
         .when((df.var1 == "303") & (df.var2 == 1) & (df.var3 == 32), F.lit('107'))
         .when((df.var1 == "303") & (df.var2 == 1) & (df.var3 == 61), F.lit('108'))
         .when((df.var1 == "304") & (df.var2 == 0) & (df.var3 == 69), F.lit('109'))
         # (many more .when lines)
         .when((df.var1 == "304") & (df.var2 == 1) & (df.var3 == 69), F.lit('205')))
    return df
In total there are more than 150 such lines, and the code does not work; it throws this error:
py4j.protocol.Py4JJavaError: An error occurred while calling o1754.showString.
: java.lang.StackOverflowError
at org.codehaus.janino.CodeContext.extract16BitValue(CodeContext.java:720)
at org.codehaus.janino.CodeContext.flowAnalysis(CodeContext.java:561)
However, when I shorten the statement to, say, 5 when clauses, the code works fine ... so is there a maximum number of when clauses one can chain? And how can I overcome this?
Thanks