
I have a Python project that uses PySpark, and I am trying to define a UDF inside the Spark codebase itself (not in my Python project), specifically in spark\python\pyspark\ml\tuning.py, but I get pickling problems: it can't load the UDF. The code:

from pyspark.sql.functions import udf, log
test_udf = udf(lambda x : -x[1], returnType=FloatType())
d = data.withColumn("new_col", test_udf(data["x"]))
d.show()

When I call d.show() I get an exception about an unknown attribute test_udf.

In my Python project I have defined many UDFs and they worked fine.

ofer-a

2 Answers


Add the following to your code. It isn't recognizing the data type:

from pyspark.sql.types import *

Let me know if this helps. Thanks.

avrsanjay

Found it. There were two problems:

1) For some reason it didn't like returnType=FloatType(); I needed to pass just FloatType() positionally, even though that keyword is in the signature.

2) The data in column x was a vector, and for some reason I had to cast the element to float.

The working code:

from pyspark.sql.functions import udf
from pyspark.sql.types import FloatType

test_udf = udf(lambda x: -float(x[1]), FloatType())
d = data.withColumn("new_col", test_udf(data["x"]))
d.show()
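For anyone wondering why the float() cast is needed: indexing a pyspark ml vector (e.g. a DenseVector) yields a numpy scalar, not a plain Python float, and FloatType expects the latter. A minimal sketch of this, using a plain numpy array as a stand-in for the vector column (no Spark required):

```python
# Sketch of why float() matters, assuming the vector in column "x"
# behaves like a numpy-backed vector whose elements are numpy scalars.
import numpy as np

vec = np.array([1.0, 2.5, 3.0])  # stand-in for the value in column "x"

raw = -vec[1]          # numpy.float64, which FloatType may reject
cast = -float(vec[1])  # plain Python float, safe for FloatType

print(type(raw).__name__)   # float64
print(type(cast).__name__)  # float
```

So the lambda `lambda x: -float(x[1])` normalizes the element to a Python float before Spark serializes it back.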
ofer-a