
I have the following problem when using UDFs in PySpark.

As long as I don't use any UDFs, my code works well. Simple operations such as selecting columns or using SQL functions like concat cause no problems. As soon as I perform an action on a DataFrame that uses a UDF, the program crashes with the following exception:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/pyspark/jars/spark-unsafe_2.11-2.4.3.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
19/06/05 09:24:37 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
  File "/Users/szymonk/Desktop/Projects/SparkTest/Application.py", line 59, in <module>
    transformations.select(udf_example(col("gender")).alias("udf_example")).show()
  File "/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/pyspark/sql/dataframe.py", line 378, in show
    print(self._jdf.showString(n, 20, vertical))
  File "/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/Users/szymonk/Desktop/Projects/SparkTest/venv/lib/python2.7/site-packages/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u'Unsupported class file major version 55'

I've tried changing JAVA_HOME as proposed in "Pyspark error - Unsupported class file major version 55", but it didn't help.

There is nothing fancy in my code. I am only defining a simple UDF that should return the length of the values in the "gender" column:
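Since changing JAVA_HOME may not affect the JVM that actually gets launched (an IDE can start the process with its own environment), it can help to check which Java release is found on the PATH of the Python process. The following is a minimal sketch, assuming Python 3.7+ and that the `java` command is on the PATH; the helper names `parse_java_major` and `detect_java_major` are hypothetical and not part of PySpark.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Return the major Java release found in `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first = int(match.group(1))
    second = match.group(2)
    # Legacy scheme: "1.8.0_211" means Java 8. Newer scheme: "11.0.3" means Java 11.
    if first == 1 and second is not None:
        return int(second)
    return first


def detect_java_major(java_cmd="java"):
    """Run `java -version` and return the major release, or None if unavailable."""
    try:
        # `java -version` prints its banner to stderr, so both streams are captured.
        proc = subprocess.run([java_cmd, "-version"], capture_output=True, text=True)
    except OSError:
        return None
    return parse_java_major(proc.stderr + proc.stdout)
```

Calling `detect_java_major()` from the same run configuration that starts the Spark application shows which release that process sees on its PATH; if it is not 8, the JAVA_HOME change has probably not reached that process.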

from pprint import pprint
from pyspark.sql import SparkSession, Column
from pyspark.sql.functions import col, lit, struct, array, udf, concat, trim, when
from pyspark.sql.types import IntegerType
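
# Create (or reuse) a SparkSession.
spark = SparkSession.builder.getOrCreate()

transformations = spark.read.csv("Resources/PersonalData.csv", header=True)

# Declare the return type; without it the UDF result defaults to StringType.
# Null values are passed through, because len(None) would raise a TypeError.
udf_example = udf(lambda x: len(x) if x is not None else None, IntegerType())
transformations.select(udf_example(col("gender")).alias("udf_example")).show()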

I'm not sure whether it is significant, but I'm using PyCharm on a Mac.

Pan Wolodyjowsky

2 Answers


I found a solution: I had to switch the boot JDK of PyCharm (double Shift -> type "jdk" -> select JDK 1.8).
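For context on why JDK 1.8 helps: the number in "Unsupported class file major version 55" identifies the Java release that produced the class files. For Java 5 and later, the class file major version is the release number plus 44, so 55 is Java 11 and 52 is Java 8, and Spark 2.4.x is documented as running on Java 8. A minimal sketch of that mapping (the function name is hypothetical, and releases before Java 5 are deliberately not handled):

```python
def java_release_for_class_major(major):
    """Map a class file major version to the Java release that produces it.

    Valid for Java 5 (major 49) and later, where release = major - 44.
    """
    if major < 49:
        raise ValueError("class file versions below 49 are not handled here")
    return major - 44


# The version from the error message, and the one produced by JDK 1.8.
print(java_release_for_class_major(55))
print(java_release_for_class_major(52))
```

So the exception suggests the JVM was a Java 11 one, which matches the fix of selecting JDK 1.8.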

Pan Wolodyjowsky

I just switched from PySpark 2.4.7 back to 2.4.2, and it worked with both Python 3.6 and 3.7.

CAV