from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date, month

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
data = [("James", "Smith", "USA", "2019-06-24 12:01:19.000"),
        ("Michael", "Rose", "USA", "2019-06-24 12:01:19.000"),
        ("Robert", "Williams", "USA", "2019-06-24 12:01:19.000"),
        ("Maria", "Jones", "USA", "2019-06-24 12:01:19.000")]
columns = ["firstname", "lastname", "country", "datetime_column"]
df = spark.createDataFrame(data=data, schema=columns)

# Extract the month from the timestamp string; month() already yields an
# integer, so the cast is just being explicit.
df = df.withColumn('month', month(to_date(df['datetime_column'])).cast('int'))
df.show()

I have this PySpark code, but when I try to run it I get an error like this:

WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Python bulunamadı  [Turkish: "Python not found"]
22/12/31 15:22:05 WARN ProcfsMetricsGetter: Exception when trying to compute pagesize, as a result reporting of ProcessTree metrics is stopped
22/12/31 15:22:14 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.SparkException: Python worker failed to connect back.

ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/tasks/main.py", line 15, in <module>
    df.show()
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\pyspark\sql\dataframe.py", line 606, in show
    print(self._jdf.showString(n, 20, vertical))
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\py4j\java_gateway.py", line 1322, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\pyspark\sql\utils.py", line 190, in deco
    return f(*a, **kw)
  File "C:\Users\User\AppData\Local\Programs\Python\Python37\lib\site-packages\py4j\protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o45.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage

My Java version is 11 and PySpark is 3.3.1. I can't figure out what's wrong and am stuck here; any help would be appreciated.
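One thing I have seen suggested for "Python worker failed to connect back" on Windows (I'm not sure it's the cause here) is that Spark's worker processes can't find a Python interpreter, which would also explain the "Python bulunamadı" line in the log. The sketch below just pins the worker interpreter to the one running the script via environment variables, before the SparkSession is created; the variable names are standard Spark configuration, but whether this fixes my setup is an assumption:

```python
import os
import sys

# Point Spark's Python workers (and the driver) at the same interpreter
# that is running this script, so the workers don't fail to locate Python.
# These must be set before SparkSession.builder...getOrCreate() is called.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
```

After setting these, the original `SparkSession.builder.appName(...).getOrCreate()` call would run unchanged.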
