I am having trouble getting this test code to work. The final line, df.select(plus_one(col("x"))).show(), fails. I also tried saving the result in a variable (vardf = df.select(plus_one(col("x")))) and then calling vardf.show(), and that fails too.

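import pyspark
import pandas as pd
from typing import Iterator
from pyspark.sql.functions import col, pandas_udf, struct

# Reuse an existing session if there is one, and reduce log noise.
spark = pyspark.sql.SparkSession.builder.getOrCreate()
spark.sparkContext.setLogLevel("WARN")

# Build a small Spark DataFrame from a pandas DataFrame.
pdf = pd.DataFrame([1, 2, 3], columns=["x"])
df = spark.createDataFrame(pdf)
df.show()

# Iterator-of-Series pandas UDF: receives batches of column "x" and yields one result Series per batch.
@pandas_udf("long")
def plus_one(batch_iter: Iterator[pd.Series]) -> Iterator[pd.Series]:
    for s in batch_iter:
        yield s + 1

# This is the line that fails.
df.select(plus_one(col("x"))).show()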

Error message (parts of it):

File "C:\bigdatasetup\anaconda3\envs\pyspark-env\lib\site-packages\spyder_kernels\py3compat.py", line 356, in compat_exec
    exec(code, globals, locals)

File "c:\bigdatasetup\dataanalysiswithpythonandpyspark-trunk\code\ch09\untitled0.py", line 24, in <module>
    df.select(plus_one(col("x"))).show()

File "C:\bigdatasetup\anaconda3\envs\pyspark-env\lib\site-packages\pyspark\sql\dataframe.py", line 494, in show
    print(self._jdf.showString(n, 20, vertical))

File "C:\bigdatasetup\anaconda3\envs\pyspark-env\lib\site-packages\py4j\java_gateway.py", line 1321, in __call__
    return_value = get_return_value(

File "C:\bigdatasetup\anaconda3\envs\pyspark-env\lib\site-packages\pyspark\sql\utils.py", line 117, in deco
    raise converted from None

PythonException: An exception was thrown from the Python worker. Please see the stack trace below.
...
...
ERROR 2022-04-21 09:48:24,423 7608 org.apache.spark.scheduler.TaskSetManager [task-result-getter-0] Task 0 in stage 3.0 failed 1 times; aborting job

  • I just solved it, just checking paths and variables in User and System Environment Variables. – Paul Villagra Apr 22 '22 at 03:24
  • Hi Paul, if you solved it please answer your own question with exactly how you solved it so other people can fix it when they come across this issue. – Jack Cole Oct 18 '22 at 15:06
  • Hi Jack, sorry it was many months ago and I do not recall by now the details, just that I checked the environmental variables and path, and it worked. – Paul Villagra Oct 19 '22 at 17:20
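The comments above do not record which variable or path entry was wrong, so the exact fix is not known. For anyone who runs into the same PythonException, here is a minimal diagnostic sketch. It assumes the usual suspects on a Windows/Anaconda setup are SPARK_HOME, JAVA_HOME, HADOOP_HOME, PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON, and that pandas and PyArrow (which pandas UDFs require) may not be importable. The names environment_report and print_report are made up for this example and are not part of PySpark.

```python
import importlib.util
import os
import sys

# Environment variables that commonly decide which Spark, Java and Python
# installations PySpark picks up. Assumption: the original poster's problem
# involved one of these; the comments do not say which.
ENV_VARS = (
    "SPARK_HOME",
    "JAVA_HOME",
    "HADOOP_HOME",
    "PYSPARK_PYTHON",
    "PYSPARK_DRIVER_PYTHON",
)

# pandas UDFs need both of these packages to be importable.
REQUIRED_MODULES = ("pandas", "pyarrow")


def environment_report(env=None):
    """Collect the interpreter path, relevant variables and module availability."""
    env = os.environ if env is None else env
    return {
        "driver_python": sys.executable,
        "variables": {name: env.get(name) for name in ENV_VARS},
        "modules": {
            name: importlib.util.find_spec(name) is not None
            for name in REQUIRED_MODULES
        },
    }


def print_report(report):
    """Print the report in a form that is easy to paste into a question."""
    print("driver python:", report["driver_python"])
    for name, value in report["variables"].items():
        print(f"{name} = {value if value is not None else '<not set>'}")
    for name, found in report["modules"].items():
        print(f"{name}: {'importable' if found else 'NOT importable'}")


if __name__ == "__main__":
    print_report(environment_report())
```

Run it from the same environment (here, the Spyder console in pyspark-env) that runs the failing script. If PYSPARK_PYTHON is unset or points to a different interpreter than the one printed as the driver Python, the workers may start a Python that lacks pandas or PyArrow. Setting PYSPARK_PYTHON to the driver's interpreter before creating the SparkSession is a commonly suggested thing to try, though it is not confirmed as the fix in this case.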

0 Answers