
Trying to navigate the waters between python and pyspark, I have a function for rotating images written in python:

from PIL import Image

def rotate_image(image, rotation_angle):
    im = Image.open(image)
    out = im.rotate(rotation_angle, expand=True)
    return out
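The function can be exercised locally before involving Spark at all; a quick self-contained sketch (the in-memory PNG is a hypothetical stand-in for `image2.jpg`, since `Image.open` accepts file-like objects as well as paths):

```python
from io import BytesIO
from PIL import Image

def rotate_image(image, rotation_angle):
    im = Image.open(image)
    out = im.rotate(rotation_angle, expand=True)
    return out

# Build a tiny 4x2 test image in memory instead of reading a file.
buf = BytesIO()
Image.new("RGB", (4, 2)).save(buf, format="PNG")
buf.seek(0)

out = rotate_image(buf, 90)
print(out.size)  # with expand=True, a 4x2 image rotated 90 degrees becomes 2x4
```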

I now want to use this function as a pyspark udf.

Reading in image:

import pyspark.sql.functions as fn

df = spark.read.format("image").load("image2.jpg")
df.printSchema()
_udf = fn.udf(rotate_image) ##Not sure what type to declare the image as here
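On the return-type question: `fn.udf` defaults to `StringType` when none is given, and a PIL `Image` object is not a Spark SQL type either way. One option (an assumption about how you might wire this up, not a tested pipeline) is a bytes-in/bytes-out variant that could be registered as `fn.udf(rotate_image_bytes, BinaryType())`:

```python
from io import BytesIO
from PIL import Image

def rotate_image_bytes(image_bytes, rotation_angle):
    # Open from raw bytes and re-encode the rotated result as PNG bytes,
    # since a PIL Image object cannot be stored in a Spark column.
    im = Image.open(BytesIO(image_bytes))
    out = im.rotate(rotation_angle, expand=True)
    buf = BytesIO()
    out.save(buf, format="PNG")
    return buf.getvalue()
```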

At this point, I want to apply the function to the image column with a specified angle:

df.withColumn('x1', _udf(fn.array('image', 90))).show()  # Not really sure what I am doing here

I got an error saying

TypeError: Invalid argument, not a string or column: 90 of type <class 'int'>. For column literals, use 'lit', 'array', 'struct' or 'create_map' function.
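The error itself points at the immediate fix: a plain Python `90` has to be wrapped as `fn.lit(90)` (and a UDF can take several arguments directly, so the `array` is unnecessary). Beyond that, Spark's `image` data source does not hand the UDF a file path: it produces a struct with fields `origin`, `height`, `width`, `nChannels`, `mode`, and `data` (raw row-major BGR bytes), so `Image.open` would fail on it regardless. A hedged sketch of the conversion the UDF would need, written as plain NumPy/PIL with no Spark required (the struct layout follows Spark's documented image schema):

```python
import numpy as np
from PIL import Image

def spark_image_to_pil(row):
    # `row` mirrors Spark's image struct: height, width, nChannels, and
    # `data` holding row-major BGR pixel bytes.
    arr = np.frombuffer(row["data"], dtype=np.uint8)
    arr = arr.reshape(row["height"], row["width"], row["nChannels"])
    # Reverse the channel axis to go from BGR to the RGB order PIL expects.
    return Image.fromarray(np.ascontiguousarray(arr[:, :, ::-1]))

def rotate_spark_image(row, rotation_angle):
    return spark_image_to_pil(row).rotate(rotation_angle, expand=True)
```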
  • @Steven, not working. I suspect it's because the definition of the function _udf is missing the datatype definition of the inputs – Mikee Apr 25 '23 at 16:03
