I would like to create a Pandas UDF that returns a series in which each element is a list of lists representing the normalized pixel values of an image. Is it possible to return this datatype from a Pandas UDF? I tried passing ArrayType(FloatType()) to the Pandas UDF decorator, but I get the error: "could not convert <nested lists with floating point values> with type list: tried to convert to float32". It would be great to hear your thoughts on this, thanks!
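
import base64
from io import BytesIO

import numpy as np
import pandas as pd
from PIL import Image
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, FloatType


# This is the declared return type that triggers the error described above.
@pandas_udf(ArrayType(FloatType()))
def base64_to_arr(base64_images: pd.Series) -> pd.Series:
    def decode_image(img):
        # Decode the base64 string back into the saved NumPy array
        img_bytes = base64.b64decode(img)
        img_array = np.load(BytesIO(img_bytes))
        # Resize to 32x32
        pil_img = Image.fromarray(img_array)
        resized_img = pil_img.resize((32, 32))
        resized_image_arr = np.array(resized_img)
        # Normalize pixel values to [0, 1] and convert to nested Python lists
        normalized_img = resized_image_arr.astype("float32") / 255
        formatted_img = normalized_img.tolist()
        return formatted_img

    arr_images = base64_images.map(decode_image)
    return arr_images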
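
One likely cause of the error is that ArrayType(FloatType()) describes a flat list of floats, while each element returned here is nested two levels deep (or three for an RGB image). A minimal sketch of a possible fix is to declare one ArrayType per nesting level, assuming a Spark 3.x release whose Arrow conversion accepts nested arrays. The names base64_to_nested_list, nesting_depth, and build_image_udf are hypothetical helpers introduced for illustration, and the PIL resize step is left out to keep the sketch short.

```python
import base64
from io import BytesIO

import numpy as np
import pandas as pd


def base64_to_nested_list(b64_image: str) -> list:
    """Decode a base64-encoded .npy payload and scale pixels to [0, 1]."""
    img_array = np.load(BytesIO(base64.b64decode(b64_image)))
    # The resize step from the question is omitted here for brevity.
    return (img_array.astype("float32") / 255).tolist()


def nesting_depth(value) -> int:
    """Count list levels: 2 for a grayscale image, 3 for an RGB one."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        if not value:
            break
        value = value[0]
    return depth


def build_image_udf(depth: int = 2):
    """Hypothetical factory that wraps the decoder in a Pandas UDF.

    pyspark is imported inside the function so the helpers above
    remain usable (and testable) without a Spark installation.
    """
    from pyspark.sql.functions import pandas_udf
    from pyspark.sql.types import ArrayType, FloatType

    # Build one ArrayType per nesting level, e.g. depth=2 gives
    # ArrayType(ArrayType(FloatType())).
    return_type = FloatType()
    for _ in range(depth):
        return_type = ArrayType(return_type)

    @pandas_udf(return_type)
    def images_to_arrays(base64_images: pd.Series) -> pd.Series:
        return base64_images.map(base64_to_nested_list)

    return images_to_arrays
```

With this sketch, build_image_udf(2) would suit grayscale images and build_image_udf(3) RGB images. Calling nesting_depth on one decoded sample is a quick way to check which depth the data actually has. An alternative design, if nested types cause trouble, is to flatten each image into a single list and keep ArrayType(FloatType()), reshaping later.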