I would like to create a Pandas UDF that returns a series in which each element is a list of lists representing the normalized pixel values of an image. Is it possible to return this data type from a Pandas UDF? I tried passing ArrayType(FloatType()) to the pandas_udf decorator, but I get the error: could not convert <nested lists with floating point values> with type list: tried to convert to float32. It would be great to hear your thoughts on this, thanks!

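import base64
from io import BytesIO

import numpy as np
import pandas as pd
from PIL import Image
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import ArrayType, FloatType

@pandas_udf(ArrayType(FloatType()))
def base64_to_arr(base64_images: pd.Series) -> pd.Series:
    def decode_one(img):
        # Decode the base64 string into a NumPy array (.npy payload)
        img_bytes = base64.b64decode(img)
        img_array = np.load(BytesIO(img_bytes))
        ## Resize
        pil_img = Image.fromarray(img_array)
        resized_img = pil_img.resize((32, 32))
        resized_image_arr = np.array(resized_img)
        ##
        # Scale pixel values to [0, 1]; tolist() yields nested lists here
        normalized_img = resized_image_arr.astype("float32") / 255
        formatted_img = normalized_img.tolist()
        return formatted_img
    arr_images = base64_images.map(decode_one)
    return arr_images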
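
One possible workaround, sketched under assumptions rather than confirmed against a specific Spark version: ArrayType(FloatType()) describes a flat list of floats per row, whereas tolist() on a 2-D or 3-D image array yields nested lists. Flattening each image before returning it should match the declared type. Declaring ArrayType(ArrayType(FloatType())) may be an alternative, depending on the Spark and PyArrow versions in use. The helper names image_to_flat_floats and base64_to_flat_arr are hypothetical, the resize step is omitted for brevity, and the UDF registration is left as a comment so the block runs without a Spark installation.

```python
import base64
from io import BytesIO

import numpy as np
import pandas as pd


def image_to_flat_floats(img_b64: str) -> list:
    """Decode a base64-encoded .npy payload into a flat list of normalized floats."""
    img_array = np.load(BytesIO(base64.b64decode(img_b64)))
    normalized = img_array.astype("float32") / 255
    # ravel() collapses (H, W) or (H, W, C) into 1-D, so each row becomes
    # a plain list of floats instead of a list of lists.
    return normalized.ravel().tolist()


def base64_to_flat_arr(base64_images: pd.Series) -> pd.Series:
    """Series-to-Series function with the shape a Pandas UDF expects."""
    return base64_images.map(image_to_flat_floats)


# Assumption: with PySpark installed, the function could be registered like this:
#   from pyspark.sql.functions import pandas_udf
#   from pyspark.sql.types import ArrayType, FloatType
#   flat_udf = pandas_udf(base64_to_flat_arr, ArrayType(FloatType()))

if __name__ == "__main__":
    # Build a tiny 2x2 test image, encode it the same way the question assumes.
    buf = BytesIO()
    np.save(buf, np.full((2, 2), 255, dtype=np.uint8))
    encoded = base64.b64encode(buf.getvalue()).decode("ascii")
    print(base64_to_flat_arr(pd.Series([encoded]))[0])
```

With this approach the original shape (for example 32 x 32) is lost in the column and would need to be restored downstream, for instance with np.reshape on the consumer side.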
