I am trying to return a StructField from a Pandas UDF in PySpark, used in an aggregation, with the following function signature:

def parcel_to_polygon(geom: pd.Series, entity_ids: pd.Series) -> Tuple[int, str, List[List[str]]]:

But it turns out that this return type is not supported. Is there an alternative way to achieve the same result? I can write three Pandas UDFs that each return a primitive type, and that works, but the function logic is then repeated in all three, which is what I am trying to avoid (I assume a single UDF would also be a bit more performant, but maybe I'm wrong here).

Tarique

1 Answer

You can return all the values as a DataFrame, like this:

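import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType, StructField, StructType

# Struct return type: each DataFrame column becomes one struct field.
schema = StructType([
    StructField('X', DoubleType()),
    StructField('Y', DoubleType()),
])


@pandas_udf(schema)
def polygon(Longitude: pd.Series, Latitude: pd.Series) -> pd.DataFrame:
    # Return a DataFrame whose columns match the schema's field names.
    return pd.DataFrame({"X": Longitude, "Y": Latitude})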
code_bug
  • Will I be able to use this pandas UDF in a group by? – Tarique Nov 17 '22 at 09:33
  • You cannot use this exactly as written: if you are using group by, the input to the function will be a DataFrame. Check this documentation: https://learn.microsoft.com/en-us/azure/databricks/udf/pandas – code_bug Nov 17 '22 at 10:20
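
For the group-by case raised in the comments, one option that might fit is `groupBy(...).applyInPandas(...)`, where the function receives each group as a pandas DataFrame and returns a DataFrame with several columns, so the logic lives in one place. The following is only a rough sketch: the column names (`entity_id`, `n_parcels`, `geoms`), the schema, and the sample rows are hypothetical and not taken from the question, and the Spark part assumes an existing SparkSession.

```python
import pandas as pd

# Hypothetical output schema (DDL string); the column names are assumptions.
OUTPUT_SCHEMA = "entity_id long, n_parcels long, geoms array<string>"


def parcel_to_polygon(pdf: pd.DataFrame) -> pd.DataFrame:
    """Take all rows of one group as a pandas DataFrame, return one summary row."""
    return pd.DataFrame(
        {
            "entity_id": [pdf["entity_id"].iloc[0]],
            "n_parcels": [len(pdf)],
            "geoms": [pdf["geom"].tolist()],
        }
    )


def run_with_spark(spark):
    """Apply the function per group; needs a live SparkSession passed in."""
    # Illustrative input rows only.
    df = spark.createDataFrame(
        [(1, "POINT (0 0)"), (1, "POINT (1 1)"), (2, "POINT (2 2)")],
        ["entity_id", "geom"],
    )
    return df.groupBy("entity_id").applyInPandas(
        parcel_to_polygon, schema=OUTPUT_SCHEMA
    )
```

Calling `run_with_spark(spark).show()` should give one row per `entity_id` with all three output columns, instead of three separate UDF calls that repeat the same logic.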