Pandas UDF Structfield return

Question

I am trying to return a StructField from a Pandas UDF in PySpark, used in an aggregation, with the following function signature:

def parcel_to_polygon(geom:pd.Series,entity_ids:pd.Series) -> Tuple[int,str,List[List[str]]]:

But it turns out that this return type is not supported. Is there an alternative way to achieve the same result? I can write three Pandas UDFs that each return a primitive type, and that works, but the function logic is then repeated in all three functions, which is what I am trying to avoid (I assume a single function would be a bit more performant, but maybe I'm wrong here).

score 0 · Answer 1 · answered Nov 15 '22 at 15:03

You can return all the values as a DataFrame, like this:

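import pandas as pd
from pyspark.sql.functions import pandas_udf
from pyspark.sql.types import DoubleType, StructField, StructType

# Struct return type: one field per column of the returned DataFrame
schema = StructType([
    StructField('X', DoubleType()),
    StructField('Y', DoubleType()),
])

@pandas_udf(schema)
def polygon(longitude: pd.Series, latitude: pd.Series) -> pd.DataFrame:
    return pd.DataFrame({"X": longitude, "Y": latitude})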


code_bug


Will I be able to use this pandas UDF in a group by? – Tarique Nov 17 '22 at 09:33

You cannot use this exactly as written: with a group by, the function's input will be a DataFrame. Check this documentation: https://learn.microsoft.com/en-us/azure/databricks/udf/pandas – code_bug Nov 17 '22 at 10:20
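
For the grouped case mentioned in the comment above, one possible approach is groupBy(...).applyInPandas(...), which passes each group to the function as a pandas DataFrame and expects a pandas DataFrame back. The sketch below is only an illustration under assumptions: the column names entity_id and geom, the output columns, and the helper names summarize_group and run are hypothetical, and the real polygon logic from the question is replaced by a trivial string join.

```python
import pandas as pd

# Hypothetical output schema, written as a DDL string (applyInPandas accepts one).
RESULT_SCHEMA = "entity_id string, n_rows long, geoms string"


def summarize_group(pdf: pd.DataFrame) -> pd.DataFrame:
    """Collapse all rows of one group into a single row with several columns."""
    return pd.DataFrame({
        "entity_id": [str(pdf["entity_id"].iloc[0])],
        "n_rows": [len(pdf)],
        # Placeholder for the real geometry logic: just join the values.
        "geoms": ["; ".join(pdf["geom"].astype(str))],
    })


def run(df):
    """df is assumed to be a Spark DataFrame with entity_id and geom columns."""
    # The function runs once per group and returns all result columns together,
    # so the shared logic does not have to be repeated in three separate UDFs.
    return df.groupBy("entity_id").applyInPandas(summarize_group, schema=RESULT_SCHEMA)
```

Because summarize_group is plain pandas, it can be checked on a small pandas DataFrame without a Spark session before being passed to applyInPandas.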
