Does anybody know why this is happening?
And when I filter it:
EDIT: This is how I added the last two columns. I suspect the problem comes from generating them with pandas_udf: I can filter on the first four columns, which I built with a plain udf, without any trouble.
from pyspark.sql.functions import pandas_udf, PandasUDFType
import pandas as pd

@pandas_udf('string', PandasUDFType.SCALAR)
def blocking(ids, x, y):
    ....
    return pd.Series(final)

df4 = df3.withColumn('blocking_index',
                     blocking(df3.id, df3.ratepayer, df3.CharityName))
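
For reference, a scalar pandas_udf is called once per batch with pandas Series as arguments, and it has to return a Series with the same number of rows as its inputs. Below is a minimal sketch of that contract that runs with pandas alone, without Spark. The function name `blocking_batch`, its body, and the sample data are hypothetical stand-ins, since the real logic is elided above; this is only a way to check the shape of the output, not a diagnosis of the filter problem.

```python
import pandas as pd


def blocking_batch(ids: pd.Series, x: pd.Series, y: pd.Series) -> pd.Series:
    """Hypothetical stand-in for the elided UDF body.

    Builds one string key per input row from the first letter of each name.
    """
    final = [f"{str(a)[:1]}{str(b)[:1]}".lower() for a, b in zip(x, y)]
    # Reuse the input index so the result lines up row for row with the batch.
    return pd.Series(final, index=ids.index, dtype="object")


# Made-up sample batch using the column names from the question.
batch = pd.DataFrame({
    "id": [1, 2, 3],
    "ratepayer": ["Acme Trust", "Bright Fund", "Care Fund"],
    "CharityName": ["Acme Charity", "Bright Foundation", "Care Org"],
})

result = blocking_batch(batch["id"], batch["ratepayer"], batch["CharityName"])

# A scalar pandas UDF must return exactly one value per input row.
assert len(result) == len(batch)
print(result.tolist())
```

Running the same kind of check against the real function body on a small pandas batch would show whether it always returns one string per input row.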