I originally used the code below on a standard pandas DataFrame. I switched to a pyspark.pandas DataFrame once the data grew, but I've been unable to get this groupby to work there. I've also tried to replicate it on a Spark DataFrame using Spark functions, but my knowledge there is limited, so I haven't had any luck. Any tips or advice would be much appreciated.
df1 = df1.groupby(['FISCAL_YEAR', 'FISCAL_MONTH', 'FISCAL_WEEK', 'ORDER_NUMBER', 'LINE_TYPE'], as_index=False).agg({
    'DEPARTMENT': lambda x: ' | '.join(sorted(x.unique())),
    'Dept_Subdept': lambda x: ' | '.join(sorted(x.unique())),
    'Demand': 'sum',
    'COGS': 'sum',
    'Units': 'sum',
})
On the pyspark.pandas DataFrame, this raises:

ValueError: aggs must be a dict mapping from column name to aggregate functions (string or list of strings).
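From the error, it looks like pyspark.pandas agg only accepts string (or list-of-string) aggregations, so the lambdas are rejected. For context, this is the direction I was going on the Spark DataFrame side -- a rough sketch, assuming collect_set + array_sort + concat_ws can replicate the ' | '.join(sorted(x.unique())) logic (I may be misusing them):

from pyspark.sql import functions as F

# Convert from pyspark.pandas to a Spark DataFrame if needed
sdf = df1.to_spark()

grouped = sdf.groupBy(
    'FISCAL_YEAR', 'FISCAL_MONTH', 'FISCAL_WEEK', 'ORDER_NUMBER', 'LINE_TYPE'
).agg(
    # collect_set -> distinct values, array_sort -> sorted(),
    # concat_ws(' | ', ...) -> ' | '.join(...)
    F.concat_ws(' | ', F.array_sort(F.collect_set('DEPARTMENT'))).alias('DEPARTMENT'),
    F.concat_ws(' | ', F.array_sort(F.collect_set('Dept_Subdept'))).alias('Dept_Subdept'),
    F.sum('Demand').alias('Demand'),
    F.sum('COGS').alias('COGS'),
    F.sum('Units').alias('Units'),
)

Is this roughly the right approach, or is there a way to make the original pyspark.pandas groupby work directly?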