Are there any recommended methods for implementing custom sort ordering for categorical data in pyspark? I'm ideally looking for the functionality the pandas categorical data type offers.
So, given a dataset with a Speed column, the possible values are ["Super Fast", "Fast", "Medium", "Slow"]. I want to implement custom sorting that respects that order.
If I use the default sorting, the categories are sorted alphabetically. Pandas lets you change the column's data type to categorical, and part of that definition is a custom sort order: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Categorical.html
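For reference, this is the pandas behavior I mean (column name and example data are just for illustration):

```python
import pandas as pd

# An ordered categorical dtype sorts by the declared category order,
# not alphabetically.
speed_order = ["Super Fast", "Fast", "Medium", "Slow"]
df = pd.DataFrame({"Speed": ["Medium", "Slow", "Super Fast", "Fast"]})
df["Speed"] = pd.Categorical(df["Speed"], categories=speed_order, ordered=True)

print(df.sort_values("Speed")["Speed"].tolist())
# → ['Super Fast', 'Fast', 'Medium', 'Slow']
```

I'm looking for an equivalent way to get this ordering in PySpark.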