I have a PySpark DataFrame that I want to convert to a pandas DataFrame, but a column holding an array of JSON gets converted to a string in pandas:
import pandas as pd
import pyspark.sql.functions as F

my_df = spark.createDataFrame(
    pd.DataFrame(
        [['Scott', 50], ['Jeff', 45], ['Thomas', 54], ['Ann', 34]],
        columns=['id', 'score'],
    )
)

mypandasdf = (
    my_df
    .groupBy('id')
    .agg(
        F.to_json(
            F.sort_array(F.collect_list(F.col('score')), asc=False)
        ).alias('preds')
    )
    .toPandas()
)
To cast the string column back to a Python list, I currently need to execute:
mypandasdf['preds'] = mypandasdf.preds.apply(eval)
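For context, the cast step can be reproduced locally like this; `json.loads` is shown in place of `eval` purely as a sketch, under the assumption that the strings are valid JSON (the `mypandasdf` below is a hypothetical stand-in for the `.toPandas()` result):

```python
import json
import pandas as pd

# Hypothetical stand-in for the result of .toPandas(): the JSON array
# column 'preds' arrives in pandas as a plain string.
mypandasdf = pd.DataFrame({'id': ['Scott', 'Jeff'],
                           'preds': ['[50, 45]', '[45]']})

# json.loads parses each JSON string into a Python list; unlike eval,
# it cannot execute arbitrary code embedded in the string.
mypandasdf['preds'] = mypandasdf['preds'].apply(json.loads)
```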
I was wondering if there is a more efficient way to do this. Any help? Thanks.