I have a large dataset (around 6 GB) that I have processed and cleaned using PySpark, and I now want to save it so I can use it elsewhere for machine learning.
I am trying to find the fastest way to save the dataset. I followed the approach in this linked question, but it is taking a very long time to write either the CSV or the Parquet output: How to export a table dataframe in PySpark to csv?
Can someone please explain how I can save the data more quickly?