My data are available as sets of Python 3 pickle files. Most of them are serialized pandas DataFrames.
I'd like to start using Spark because I need more memory and CPU than a single computer can provide. I also plan to use HDFS for distributed storage.
As a beginner, I haven't found any relevant information explaining how to use pickle files as input to Spark.
Is there a way to do this? If not, is there a workaround?
Thanks a lot