
My data are available as sets of Python 3 pickle files. Most of them are serialized pandas DataFrames.
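
To illustrate the kind of file I mean, here is a minimal sketch of how such a pickle is written and read back with pandas on a single machine. The column names and the file name are made up for the example and are not my real data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical example frame; the real column names and contents differ.
df = pd.DataFrame({"id": [1, 2, 3], "value": [0.5, 1.5, 2.5]})

# Write the frame as a pickle file in a temporary directory,
# then read it back with pandas on the same machine.
path = os.path.join(tempfile.mkdtemp(), "example.pkl")
df.to_pickle(path)
restored = pd.read_pickle(path)

# Check that the round trip preserved the data.
print(restored.equals(df))
```

What I am missing is the equivalent of the `read_pickle` step on the Spark side.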

I'd like to start using Spark because I need more memory and CPU than a single computer can provide. I also plan to use HDFS for distributed storage.

As a beginner, I haven't found any relevant information explaining how to use pickle files as input to Spark.

Does such a feature exist? If not, is there a workaround?

Thanks a lot

Michael Hooreman
