PySpark reading pickled files

Asked Mar 26 '16 at 04:10

Active Mar 26 '16 at 20:54

Viewed 546 times

My data are available as sets of Python 3 pickle files. Most of them are serialized pandas DataFrames.

I'd like to start using Spark because I need more memory and CPU than a single computer can provide. I'll also use HDFS for distributed storage.

As a beginner, I haven't found any relevant information explaining how to use pickle files as input.

Does such support exist? If not, is there a workaround?
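To make the question concrete, one possible workaround is sketched below. It is a minimal sketch under assumptions, not a confirmed recipe: it relies on `SparkContext.binaryFiles`, which yields one `(path, bytes)` pair per file, and unpickles each file on the workers. The names `unpickle_record` and `pickled_files_rdd` and the example path pattern are hypothetical, and `sc` is assumed to be an already created SparkContext.

```python
import pickle


def unpickle_record(path_and_bytes):
    """Turn one (path, raw_bytes) pair into (path, unpickled_object)."""
    path, raw = path_and_bytes
    # Only unpickle data from a trusted source: pickle can run arbitrary code.
    return path, pickle.loads(raw)


def pickled_files_rdd(sc, pattern):
    """Return an RDD of (path, object) pairs, one element per pickle file.

    `sc` is an existing SparkContext (assumption); `pattern` is a
    hypothetical path glob such as "hdfs:///data/*.pkl".
    """
    # binaryFiles reads each matching file whole, as (path, bytes).
    return sc.binaryFiles(pattern).map(unpickle_record)
```

Each file is loaded whole, so every pickle has to fit in the memory of a single executor, and the workers need a pandas version that can read the pickles. The result is an RDD of pandas objects, not a Spark DataFrame, so a further conversion step would still be needed.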

Thanks a lot

Michael Hooreman

This question belongs on stackoverflow. – eliasah Mar 26 '16 at 08:31
I'm voting to close this question as off-topic because it belongs on SO – Dawny33 Mar 26 '16 at 18:34
Indeed, you are both right. I've tried to close it for that reason, but I didn't succeed – Michael Hooreman Mar 26 '16 at 20:49
@eliasah True: the entry you mention is this one, transferred to another forum. Please see previous comments. – Michael Hooreman Apr 01 '16 at 22:07
If the answer to that question answers this question too, would you care to delete this one, please? – eliasah Apr 01 '16 at 22:08
As shown in the earlier comments, I haven't found how to delete this. – Michael Hooreman Apr 01 '16 at 22:09

0 Answers