I want to process a large number of pickled objects with Hadoop using Python. What I am trying to do is store my data in one large file, with a key (the file id) and the compressed pickle as the value on each line.

If I simply write the binary data into the file that I want to process with Hadoop, I get a lot of '\t' and '\n' bytes, which interfere with the (key, value) structure of the Hadoop file.

My question is: how can I compress some data using Python and represent it as a string in an ASCII file while avoiding certain characters (such as '\t' and '\n')?

Or maybe my approach is inherently invalid?

I would really appreciate any help!

twowo

2 Answers

For compression you could use the zlib or bz2 modules. For representation you can use the base64 module.
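
As a rough illustration of combining those modules, here is a minimal sketch assuming Python 3 and a tab-separated "key, value" line format. The function names `encode_record` and `decode_record` are made up for this example and are not part of any library.

```python
import base64
import pickle
import zlib


def encode_record(file_id, obj):
    """Return one 'key<TAB>value' line body whose value has no tab or newline."""
    # Pickle, then compress; the result is arbitrary binary data.
    payload = zlib.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    # b64encode emits only A-Z, a-z, 0-9, '+', '/' and '=', with no line breaks.
    value = base64.b64encode(payload).decode("ascii")
    # Assumption: file_id itself contains no tab or newline.
    return f"{file_id}\t{value}"


def decode_record(line):
    """Inverse of encode_record: split on the first tab, then decode the value."""
    file_id, value = line.rstrip("\n").split("\t", 1)
    obj = pickle.loads(zlib.decompress(base64.b64decode(value)))
    return file_id, obj
```

Base64 grows the compressed payload by roughly a third, so compressing before encoding (rather than after) is what keeps the line short. Swapping `zlib` for `bz2` should only require changing the `compress` and `decompress` calls.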

Roland Smith

You could convert the pickled object to base64 using the base64 module.
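
A small sketch of what that could look like without compression, assuming Python 3 and one tab-separated record per line. The helpers `to_line` and `read_records` are hypothetical names chosen for this example.

```python
import base64
import pickle


def to_line(file_id, obj):
    """Build one 'key<TAB>value<NEWLINE>' record from a picklable object."""
    # b64encode (unlike base64.encodebytes) never inserts newlines into its output.
    value = base64.b64encode(pickle.dumps(obj)).decode("ascii")
    return f"{file_id}\t{value}\n"


def read_records(lines):
    """Yield (file_id, obj) pairs from an iterable of 'key<TAB>value' lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        file_id, value = line.split("\t", 1)
        yield file_id, pickle.loads(base64.b64decode(value))
```

If the job runs as a streaming script that receives records on standard input (an assumption, since the question does not say how the Python code is invoked), the reading side could be as simple as `for file_id, obj in read_records(sys.stdin): ...`.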

kindall