ASCII representation of compressed data without certain characters

Question

I want to process a large number of pickled objects with Hadoop using Python. My plan is to store the data in one large file, with a key (the file id) and the compressed pickle as the value.

If I simply write the binary data into the file that Hadoop will process, I get a lot of '\t' and '\n' bytes, which interfere with the (key, value) structure of the Hadoop file.

My question is: how can I compress some data using Python and represent it as a string in an ASCII file while avoiding certain characters (such as '\t' and '\n')?

Or maybe my approach is inherently invalid?

I would really appreciate any help!

score 0 · Answer 1 · answered Aug 22 '12 at 18:46

For compression you could use the zlib or bz2 modules. For representation you can use the base64 module.
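A minimal sketch of this suggestion, assuming Python 3 and the standard-library `pickle`, `zlib`, and `base64` modules. The helper names `encode_value` and `decode_value` are hypothetical and chosen only for illustration. Note that `base64.b64encode` produces a single unbroken string, whereas `base64.encodebytes` inserts newlines and so would not suit this use.

```python
import base64
import pickle
import zlib


def encode_value(obj):
    """Pickle, compress, and base64-encode obj into an ASCII string."""
    raw = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    compressed = zlib.compress(raw)
    # b64encode emits only A-Z, a-z, 0-9, '+', '/' and '=',
    # so the result contains no tab or newline characters.
    return base64.b64encode(compressed).decode("ascii")


def decode_value(text):
    """Reverse encode_value: base64-decode, decompress, unpickle."""
    compressed = base64.b64decode(text)
    return pickle.loads(zlib.decompress(compressed))
```

Base64 grows the data by roughly a third, so compressing before encoding (not after) is what keeps the stored value small.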

Roland Smith


score 0 · Accepted Answer · answered Aug 22 '12 at 18:46

You could convert the pickled object to base64 using the base64 module.
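As a rough illustration of how this fits the key/value layout from the question, the sketch below writes and reads one tab-separated record per line. The functions `to_line` and `from_line` are hypothetical names, and the sketch assumes the file id itself contains no tab or newline.

```python
import base64
import pickle


def to_line(file_id, obj):
    """Build one 'key<TAB>value' record terminated by a newline."""
    # Assumption: file_id contains no tab or newline characters.
    value = base64.b64encode(pickle.dumps(obj)).decode("ascii")
    return f"{file_id}\t{value}\n"


def from_line(line):
    """Split a record on its first tab and unpickle the value."""
    file_id, value = line.rstrip("\n").split("\t", 1)
    return file_id, pickle.loads(base64.b64decode(value))
```

Because the encoded value never contains a tab or newline, the only tab in each record is the key/value separator, and each record occupies exactly one line.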

kindall

