I want to process a large number of pickled objects with Hadoop using Python. My plan is to store the data in one large file, one record per line, with a key (the file id) and the compressed pickle as the value.
If I simply write the raw binary data into the file I want to process with Hadoop, it ends up containing many '\t' and '\n' bytes, which break the tab-separated, line-based (key, value) structure that Hadoop expects.
My question is: how can I compress some data using Python and represent it as a string in an ASCII file while avoiding certain characters (such as '\t' and '\n')?
Or maybe my approach is inherently invalid?
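To make the goal concrete, here is a minimal sketch of the kind of record I have in mind. It assumes Base64 is an acceptable text encoding (its alphabet contains no tab or newline) and that the file id itself contains no '\t' or '\n'. The names `encode_record` and `decode_record` are hypothetical helpers, not part of any Hadoop API.

```python
import base64
import pickle
import zlib


def encode_record(file_id, obj):
    """Build one 'key<TAB>value' line (without the trailing newline)."""
    # Pickle the object, then compress the resulting bytes.
    compressed = zlib.compress(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))
    # Standard Base64 emits only A-Z, a-z, 0-9, '+', '/' and '=',
    # so the value cannot contain '\t' or '\n'.
    value = base64.b64encode(compressed).decode("ascii")
    # Assumption: file_id contains no tab or newline characters.
    return f"{file_id}\t{value}"


def decode_record(line):
    """Invert encode_record: return (file_id, original_object)."""
    # Split on the first tab only; the Base64 value never contains one.
    file_id, value = line.rstrip("\n").split("\t", 1)
    obj = pickle.loads(zlib.decompress(base64.b64decode(value)))
    return file_id, obj
```

Base64 makes the value about a third larger than the compressed bytes, so this trades some space for a format that is safe to split on tabs and newlines.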
I would really appreciate any help!