How can I read and write binary files in Cascading?

Question

I want to load some files in a binary format (for example JPEGs, but it could be any binary format), manipulate them somehow, and write them back. I want to do this on Hadoop, and I would like to write it with the Cascading framework.

Are there binary sinks / taps I can use for binary-formatted files? Is there any other way to do that?

I couldn't find anything. The only alternative I could think of is implementing my own Hadoop InputFormat that reads the files as a byte array or a Java ByteBuffer, but I find it weird that there isn't a built-in solution (I'm sure I'm not the first one to encounter this issue).

If anyone has any pointers, it will be highly appreciated.

score 2 · Answer 1 · answered Jul 19 '13 at 17:24

You will have to write your own Hadoop InputFormat to process your binary data and then wrap that InputFormat in a custom Cascading Scheme. On the bright side, you do not need a custom Tap.

This all comes from the Cascading author himself.
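
To make the idea concrete: the core job of such an InputFormat's record reader is to load one whole, unsplit file into a single byte array that becomes the record. Below is a minimal sketch of just that step, using only the Java standard library. The class name `WholeFileBytes` and the method `readFully` are hypothetical names chosen for illustration, and the Hadoop InputFormat / RecordReader and Cascading Scheme wiring around it is deliberately not shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper (not part of Hadoop or Cascading): reads one file's
// worth of bytes into a single array, i.e. "one file = one record".
public class WholeFileBytes {

    // Drain the stream into memory and return its full contents.
    public static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        // read() returns -1 once the end of the stream is reached
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a file stream; in a real job this would be the
        // stream opened for the input file.
        byte[] sample = {(byte) 0xFF, (byte) 0xD8, 0x00, 0x10};
        byte[] record = readFully(new ByteArrayInputStream(sample));
        System.out.println("record length: " + record.length);
    }
}
```

In a real job, a record reader would presumably wrap the resulting array in something like Hadoop's `BytesWritable`, and the InputFormat would likely need to mark files as non-splittable so that a format such as JPEG is never cut in the middle. Note that this approach holds each file entirely in memory, so it suits files of modest size.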

Engineiro

