0

I want to load some files in a binary format (for example JPEGs, but it could be any binary format), manipulate them somehow, and write them back. I want to do this on Hadoop, using the Cascading framework.

Are there binary sinks / taps I can use for binary files? Is there any other way to do this?

I couldn't find anything. The only alternative I could think of is implementing my own Hadoop InputFormat that reads each file as a byte array or a Java ByteBuffer, but I find it odd that there isn't a built-in solution (I'm sure I'm not the first one to run into this).

If anyone has any pointers, they would be highly appreciated.

polo

1 Answer

2

You will have to write your own Hadoop InputFormat to process your binary data and then wrap that InputFormat in a custom Cascading Scheme. On the bright side, you do not need a custom Tap.
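As a rough illustration of the first half of that advice, here is a minimal sketch of an InputFormat that hands each file to the job as one record: the file path as a `Text` key and the raw bytes as a `BytesWritable` value. The class names `WholeFileInputFormat` and `WholeFileRecordReader` are made up for this example, and it assumes the older `org.apache.hadoop.mapred` API and files small enough to fit in memory.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Hypothetical name: reads every input file as a single (path, bytes) record.
public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

  // Never split a file, so one reader always sees the complete binary blob.
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false;
  }

  @Override
  public RecordReader<Text, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }

  // Hypothetical name: emits exactly one record per file.
  static class WholeFileRecordReader implements RecordReader<Text, BytesWritable> {
    private final FileSplit split;
    private final JobConf job;
    private boolean done = false;

    WholeFileRecordReader(FileSplit split, JobConf job) {
      this.split = split;
      this.job = job;
    }

    @Override
    public boolean next(Text key, BytesWritable value) throws IOException {
      if (done) {
        return false; // the single record has already been returned
      }
      Path path = split.getPath();
      // Assumption: the file fits in one byte array (under 2 GB, and in heap).
      byte[] contents = new byte[(int) split.getLength()];
      FileSystem fs = path.getFileSystem(job);
      FSDataInputStream in = null;
      try {
        in = fs.open(path);
        IOUtils.readFully(in, contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      key.set(path.toString());
      value.set(contents, 0, contents.length);
      done = true;
      return true;
    }

    @Override
    public Text createKey() {
      return new Text();
    }

    @Override
    public BytesWritable createValue() {
      return new BytesWritable();
    }

    @Override
    public long getPos() {
      return done ? split.getLength() : 0;
    }

    @Override
    public float getProgress() {
      return done ? 1.0f : 0.0f;
    }

    @Override
    public void close() {
      // Nothing to release: the stream is closed inside next().
    }
  }
}
```

The custom Cascading Scheme would then presumably register this class on the job configuration (for example with `JobConf.setInputFormat(WholeFileInputFormat.class)`) and copy each key/value pair into a tuple. That wiring depends on the Cascading version in use, so it is left out here.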

This all comes from the Cascading author himself.

Engineiro