I want to load some files in binary format (for example JPEGs, but it could be any binary format), manipulate them somehow, and write them back. I want to do that on Hadoop, and I would like to write it using the Cascading framework.
Are there binary sinks / taps I can use for binary-formatted files? Is there any other way to do that?
I couldn't find anything. The only alternative I could think of is implementing my own Hadoop InputFormat that reads each file as a byte array or a Java ByteBuffer, but I find it weird that there isn't a built-in solution (I'm sure I'm not the first one to encounter this issue).
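To make it concrete, here is roughly the per-file logic I have in mind, as a plain-Java sketch with no Hadoop or Cascading classes involved. The class name `WholeFileBytes`, the methods `transform` and `process`, and the bit-flipping "manipulation" are all made up for illustration; a real custom InputFormat would have to hand each file to this kind of code as a single record.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

public class WholeFileBytes {

    // Hypothetical placeholder manipulation: flips every bit of every byte.
    // A real job would decode and edit the JPEG (or other format) here.
    static byte[] transform(byte[] input) {
        ByteBuffer in = ByteBuffer.wrap(input);
        ByteBuffer out = ByteBuffer.allocate(input.length);
        while (in.hasRemaining()) {
            out.put((byte) ~in.get());
        }
        return out.array();
    }

    // Treats the whole file as one record: read all bytes, transform, write back.
    static void process(Path source, Path target) throws IOException {
        byte[] content = Files.readAllBytes(source);
        Files.write(target, transform(content));
    }

    public static void main(String[] args) throws IOException {
        // Temporary files stand in for the real binary inputs and outputs.
        Path source = Files.createTempFile("input", ".bin");
        Path target = Files.createTempFile("output", ".bin");
        Files.write(source, new byte[] {0x00, 0x7F, (byte) 0xFF});

        process(source, target);

        // Prints the transformed bytes as signed values.
        System.out.println(Arrays.toString(Files.readAllBytes(target)));
    }
}
```

The point is only that each file must arrive whole and unsplit; what I am missing is the Hadoop/Cascading piece that delivers files in that form.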
If anyone has any pointers, it would be highly appreciated.