I am writing serialized Thrift records to a file using Elephant Bird's splittable LZO compression. To achieve this I am using their ThriftBlockWriter class. My Scalding job then uses the FixedPathLzoThrift source to process the records. This all works fine. The problem is that I am limited to records of a single Thrift class.
I want to start using RawBlockWriter instead of ThriftBlockWriter[MyThriftClass]. So instead of LZO-compressed Thrift records, my input will be LZO-compressed raw byte arrays. My question is: what should I use instead of FixedPathLzoThrift[MyThriftClass]?
Explanation of the "protocol-buffers" tag: Elephant Bird uses a Protocol Buffers SerializedBlock class to wrap the raw input, as seen here.