
Are there any pointers to get Scalding to work with LZO Protobuf data on HDFS?

I am trying to use Scalding to read files that are stored as binary Protobuf and compressed with LZO. Can we use Elephant Bird to read those files? Any pointers will be appreciated!

I have looked at LzoTraits and LzoProtobufScheme, but I am not sure how I should use them to read the data. Any examples would be great!

thinker25

1 Answer


Here is an example:

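import com.twitter.scalding.FixedPathSource
import com.twitter.scalding.commons.source.LzoProtobuf

// MyProtoClassHere is the protoc-generated message class for your data.
case class SomeProto() extends FixedPathSource("/my/greatData/*")
  with LzoProtobuf[MyProtoClassHere] {
  override def column = classOf[MyProtoClassHere]
}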

You can mix LzoProtobuf into other abstract base sources (such as TimePathedSource or MostRecentGoodSource) in the same way. You can also mix in LocalTapSource if you want to use the Hadoop-inside-cascading-local trick; if you don't run in Cascading local mode, you don't need it.
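To read the data, a source like this can be passed to a job the same way as any other typed source. The following is only a minimal sketch under some assumptions: MyProtoClassHere stands in for your own protoc-generated class, ReadProtoJob and the "output" argument are hypothetical names, and scalding-commons, Elephant Bird and the hadoop-lzo native libraries are assumed to be on the classpath of the cluster.

```scala
import com.twitter.scalding._
import com.twitter.scalding.commons.source.LzoProtobuf

// Placeholder: MyProtoClassHere must be replaced by your protoc-generated class.
case class SomeProto() extends FixedPathSource("/my/greatData/*")
  with LzoProtobuf[MyProtoClassHere] {
  override def column = classOf[MyProtoClassHere]
}

// Hypothetical job: reads every record and writes its text form as TSV.
class ReadProtoJob(args: Args) extends Job(args) {
  TypedPipe.from(SomeProto())   // TypedPipe[MyProtoClassHere]
    .map(_.toString)            // protobuf messages print in text format
    .write(TypedTsv[String](args("output")))
}
```

Each element of the pipe is a decoded message, so you can call the generated getters on it instead of toString to pull out individual fields.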

Oscar Boykin