Deserialize protobuf column with Hive

Question

I am really new to Hive, so I apologize if there are any misconceptions in my question.

I need to read a Hadoop SequenceFile into a Hive table. The sequence file contains Thrift binary data, which can be deserialized using the SerDe2 that comes with Hive.

The problem is that one column in the file is encoded with Google protobuf, so when the Thrift SerDe processes the sequence file, it does not decode the protobuf-encoded column properly.

Is there a way in Hive to deal with protobuf-encoded columns nested inside a Thrift sequence file, so that every column is parsed properly?

Thank you so much for any possible help!

score 0 · Answer 1 · answered Nov 07 '16 at 07:51

I believe you should use a different SerDe to deserialize the protobuf format.

You could refer to this:

https://github.com/twitter/elephant-bird/wiki/How-to-use-Elephant-Bird-with-Hive
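
To illustrate what the extra decoding step involves, here is a minimal sketch in plain Python (standard library only). It assumes the Thrift SerDe hands the protobuf column back as raw bytes, and it walks the protobuf wire format by hand. The function names and the sample bytes are hypothetical; in Hive this logic would normally live in a SerDe or UDF built from your generated protobuf classes, not in hand-written code like this.

```python
# Minimal protobuf wire-format reader (illustrative sketch only).
# read_varint / decode_fields are hypothetical helper names.

def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not (byte & 0x80):  # high bit clear: last byte of the varint
            return result, pos
        shift += 7


def decode_fields(buf):
    """Return a list of (field_number, wire_type, raw_value) tuples."""
    fields = []
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:  # fixed 64-bit, left as raw bytes
            value = buf[pos:pos + 8]
            pos += 8
        elif wire_type == 2:  # length-delimited: string, bytes, nested message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        elif wire_type == 5:  # fixed 32-bit, left as raw bytes
            value = buf[pos:pos + 4]
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        fields.append((field_number, wire_type, value))
    return fields


# Made-up sample column value: field 1 = 150 (varint), field 2 = "testing".
column_bytes = bytes([0x08, 0x96, 0x01, 0x12, 0x07]) + b"testing"
decoded = decode_fields(column_bytes)
```

Without the .proto schema you only get field numbers and raw values, which is why a schema-aware SerDe such as the one linked above is the more practical route.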

Sathiyan S
