I need to build generic file ingestion into Hive. The files are very large (2GB+) and can be fixed-width or comma-separated, in ASCII or EBCDIC encoding. After trying various techniques using Talend, I am looking into SerDes. If I ingest the files as-is and use a schema file (containing ordinal position, column name, type, and length), can I create a custom SerDe to deserialize any input file into Hive rows? How performant would it be?
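For reference, the core of such a SerDe's `deserialize()` would be byte-range extraction driven by the schema file. The sketch below is not the actual Hive SerDe interface (a real one would extend `org.apache.hadoop.hive.serde2.AbstractSerDe` and return objects through an `ObjectInspector`); it only illustrates the field-slicing logic, with a hypothetical `ColumnSpec` standing in for one schema-file entry. EBCDIC input can be handled by decoding with an EBCDIC charset such as `Cp1047` before slicing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FixedWidthParser {

    // One entry per column from the schema file: name, start offset, length.
    // (Type handling is omitted; a real SerDe would convert per the declared type.)
    static class ColumnSpec {
        final String name;
        final int offset;
        final int length;
        ColumnSpec(String name, int offset, int length) {
            this.name = name;
            this.offset = offset;
            this.length = length;
        }
    }

    // Cp1047 is one common mainframe EBCDIC code page shipped with the JDK;
    // the right code page depends on the source system.
    static final Charset EBCDIC = Charset.forName("Cp1047");

    // Decode one fixed-width record into trimmed string fields.
    static List<String> parseRecord(byte[] record, List<ColumnSpec> schema, Charset charset) {
        List<String> fields = new ArrayList<>();
        String line = new String(record, charset);
        for (ColumnSpec col : schema) {
            int end = Math.min(col.offset + col.length, line.length());
            fields.add(line.substring(col.offset, end).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical two-column layout: id at 0..5, name at 5..15.
        List<ColumnSpec> schema = new ArrayList<>();
        schema.add(new ColumnSpec("id", 0, 5));
        schema.add(new ColumnSpec("name", 5, 10));
        byte[] ascii = "00042Alice     ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseRecord(ascii, schema, StandardCharsets.US_ASCII));
        // prints [00042, Alice]
    }
}
```

Since this is plain offset arithmetic on each record, the per-row cost is low; in practice the dominant costs are the charset decode and object creation, both of which Hive's built-in SerDes face as well.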
Since asking this question, I found that I could use a custom COBOL SerDe. I am also looking at the built-in RegexSerDe for positional (fixed-width) files.
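For the fixed-width case, the RegexSerDe approach maps each column to one capture group. A minimal sketch, assuming two columns of widths 5 and 10 (the table name, column names, widths, and location are illustrative):

```sql
CREATE EXTERNAL TABLE fixed_width_demo (
  id   STRING,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- one capture group per fixed-width column
  "input.regex" = "(.{5})(.{10}).*"
)
STORED AS TEXTFILE
LOCATION '/data/fixed_width_demo';
```

Note that RegexSerDe deserializes every column as STRING, so typed columns need casting in queries or a downstream table, and regex matching per row is typically slower than a purpose-built fixed-width SerDe.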
