I need to build generic file ingestion into Hive. The files are very large (2GB+) and can be fixed-width or comma-separated, in ASCII or EBCDIC encoding. After trying various techniques using Talend, I am looking into SerDes. If I ingest the files as-is and use a schema file (containing ordinal position, column name, type, and length), can I create a custom SerDe to deserialize any input file into Hive rows? How performant would it be?
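For reference, the core of such a SerDe's `deserialize()` would be byte-range extraction driven by the schema file. The sketch below is not the actual Hive SerDe interface (a real one would extend `org.apache.hadoop.hive.serde2.AbstractSerDe` and return objects through an `ObjectInspector`); it only illustrates the field-slicing logic, with a hypothetical `ColumnSpec` standing in for one schema-file entry. EBCDIC input can be handled by decoding with an EBCDIC charset such as `Cp1047` before slicing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class FixedWidthParser {

    // One entry per column from the schema file: name, start offset, length.
    // (Type handling is omitted; a real SerDe would convert per the declared type.)
    static class ColumnSpec {
        final String name;
        final int offset;
        final int length;
        ColumnSpec(String name, int offset, int length) {
            this.name = name;
            this.offset = offset;
            this.length = length;
        }
    }

    // Cp1047 is one common mainframe EBCDIC code page shipped with the JDK;
    // the right code page depends on the source system.
    static final Charset EBCDIC = Charset.forName("Cp1047");

    // Decode one fixed-width record into trimmed string fields.
    static List<String> parseRecord(byte[] record, List<ColumnSpec> schema, Charset charset) {
        List<String> fields = new ArrayList<>();
        String line = new String(record, charset);
        for (ColumnSpec col : schema) {
            int end = Math.min(col.offset + col.length, line.length());
            fields.add(line.substring(col.offset, end).trim());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical two-column layout: id at 0..5, name at 5..15.
        List<ColumnSpec> schema = new ArrayList<>();
        schema.add(new ColumnSpec("id", 0, 5));
        schema.add(new ColumnSpec("name", 5, 10));
        byte[] ascii = "00042Alice     ".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseRecord(ascii, schema, StandardCharsets.US_ASCII));
        // prints [00042, Alice]
    }
}
```

Since this is plain offset arithmetic on each record, the per-row cost is low; in practice the dominant costs are the charset decode and object creation, both of which Hive's built-in SerDes face as well.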
Since asking this question, I found that I could use a custom COBOL SerDe. I am also looking at the built-in RegexSerDe for positional (fixed-width) files.
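For the fixed-width case, the RegexSerDe approach maps each column to one capture group. A minimal sketch, assuming two columns of widths 5 and 10 (the table name, column names, widths, and location are illustrative):

```sql
CREATE EXTERNAL TABLE fixed_width_demo (
  id   STRING,
  name STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  -- one capture group per fixed-width column
  "input.regex" = "(.{5})(.{10}).*"
)
STORED AS TEXTFILE
LOCATION '/data/fixed_width_demo';
```

Note that RegexSerDe deserializes every column as STRING, so typed columns need casting in queries or a downstream table, and regex matching per row is typically slower than a purpose-built fixed-width SerDe.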
