The question is pretty simple: I'm looking for a way to safely and efficiently load very large CSV data (> 200 GB) into a Teradata DB. Due to storage restrictions we keep the data file in HDFS, and it needs to be loaded into a Teradata table. Chopping/splitting the CSV into smaller CSVs is possible, but I'd treat that as a last resort, in which case any option will work.
Possible solutions already tried:
1. Sqoop export: fails due to resource limits even after pushing the number of mappers to the maximum.
2. NiFi flow: GetHDFS > SplitText > SplitText > ... > CSVtoAvro > PutDatabaseRecord, but the flow seems to hang, I suspect due to memory issues.
Is there a way to split the file into smaller files (or stream it) and insert it into TD in batches of 250,000 rows, something like the sketch below?
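For illustration, this is a minimal sketch of the kind of approach I have in mind: stream the file out of HDFS with `hdfs dfs -cat` (so it never lands on local disk) and push it to Teradata in 250,000-row batches through the teradatasql Python driver. The host, credentials, table/column names, and HDFS path are all placeholders, not my real setup.

```python
# Sketch only: stream a CSV from HDFS and batch-insert into Teradata.
# Assumes the hdfs CLI and the teradatasql driver (pip install teradatasql)
# are available on the client machine.
import csv
import io
import subprocess

import teradatasql

HDFS_PATH = "/data/big_file.csv"   # placeholder HDFS path
BATCH_SIZE = 250_000
# Placeholder table and columns; real columns may also need type conversion
# since csv.reader yields everything as strings.
INSERT_SQL = "INSERT INTO mydb.mytable (col1, col2, col3) VALUES (?, ?, ?)"

def stream_hdfs_lines(path):
    """Stream the file from HDFS as text lines without a local copy."""
    proc = subprocess.Popen(["hdfs", "dfs", "-cat", path],
                            stdout=subprocess.PIPE)
    return io.TextIOWrapper(proc.stdout, encoding="utf-8", newline="")

with teradatasql.connect(host="tdhost", user="tduser", password="tdpass") as con:
    with con.cursor() as cur:
        reader = csv.reader(stream_hdfs_lines(HDFS_PATH))
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) == BATCH_SIZE:
                cur.executemany(INSERT_SQL, batch)  # one multi-row batch insert
                batch.clear()
        if batch:                                   # flush the final partial batch
            cur.executemany(INSERT_SQL, batch)
```

If plain batched inserts turn out to be too slow at this volume, the teradatasql driver also documents a FastLoad escape for large executemany batches, but I haven't tried it yet.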
Any help would be appreciated.