We recently received a file to be ingested, the file is PSV format, however, all the fields are padded with extra characters $~$ on the left and right, so the entire PSV is like below:
$~$Field1$~$|$~$Field2$~$|$~$Field3$~$
$~$Data1$~$|$~$Data2$~$|$~$Data3$~$
$~$Data4$~$|$~$Data5$~$|$~$Data6$~$
$~$Data7$~$|$~$Data8$~$|$~$Data9$~$
$~$Data10$~$|$~$Data11$~$|$~$Data12$~$ .....
There are 100 Million rows in the file.
What would be the best way to trim these paddings so as to make it a standard PSV?
Thank you very much, any suggestion/sharing is appreciated here.
UPDATE:
The data is received from SFTP, and uploaded to Hadoop by IT Data Support (Unix Admin), we only have access to Hadoop cluster, but if it is an easy job for Data Support, maybe I can convince them to do the preprocess. Thanks.