I am reading files from Hadoop. Each file is delimited by the CTRL+B character, and I must load these files into a Vertica database.
I read the Hadoop files line by line and insert the records using prepared statements. Some of the records contain a newline (together with the SOH control character) in their column values (e.g. address, comments). Because I call readLine() on a BufferedReader over the Hadoop file, each such record is read as two records, each missing some column values, and the database rejects them.
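To make the failure concrete, here is a minimal, self-contained sketch of the problem (the class and method names are hypothetical and the sample data is made up). It shows how BufferedReader.readLine() splits one three-column, STX-delimited record whose address value contains a newline into two fragments of two columns each.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class ReadLineSplitDemo {
    // Column delimiter used in the files (STX, CTRL+B).
    static final String STX = "\u0002";

    // Splits a line into columns on STX; limit -1 keeps trailing empty columns.
    static String[] columns(String line) {
        return line.split(STX, -1);
    }

    // Reads data the same way as the loader: one readLine() call per record.
    static List<String> readLines(String data) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // One logical record with 3 columns; the address column contains a newline.
        String data = "1" + STX + "12 Main St\nApt 4" + STX + "ok\n";
        for (String line : readLines(data)) {
            // Each fragment has only 2 columns, so the insert for it would fail.
            System.out.println(columns(line).length + " column(s): " + line.replace(STX, "|"));
        }
    }
}
```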
The file is delimited by STX (CTRL+B). The column values that contain newlines also have SOH (CTRL+A) inside them. My question is: how can I read these records intact, without splitting them at the embedded newlines? Any help?
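One possible approach, sketched here under the assumption that every record has a known, fixed number of columns (the `expectedColumns` parameter and all class and method names are hypothetical), is to keep appending physical lines, restoring the newline that readLine() removed, until the accumulated text contains enough STX separators to form a complete record. This sketch does not use the SOH marker at all.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class StxRecordReader {
    // Column delimiter (STX, CTRL+B).
    static final char STX = '\u0002';

    // Number of columns in a (possibly partial) record: STX separators + 1.
    static int columnCount(CharSequence record) {
        int count = 1;
        for (int i = 0; i < record.length(); i++) {
            if (record.charAt(i) == STX) {
                count++;
            }
        }
        return count;
    }

    // Reassembles logical records from physical lines. A newline ends a record
    // only once the accumulated text has expectedColumns columns; otherwise it is
    // treated as part of a column value and put back.
    static List<String> readRecords(BufferedReader reader, int expectedColumns) throws IOException {
        List<String> records = new ArrayList<>();
        StringBuilder current = null;
        String line;
        while ((line = reader.readLine()) != null) {
            if (current == null) {
                current = new StringBuilder(line);
            } else {
                current.append('\n').append(line); // restore the embedded newline
            }
            if (columnCount(current) >= expectedColumns) {
                records.add(current.toString());
                current = null;
            }
        }
        if (current != null) {
            // Incomplete trailing record: returned as-is so the caller can reject it.
            records.add(current.toString());
        }
        return records;
    }

    // Convenience wrapper for in-memory data.
    static List<String> parse(String data, int expectedColumns) {
        try (BufferedReader reader = new BufferedReader(new StringReader(data))) {
            return readRecords(reader, expectedColumns);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String data = "1\u000212 Main St\nApt 4\u0002ok\n2\u0002Elm Rd\u0002fine\n";
        for (String record : parse(data, 3)) {
            String[] cols = record.split(String.valueOf(STX), -1);
            System.out.println(cols.length + " columns, address = " + cols[1].replace("\n", "\\n"));
        }
    }
}
```

Each reassembled record can then be split on STX with split(..., -1) and bound to the prepared statement as before. Two limitations: this heuristic cannot detect a newline inside the last column (the record already looks complete), and readLine() also strips carriage returns, so a CR LF inside a value comes back as a plain LF. If the SOH character reliably sits next to each embedded newline, checking for it instead of counting columns would avoid both issues, but that depends on the exact file layout.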