-1

I am reading a file from Hadoop. The file is delimited by the CTRL+B character. I must load these files into a Vertica database.

I read the Hadoop files line by line and insert the records using prepared statements. Some of the records contain a newline (SOH control character) inside their values (e.g. address, comments). I call readLine() on a BufferedReader over the Hadoop file, so a record with a newline in one of its columns is read as two records, each missing some column values. The database then rejects them.

The file is delimited by STX (CTRL+B). The column values that contain newlines have SOH (CTRL+A) inside them. My question is how to read such records intact. Any help?

Santhosh

2 Answers

1

These aren't lines as such in terms of BufferedReader.readLine(), and the bytes concerned are not characters in its terms either. You will have to use an InputStream and look for the terminators yourself.
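A minimal sketch of scanning the stream by hand might look like the following. The question does not say how a record ends, so this assumes three things that are not in the original post: the number of columns is known, the data is UTF-8, and the last column never contains a newline. Under those assumptions, a newline byte ends a record only once `columns - 1` STX delimiters have been seen; any earlier newline is kept as part of the value. The class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class StxRecordReader {
    private static final int STX = 0x02; // CTRL+B, the column delimiter from the question
    private static final int LF = 0x0A;  // newline byte

    /**
     * Reads the next record, or returns null at end of stream.
     * Assumption: a newline terminates the record only after
     * (columns - 1) delimiters have been seen; earlier newlines are data.
     */
    static List<String> readRecord(InputStream in, int columns) throws IOException {
        List<String> fields = new ArrayList<>();
        ByteArrayOutputStream field = new ByteArrayOutputStream();
        boolean sawData = false;
        int b;
        while ((b = in.read()) != -1) {
            sawData = true;
            if (b == STX) {
                // End of a column: decode what was collected and start the next one.
                fields.add(decode(field));
                field.reset();
            } else if (b == LF && fields.size() >= columns - 1) {
                // Newline in the last column: treated as the record terminator.
                fields.add(decode(field));
                return fields;
            } else {
                // Ordinary byte, including a newline embedded in a value.
                field.write(b);
            }
        }
        if (!sawData) {
            return null; // nothing left to read
        }
        // Final record without a trailing newline.
        fields.add(decode(field));
        return fields;
    }

    private static String decode(ByteArrayOutputStream buf) {
        // Assumption: the file is UTF-8 encoded.
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Calling `readRecord(in, columns)` in a loop until it returns null yields one list of column values per record, which can then be bound to the prepared statement. Since the sketch reads one byte at a time, wrapping the Hadoop stream in a `BufferedInputStream` first is probably worthwhile.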

user207421
0

Use the Scanner class. You can set the delimiter with useDelimiter(String).
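As a rough illustration, the sketch below sets STX as the `Scanner` delimiter, so a newline inside a value is no longer treated as a separator and stays in the token. Note that `useDelimiter(String)` interprets its argument as a regular expression. The class and method names are hypothetical, and the input is assumed to hold the columns of a single record; with several records in one stream, the record boundaries would still have to be found separately, because the last column of one record and the first column of the next are not separated by STX.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

final class StxScannerDemo {
    /**
     * Splits the source into tokens on STX (CTRL+B).
     * Assumption: the source contains the columns of one record.
     */
    static List<String> readFields(Readable source) {
        List<String> fields = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            // The argument is a regex; a literal U+0002 character matches itself.
            scanner.useDelimiter("\u0002");
            while (scanner.hasNext()) {
                fields.add(scanner.next()); // embedded newlines stay in the token
            }
        }
        return fields;
    }
}
```

For example, passing a `StringReader` over `"1\u0002Line one\nLine two\u0002ok"` gives three tokens, the second of which still contains its newline.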

mrres1