We have a file with records of the following form:
1- Sam, Joshua , "52 DD dr,
Lake Hiawatha" , New Jersey, 07034
2- Ruchi,kumari,SNN Raj serenity,Bengaluru, 560068
Record 1 is split into two rows in the external table: the first row holds the fields up to the line break, with the remaining columns NULL, and the second row holds the rest of the data.
What is the best way to load the quoted, multi-line value into a single column so that the record is not split? I went through a couple of solutions on the web, but they were not clear to me.
I tried the following options:
1) Used the RegexSerDe:
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = '"*([^"]*)"*,"*([^"]*)"*'
)
but it did not work; record 1 was still split across two rows.
2) Tried CSVInputFormat from GitHub (https://github.com/mvallebr/CSVInputFormat),
but I was not able to get it working.
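
To make the desired behaviour concrete, here is a minimal sketch in Python (outside Hive) of what I am after. It assumes the "1-" and "2-" prefixes above are only labels and not part of the file, and that the stray spaces around the quotes are not in the real data. The names RAW, physical_lines, logical_records and flatten are hypothetical helpers for illustration only. It shows that a line-oriented reader sees three rows where a quote-aware CSV parser sees two records, and one possible workaround: rewriting the file so that every record sits on a single physical line before Hive reads it.

```python
import csv
import io

# Assumed sample: the two records from the question, without the "1-"/"2-" labels.
RAW = (
    'Sam,Joshua,"52 DD dr,\nLake Hiawatha",New Jersey,07034\n'
    'Ruchi,kumari,SNN Raj serenity,Bengaluru,560068\n'
)


def physical_lines(text):
    """What a line-oriented reader sees: one row per newline."""
    return text.splitlines()


def logical_records(text):
    """What a quote-aware CSV parser sees: one row per record."""
    return list(csv.reader(io.StringIO(text, newline="")))


def flatten(text, replacement=" "):
    """Rewrite the data so that no field contains a line break."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for record in logical_records(text):
        cleaned = [
            field.replace("\r\n", replacement).replace("\n", replacement)
            for field in record
        ]
        writer.writerow(cleaned)
    return out.getvalue()


if __name__ == "__main__":
    print(len(physical_lines(RAW)), "physical lines")
    print(len(logical_records(RAW)), "logical records")
    print(flatten(RAW), end="")
```

After flattening, the address is still quoted because it contains a comma, so the table would still need a quote-aware SerDe (for example org.apache.hadoop.hive.serde2.OpenCSVSerde) rather than a plain comma delimiter. Is there a way to get the same result inside Hive without a preprocessing step?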