I use a SerDe to read data in a specific format, with | as the delimiter.
One line of my data may look like: key1=value2|key2=value2|key3="va , lues", and I created the Hive table as below:
CREATE EXTERNAL TABLE(
field1 STRING,
field2 STRING,
field3 STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^\\|]*)\\|([^\\|]*)\\|([^\\|]*)",
"output.format.string" = "%1$s %2$s %3$s"
)
STORED AS TEXTFILE;
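To show what this table currently gives me, here is a small check of the same pattern in plain Java, outside Hive. It is only a sketch: the class and method names are made up for illustration, and I am assuming RegexSerDe applies the pattern to the whole line with java.util.regex (a full match), which is my understanding of how it works.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Hive.
class CurrentRegexCheck {
    // The input.regex from the table definition, written as a Java string literal.
    static final Pattern CURRENT =
        Pattern.compile("([^\\|]*)\\|([^\\|]*)\\|([^\\|]*)");

    // Returns the three capture groups, or null if the whole line does not match.
    static String[] groups(String line) {
        Matcher m = CURRENT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "key1=value2|key2=value2|key3=\"va , lues\"";
        // Prints: [key1=value2, key2=value2, key3="va , lues"]
        System.out.println(Arrays.toString(groups(line)));
    }
}
```

So each column currently holds the whole key=value pair, including the key and the quotes.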
I need to extract all the values, ignoring the quotes if they exist. The result should look like:
value2 value2 va , lues
How can I change my current regexp to extract the values?
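One direction I have been considering, as a rough sketch tested only in plain Java and not in Hive: skip everything up to the = in each field, then capture the value while leaving an optional pair of surrounding quotes outside the group. The class name and the FIELD constant are hypothetical, and the pattern assumes that values never contain | or " themselves.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of Hive.
class CandidateRegexCheck {
    // One field: key (no '=' or '|'), '=', optional opening quote,
    // captured value (no '"' or '|'), optional closing quote.
    static final String FIELD = "[^=|]*=\"?([^\"|]*)\"?";

    // Three fields separated by a literal '|'.
    static final Pattern CANDIDATE =
        Pattern.compile(FIELD + "\\|" + FIELD + "\\|" + FIELD);

    // Returns the three captured values, or null if the whole line does not match.
    static String[] values(String line) {
        Matcher m = CANDIDATE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "key1=value2|key2=value2|key3=\"va , lues\"";
        // Prints: [value2, value2, va , lues]
        System.out.println(Arrays.toString(values(line)));
    }
}
```

If something like this is the right approach, the pattern would still have to be escaped for the Hive string literal in input.regex, and I am not sure how to handle a quoted value that contains the delimiter.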