I'm trying to write a Hive script which creates two External tables, both of them pointing to the same file LOCATION with differents regular expressions (filters). When I try to make an UNION between them, results aren't as expected. The first chunk of code creates the tables
CREATE EXTERNAL TABLE logsFormat1(col1 INT, col2 STRING, col3 INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "Regex1",
"output.format.string" = "%1$s %2$s %3$s")
STORED AS TEXTFILE
LOCATION '/user/.../directoryFile';
CREATE EXTERNAL TABLE logsFormat2(col1 STRING, col2 INT, col3 INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "Regex2",
"output.format.string" = "%1$s %2$s %3$s")
STORED AS TEXTFILE
LOCATION '/user/.../directoryFile';
UNION statement just get results from last SELECT in a weird way.
`SELECT l1.url FROM logsFormat1 l1 where l1.url is not null
UNION ALL
SELECT l2.url FROM logsFormat2 l2 where l2.url is not null`
I discovered that this happen because both TABLES location are pointing to the same file. The problem is that I can't have two files. I need solve this with the same file location due to the real file is very long