0

I'm trying to write a Hive script which creates two External tables, both of them pointing to the same file LOCATION with differents regular expressions (filters). When I try to make an UNION between them, results aren't as expected. The first chunk of code creates the tables

CREATE EXTERNAL TABLE logsFormat1(col1 INT, col2 STRING, col3 INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "Regex1",
"output.format.string" = "%1$s %2$s %3$s")
STORED AS TEXTFILE
LOCATION '/user/.../directoryFile';

CREATE EXTERNAL TABLE logsFormat2(col1 STRING, col2 INT, col3 INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "Regex2",
"output.format.string" = "%1$s %2$s %3$s")
STORED AS TEXTFILE
LOCATION '/user/.../directoryFile';

UNION statement just get results from last SELECT in a weird way.

`SELECT l1.url FROM logsFormat1 l1 where l1.url is not null
 UNION ALL
 SELECT l2.url FROM logsFormat2 l2 where l2.url is not null`

I discovered that this happen because both TABLES location are pointing to the same file. The problem is that I can't have two files. I need solve this with the same file location due to the real file is very long

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
marcos
  • 21
  • 3
  • Uh . . . Why not have a single table with all the log records and then use views to select subsets of it in Hive? It seems like a maintenance nightmare to me to have two tables pointing to the same physical files. – Gordon Linoff Dec 26 '15 at 16:03
  • Because I can't find a correct regular expression which grouped all the different kind of logs formats – marcos Dec 26 '15 at 16:11

1 Answers1

1

Finally I solved my problem creating a TEMPORARY TABLE

CREATE TEMPORARY TABLE TEMPlogsFormat1 
STORED AS TEXTFILE
AS SELECT * FROM logsFormat1 l1
marcos
  • 21
  • 3