I have a file that looks like this:
33.49.147.163 20140416123526 https://news.google.com/topstories?hl=en-US&gl=US&ceid=US:en 29 409 Firefox/5.0
I want to load it into a hive table. I do it this way:
create external table Logs (
ip string,
ts timestamp,
request string,
page_size smallint,
status_code smallint,
info string
)
row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe'
with serdeproperties (
"timestamp.formats" = "yyyyMMddHHmmss",
"input.regex" = '^(\\S*)\\t{3}(\\d{14})\\t(\\S*)\\t(\\S*)\\t(\\S*)\\t(\\S*).*$'
)
stored as textfile
location '/data/user_logs/user_logs_M';
And
select * from Logs limit 10;
results in
33.49.147.16 NULL https://news.google.com/topstories?hl=en-US&gl=US&ceid=US:en 29 409 Firefox/5.0
How to parse timestamps correctly, to avoid this NULLs?