10

I have Hive table created from JSON file.

CREATE external TABLE logan_test.t1 (
   name string,
   start_time timestamp
   )
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
WITH SERDEPROPERTIES (
  "timestamp.formats" = "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"
)
LOCATION 's3://t1/';

My timestamp data is in the format of yyyy-MM-dd'T'HH:mm:ss.SSSSSS.

I specified SERDEPROPERTIES for timestamp format as given in the page. https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types#LanguageManualTypes-TimestampstimestampTimestamps

Create statement executed successfully But select * failed with following error.

HIVE_BAD_DATA: Error parsing field value '2017-06-01T17:51:15.180400' for field 1: Timestamp format must be yyyy-mm-dd hh:mm:ss[.fffffffff]

leftjoin
  • 36,950
  • 8
  • 57
  • 116
logan
  • 7,946
  • 36
  • 114
  • 185

1 Answers1

2

Jira HIVE-9298 in which timestamp.formats was introduced, says in the description that it is for LazySimpleSerDe. I did not find any other mention in the documentation that it was done for other SerDe.

The solution is to define timestamp as STRING and transform in the select.

Example for yyyy-MM-dd'T'HH:mm:ss.SSSSSS format:

select timestamp(regexp_replace(start_time, '^(.+?)T(.+?)','$1 $2'))

And this will work both for yyyy-MM-dd'T'HH:mm:ss.SSSSSS and yyyy-MM-dd HH:mm:ss.SSSSSS (normal timestamp) if there are both formats in data files.

timestamp(regexp_replace(start_time, '^(.+?)[T ](.+?)','$1 $2'))

Regex is powerful and you can parse different string formats using the same pattern.

leftjoin
  • 36,950
  • 8
  • 57
  • 116