0

I am trying to use the HIVE UDFs (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-DateFunctions) from Sparklyr to read-in properly some timestamps.

Unfortunately, I have not been able to parse correctly the following timestamp:

unix_timestamp('2011-03-01T00:00:04.226Z', 'yyyy-MM-ddThh:mm:ss.SSS' ) 

returns NAs..

Any ideas? What is the right pattern here? THanks!

ℕʘʘḆḽḘ
  • 18,566
  • 34
  • 128
  • 235

1 Answers1

3

You need to quote T and Z

hive> select unix_timestamp('2011-03-01T00:00:04.226Z', "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'" );
OK
1298959204

Or try this if you are not afraid to be clumsy:

select unix_timestamp(cast(regexp_replace('2011-03-01T00:00:04.226Z', '(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})Z', '$1-$2-$3 $4:$5:$6.$7' ) as timestamp))

To convert from EST to UTC, use the following:

hive> select to_utc_timestamp(unix_timestamp('2011-03-01T00:00:04.226Z', "yyyy-MM-dd'T'hh:mm:ss.SSS'Z'" )*1000, 'EST');
OK
2011-03-01 05:00:04

Multiplication with 1000 is required because, from Hive Language Manual:

Fractional values are considered as seconds. Integer values are considered as milliseconds.. E.g to_utc_timestamp(2592000.0,'PST'), to_utc_timestamp(2592000000,'PST') and to_utc_timestamp(timestamp '1970-01-30 16:00:00','PST') all return the timestamp 1970-01-31 00:00:00

philantrovert
  • 9,904
  • 3
  • 37
  • 61