
I have a Hive table which contains a timestamp field, and it can be in any timezone (UTC/PST/CST, ...). I want to convert all of them to a single timezone, EST. It can be done either in Hive or PySpark. Basically, I am using this field in my PySpark application, which has grouping logic on this datetime field, and before doing that we want all the times in the Hive table to be converted to EST.

Sid


2 Answers


Note that the Hive TIMESTAMP type has a limitation on the maximum representable time related to the Y2K38 (Year 2038) bug, as well as a JDBC compatibility issue:

TIMESTAMP type to serde2 that supports unix timestamp (1970-01-01 00:00:01 UTC to 2038-01-19 03:14:07 UTC) with optional nanosecond precision, using both LazyBinary and LazySimple SerDes. For LazySimpleSerDe, the data is stored in JDBC-compliant java.sql.Timestamp parsable strings. (HIVE-2272)

There is also related discussion about supporting timestamps earlier than 1970 and later than 2038.

Hive JDBC doesn't support TIMESTAMP column

Therefore, I think it would be better to use the Hive Date or String data type. Then you can store any timezone offset as the default when the data is persisted.

/* utc_timestamp is the column name */
/* below will convert a timestamp in UTC to the EST timezone */

select from_utc_timestamp(utc_timestamp, 'EST') from table1;
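Since you mention PySpark, here is a minimal sketch of the same conversion on the DataFrame side, assuming the table is called table1 and the column is utc_timestamp (both taken from the SQL above); pyspark.sql.functions.from_utc_timestamp behaves like the Hive function.

# Minimal PySpark sketch, assuming table1 / utc_timestamp as in the SQL above
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Read the Hive table and add a column shifted from UTC to EST
df = spark.table("table1")
df_est = df.withColumn("est_timestamp",
                       F.from_utc_timestamp(F.col("utc_timestamp"), "EST"))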

Hope this helps.

Hive Data Types

OO7

Sidd, usually Hive uses the local timezone of the host where the data was written. The functions from_utc_timestamp() and to_utc_timestamp() can be very helpful. Instead of stating the timezone as UTC/EST, you should rather use a location/region name (e.g. 'America/New_York'), since this will account for daylight saving time, as in the sketch below.
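For illustration, here is a short PySpark sketch of that approach; the DataFrame df and the column name event_ts are assumptions for the example, not taken from your data.

from pyspark.sql import functions as F

# Assumed: df has a column event_ts holding UTC timestamps.
# Using a region name instead of a fixed offset so daylight saving
# time is handled automatically.
df_est = df.withColumn(
    "event_ts_est",
    F.from_utc_timestamp(F.col("event_ts"), "America/New_York"))

# to_utc_timestamp goes the other way: it treats the input as local
# time in the given region and converts it back to UTC.
df_utc = df_est.withColumn(
    "event_ts_utc",
    F.to_utc_timestamp(F.col("event_ts_est"), "America/New_York"))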

Here's a helpful link for more examples: Local Time Convert To UTC Time In Hive

In case you have further questions, please share what you have already tried and a sample snippet of your data so we can investigate further.