
I am attempting to figure out, in Python, what the native Hive timestamp format is that Hive can consume as a string from Parquet.

I think Python is getting me close, BUT.. I noticed my code is yielding a good date in Python:

import datetime

from dateutil.parser import parse


def dt2epoch(value):
    # Parse the incoming string, drop sub-second precision, and treat it as UTC
    d = parse(value)
    d = d.replace(microsecond=0)
    timestamp = d.replace(tzinfo=datetime.timezone.utc).timestamp()
    # Round the epoch seconds to the nearest 1000 seconds
    new_timestamp = round(int(timestamp), -3)

    # Sanity check: converting back gives a sensible date in Python
    new_date = datetime.datetime.fromtimestamp(new_timestamp)

    return new_timestamp

(screenshot: Python output showing the expected date)

But when I load this into Hive as a table

CREATE TABLE IF NOT EXISTS hive.DBNAME.TABLE_NAME (
    COL1           VARCHAR,
    COL2           VARCHAR,
    COL3           VARCHAR,
    COL4           BIGINT,
    COL5           VARCHAR,
    COL6           VARCHAR,
    timestamped    TIMESTAMP)
WITH (
    external_location = 's3a://MYBUCKET/dir1/dir2/',
    format = 'PARQUET');

it comes out like it's the '70s

(screenshot: Hive query result showing dates in 1970)

Erik

1 Answer


I think it is dividing your timestamp, which is already in seconds, by 1000. If you convert 1663529 (seconds) to a timestamp, it gives you a result in the 1970s. I don't use Hive, but maybe you can multiply the input by 1000, or find out whether it accepts any parameter that lets you declare in the code whether the input is in seconds or milliseconds.
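Not from the answer itself, just a small sketch of the units mismatch it describes: an epoch value written in seconds but read back in a smaller unit collapses into January 1970, and scaling it by the right factor restores the intended date (the value 1663522000 is roughly what `dt2epoch()` produces for the dates in the question).

import datetime

epoch_seconds = 1663522000  # roughly what dt2epoch() returns

# Read back as if the stored number were milliseconds: lands in 1970
as_millis = datetime.datetime.fromtimestamp(epoch_seconds / 1000, tz=datetime.timezone.utc)
print(as_millis)  # 1970-01-20 06:05:22+00:00

# Scale the seconds up before writing and the date survives the round trip
epoch_micros = epoch_seconds * 1_000_000
as_micros = datetime.datetime.fromtimestamp(epoch_micros / 1_000_000, tz=datetime.timezone.utc)
print(as_micros)  # 2022-09-18 17:26:40+00:00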

  • multiplying the input seemed to move the time, but.. I don't understand why Hive is doing this, or.. if there's a more correct way to get the time into the table correctly on the first shot, other than multiplying by 1000000 or whatever the number is.. seems strange – Erik Sep 30 '22 at 16:32
  • I can confirm you were somewhat correct.. I am multiplying the LONG by `1000000` and it seems to land where I expected.. but the code is ugly.. I am still attempting to decipher what's going on, but from the error `Caused by: java.lang.IllegalArgumentException: Expected 0s for digits beyond precision 3: epochMicros = 1663522214967696` it seems I need to keep it in microseconds but with lower precision – Erik Sep 30 '22 at 17:07
  • @Erik, I don't use Hive. But maybe [this](https://stackoverflow.com/questions/27942930/hive-from-unixtime-for-milliseconds) helps: – Khushi Mehta Oct 02 '22 at 06:13
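Not part of the original thread, but a possible cleaner route given the `Expected 0s for digits beyond precision 3` error Erik quotes above: that message suggests the Parquet column is declared with millisecond precision, so instead of hand-multiplying a BIGINT, the file could be written with a real millisecond-precision timestamp column. A minimal sketch, assuming pyarrow writes the Parquet file and using the `timestamped` column name from the DDL (the sample row is hypothetical):

import datetime

import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical row; real values would come from whatever feeds dt2epoch()
rows = [datetime.datetime(2022, 9, 18, 17, 30, 14)]

# Declare the column as a millisecond-precision timestamp so Hive/Trino
# sees a proper TIMESTAMP instead of an epoch number it has to reinterpret
table = pa.table({"timestamped": pa.array(rows, type=pa.timestamp("ms"))})
pq.write_table(table, "part-00000.parquet")

If a file written this way is dropped into the `external_location` behind the table, the `timestamped` column should read back with the expected date, with no manual scaling by 1000 or 1000000.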