1
  (
      ResponseRgBasketId          STRING,
      RawStandardisedLoadDateTime TIMESTAMP,
      InfoMartLoadDateTime        TIMESTAMP,
      Operaame               STRING,
      RequestTimestamp            TIMESTAMP,
      RequestSiteId               STRING,
      RequestSalePointId          STRING,
      RequestdTypeId       STRING,
      RequeetValue          DECIMAL(10,2),
      ResponsegTimestamp TIMESTAMP,
      RequessageId            STRING,
      RequestBasketId             STRING,
      ResponsesageId           STRING,
      RequestTransmitAttempt      INT,
      ResponseCode                STRING,
      RequestasketItems    INT,
      ResponseFinancialTimestamp  TIMESTAMP,
      RequeketJsonString     STRING,
      LoyaltyId                   STRING
  )
  USING DELTA
  PARTITIONED BY (RequestTimestamp)
  TBLPROPERTIES
  (
      delta.deletedFileRetentionDuration = "interval 1 seconds",
      delta.autoOptimize.optimizeWrite = true
  )

It has been partitioned by RequestTimestamp(2020-12-12T07:39:35.000+0000 ), but it has the format as below. Could I change the format to different format to something like 2020-12-34 in partition by?

enter image description here

leftjoin
  • 36,950
  • 8
  • 57
  • 116
  • Probably not possible. Alternately, if you pass `substring(RequestTimestamp,1,10)` to the last column into the table, i think it should be fine and use `2020-12-34` to create partition. – Koushik Roy Mar 23 '21 at 16:01
  • @leftjoin how can that partition be dynamic and based on a formula? Do you have any idea? – Koushik Roy Mar 23 '21 at 19:02
  • @KoushikRoy Sure. the only way is to recreate table and reload using substring(RequestTimestamp,1,10). BTW what is a strange date 2020-12-34? I guess this is very important message. It really was 2020-12-34 but I slept it over it because 2020-12-31 celebrated NewYear too much – leftjoin Mar 23 '21 at 19:43
  • LOL, hey, its `2020`, anything can happen. Jokes aside, i understand your point of reloading. I was thinking if this is a table thats loaded from some transactional source/sensor directly. And if we can create a partition name on the fly using some formula. – Koushik Roy Mar 24 '21 at 04:25
  • Answered with example. – leftjoin Mar 24 '21 at 11:28

1 Answers1

1

Short answer: No regexp or other transformation is possible in PARTITIONED BY. The only solution is to apply substr(timestamp, 1, 10) during/before load. See also this answer: https://stackoverflow.com/a/64171676/2700344

Long answer:

No regexp is possible in PARTITIONED BY. No functions are allowed in table DDL, only type can be specified. Type in column specification works as constraint and at the same time can cause implicit type conversion. For example if you are loading strings into dates, it will be casted implicitly if possible and loaded into null default partition if not possible to cast. Also if you are loading BIGINT, it will be silently truncated to INT, as a result you will see corrupted data and duplicates.

Does the same implicit cast work with partitioned by? Let,s see:

DROP TABLE IF EXISTS test_partition;
CREATE TABLE IF NOT EXISTS test_partition (Id   int)
    partitioned by (dt date) --Hope timestamp will be truncated to DATE
;

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

insert overwrite table test_partition partition(dt)
select 1 as id, current_timestamp as dt;

show partitions test_partition;

Result (We expect timestamp truncated to DATE...):

dt=2021-03-24 10%3A19%3A19.985

No, it does not work. Tested the same with varchar(10) column with strings like yours. See short answer.

leftjoin
  • 36,950
  • 8
  • 57
  • 116