
In Flink 1.13, how do you configure a CREATE TABLE statement to partition a scan by a Postgres timestamp column?

Things I have tried:

In postgres, I have a column named 'my_timestamp' of type TIMESTAMP WITHOUT TIME ZONE

In my Flink CREATE TABLE statement, I'm specifying it as the partition column like so:

...
my_timestamp TIMESTAMP WITHOUT TIME ZONE
...
'scan.partition.column' = 'my_timestamp',
'scan.partition.num' = '4',
'scan.partition.lower-bound' = '" + lower_bound_bigint + "',
'scan.partition.upper-bound' = '" + upper_bound_bigint + "'

where the variables lower_bound_bigint and upper_bound_bigint hold epoch seconds as Java longs.

The lower-bound and upper-bound options reject String representations of timestamps, but they do accept longs.
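Concretely, the full statement I'm generating looks roughly like this (the connector options other than the partition ones are placeholders, and the bound values are illustrative epoch seconds):

```sql
CREATE TABLE my_flink_table (
    my_timestamp TIMESTAMP WITHOUT TIME ZONE
    -- other columns elided
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:postgresql://localhost:5432/mydb',  -- placeholder
    'table-name' = 'my_table',                        -- placeholder
    'scan.partition.column' = 'my_timestamp',
    'scan.partition.num' = '4',
    -- epoch seconds for 2022-01-01T00:00:00Z and 2022-07-01T00:00:00Z
    'scan.partition.lower-bound' = '1640995200',
    'scan.partition.upper-bound' = '1656633600'
);
```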

However, this results in malformed queries being sent to postgres, producing the error:

org.postgresql.util.PSQLException: ERROR: operator does not exist: timestamp without time zone >= bigint

I thought Flink would take care of converting the long timestamps to Postgres TIMESTAMP WITHOUT TIME ZONE in the generated WHERE clause, but apparently not.

The documentation says you can use timestamps for partitioning, but I'm not sure how to complete the pattern. Nor can I see how to intercept the longs in the generated WHERE clause and manually cast them back to TIMESTAMP WITHOUT TIME ZONE, if that's what's needed.

https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/jdbc/#partitioned-scan
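One workaround I've considered (an untested sketch; all names below are hypothetical) is to keep the timestamp column in Postgres but expose an epoch-seconds BIGINT through a view, paired with an expression index so range predicates on the epoch value don't force a full table scan:

```sql
-- Hypothetical Postgres-side setup: a view exposing epoch seconds
CREATE VIEW my_table_scan AS
SELECT t.*,
       (EXTRACT(EPOCH FROM t.my_timestamp))::bigint AS my_timestamp_epoch
FROM my_table t;

-- An expression index on the same expression, so Postgres can
-- (in principle) use it when the view's predicate is pushed down
CREATE INDEX my_table_epoch_idx
    ON my_table (((EXTRACT(EPOCH FROM my_timestamp))::bigint));
```

Flink would then point 'table-name' at my_table_scan, declare my_timestamp_epoch as BIGINT, and use it as 'scan.partition.column'. I haven't verified that the Postgres planner actually matches the index for the pushed-down predicate, so treat this as a direction rather than a confirmed fix.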

Jordan Morris
  • Could you add some examples of time strings which you have tried? – bzu Aug 13 '22 at 20:58
  • The only other thing I can think of is to declare a computed (bigint) column and try to partition on that. – bzu Aug 14 '22 at 08:40
  • @bzu I since realized that only .column supports timestamp. .lower-bound and .upper-bound only support ints, so putting time strings there is not going to work, no matter how they're formatted. There must be some internal, or hookable way to convert ints to timestamps in the where clause, but none I've been able to find. – Jordan Morris Aug 15 '22 at 04:29
  • @bzu using a computed column removes the benefit of postgres indexes and partitions, resulting in full table scans for each query – Jordan Morris Aug 15 '22 at 04:30
  • I see, then it seems that the docs are incorrect about supporting timestamps. Maybe you can open an issue on Flink Jira (at least for improvement of the docs). – bzu Aug 15 '22 at 08:18

0 Answers