Not directly answering the question, but maybe related, date literals are declared with DATE
keyword, e.g. you can see examples in the tests in Beam tests: one, two and in Calcite docs.
Update:
What seems to happen is Calcite adds some indirection when doing CASE
. Casting the strings to dates works as expected in general. For example, if input rows have schema (INT f_int, VARCHAR f_string)
and dates are in 'YYYYMMDD'
(e.g. (1, '2018')
, then this works:
SELECT f_int,
CAST(
SUBSTRING(TRIM(f_string) FROM 1 FOR 4)
||'-'
||SUBSTRING(TRIM(f_string) FROM 5 FOR 2)
||'-'
||SUBSTRING(TRIM(f_string) FROM 7 FOR 2) as DATE)
FROM PCOLLECTION
Even directly casting the 'YYYYMMDD'
works:
SELECT f_int,
CAST(f_string AS DATE)
FROM PCOLLECTION
You can see all supported date formats here.
But as soon as you wrap it in 'CASE ... ELSE NULL'
, then Beam/Calcite seem to infer that the expression type is now a 'String'
. This means that 'THEN CAST(... AS DATE)'
succeeds and returns a 'Date', but then it's converted to 'String'
when wrapped in 'CASE'
. Then, when returning the result in my test it seems to try to cast it back to 'Date'
, but the string format now is not 'YYYYMMDD'
but some other default format. Unfortunately that format is not in the list of supported, so it fails.
Workaround:
As soon as you change 'ELSE NULL'
to something that's known to be a 'Date'
, e.g. 'ELSE DATE "2001-01-01"'
then it works again as Beam/Calcite don't seem to go though 'String'->'Date'->'String'->'Date'
path and this works:
SELECT f_int,
CASE WHEN CHAR_LENGTH(TRIM(f_string)) = 8
THEN CAST (
SUBSTRING(TRIM(f_string) FROM 1 FOR 4)
||'-'
||SUBSTRING(TRIM(f_string) FROM 5 FOR 2)
||'-'
||SUBSTRING(TRIM(f_string) FROM 7 FOR 2) AS DATE)
ELSE DATE '2001-01-01'
END
FROM PCOLLECTION
I filed BEAM-5789 to track a better solution.
Update 2:
So, while Calcite generates the plan telling Beam what to do, it's Beam that actually casts/parses the dates in this case. There's an effort to use Calcite's built-in implementations of basic operations instead of re-implementing everything in Beam:
https://github.com/apache/beam/pull/6417 . After this pull request is merged, this CASE ... ELSE NULL
path should work automatically, if I'm reading it right (I assume this class will be used for handling date/time values). It will still go through strings, probably unnecessarily, but it should just work.