0

I am trying to convert a date that's stored as a string to a date, e.g.

YYYYMMDD (string) to YYYY-MM-DD (date)

As far as I know there is no conversion function that checks input format and output format, I tried manual logic, e.g.

CASE 
  WHEN CHAR_LENGTH(TRIM(some_string_date)) = 8
  THEN
    CAST(
      SUBSTRING(TRIM(some_string_date) FROM 1 FOR 4)
      || '-'
      || SUBSTRING(TRIM(some_string_date) FROM 5 FOR 2)
      ||'-'
      || SUBSTRING(TRIM(some_string_date) FROM 7 FOR 2)
    as DATE) 
  ELSE
    NULL 
END

However this is not accepted by Apache SQL Validator, does anyone see problem here?

Vadim Kotov
  • 8,084
  • 8
  • 48
  • 62
Agni
  • 19
  • 1
  • 4

2 Answers2

2

Not directly answering the question, but maybe related, date literals are declared with DATE keyword, e.g. you can see examples in the tests in Beam tests: one, two and in Calcite docs.

Update:

What seems to happen is Calcite adds some indirection when doing CASE. Casting the strings to dates works as expected in general. For example, if input rows have schema (INT f_int, VARCHAR f_string) and dates are in 'YYYYMMDD' (e.g. (1, '2018'), then this works:

SELECT f_int,
   CAST(
    SUBSTRING(TRIM(f_string) FROM 1 FOR 4)
      ||'-'
      ||SUBSTRING(TRIM(f_string) FROM 5 FOR 2)
      ||'-'
      ||SUBSTRING(TRIM(f_string) FROM 7 FOR 2) as DATE)
  FROM PCOLLECTION

Even directly casting the 'YYYYMMDD' works:

SELECT f_int,
   CAST(f_string AS DATE)
FROM PCOLLECTION

You can see all supported date formats here.

But as soon as you wrap it in 'CASE ... ELSE NULL', then Beam/Calcite seem to infer that the expression type is now a 'String'. This means that 'THEN CAST(... AS DATE)' succeeds and returns a 'Date', but then it's converted to 'String' when wrapped in 'CASE'. Then, when returning the result in my test it seems to try to cast it back to 'Date', but the string format now is not 'YYYYMMDD' but some other default format. Unfortunately that format is not in the list of supported, so it fails.

Workaround:

As soon as you change 'ELSE NULL' to something that's known to be a 'Date', e.g. 'ELSE DATE "2001-01-01"' then it works again as Beam/Calcite don't seem to go though 'String'->'Date'->'String'->'Date' path and this works:

SELECT f_int,
  CASE WHEN CHAR_LENGTH(TRIM(f_string)) = 8
    THEN CAST (
       SUBSTRING(TRIM(f_string) FROM 1 FOR 4)
       ||'-'
       ||SUBSTRING(TRIM(f_string) FROM 5 FOR 2)
       ||'-'
       ||SUBSTRING(TRIM(f_string) FROM 7 FOR 2) AS DATE)
    ELSE DATE '2001-01-01'
  END
FROM PCOLLECTION

I filed BEAM-5789 to track a better solution.

Update 2:

So, while Calcite generates the plan telling Beam what to do, it's Beam that actually casts/parses the dates in this case. There's an effort to use Calcite's built-in implementations of basic operations instead of re-implementing everything in Beam: https://github.com/apache/beam/pull/6417 . After this pull request is merged, this CASE ... ELSE NULL path should work automatically, if I'm reading it right (I assume this class will be used for handling date/time values). It will still go through strings, probably unnecessarily, but it should just work.

Anton
  • 2,431
  • 10
  • 20
  • Thanks for reference! The issue here was not the syntax itself, but `TRIM` function. Apache Calcite is case sensitive so `trim` and `TRIM` are different. – Agni Oct 23 '18 at 17:21
  • I am not sure I understand. Do you mean that by using `trim` instead of `TRIM` everything works for you? – Anton Oct 24 '18 at 19:03
  • Yes :) Calcite doc says keywords are case sensitive, however doesn't say which of them, so far I am aware of that one only. – Agni Oct 25 '18 at 15:11
  • Oh, nice find. Looks like there are two issues here then :) – Anton Oct 25 '18 at 15:13
-2

If is MYSQL. You may try

SELECT DATE_FORMAT(STR_TO_DATE('20080908', '%Y%m%d'), "%Y-%m-%d");

For validate, you may check whether the string could be converted to date successfully. sometimes. NULL means failed.

wa56
  • 331
  • 2
  • 13