4

I'm trying to validate strings to timestamps from several CSVs and simply casting them to timestamptz will fail due to the impossibility of forcing an unique datetime format:

select '10/31/2010'::timestamp --fail due to "different datestyle" 
select '31/10/2010'::timestamp --works

I thought to_timestamp() would do the trick, but something like:

select to_timestamp('31/02/2014 14:30', 'DD/MM/YYYY HH24:MI'); 

will return "2014-03-03 14:30:00-05" instead of throwing an exception

So I thought of using this approach, that reverts back the output to text using to_char and comparing it with the original input, but a mask like 'DD/MM/YYYY HH24:MI' will cast "06/03/2014 0:06" to "06/03/2014 00:06", the strings are different!

CREATE OR REPLACE FUNCTION to_timestamp_safe(IN p_date text, IN p_format text, OUT r_date timestamp without time zone)
  RETURNS timestamp without time zone AS
$BODY$

    BEGIN
      r_date = TO_TIMESTAMP(p_date, p_format);
       IF TO_CHAR(r_date, p_format) != p_date THEN
        RAISE EXCEPTION 'Input date % does not match output date %', p_date, r_date;
      END IF;
    END;
    $BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

The following example fails where it should work:

select to_timestamp_safe ('06/03/2014 0:06', 'DD/MM/YYYY HH24:MI');

output:

ERROR: Input date 06/03/2014 0:06 does not match output date 2014-03-06 00:06:00
SQL state: P0001

Is there a smart way of safely validate strings to timestamptz without the above pitfalls?

Joe
  • 101
  • 1
  • 7
  • 2
    It would be useful to have a `to_timestamp` that doesn't normalize dates with days out of range, with a boolean flag for that. (It's long annoyed me that Pg fixes, rather than rejects, such dates). I don't see any easy way to do what you're describing as things stand. – Craig Ringer Jul 26 '14 at 01:37
  • 1
    Using the [`FM` modifier](http://www.postgresql.org/docs/current/interactive/functions-formatting.html#FUNCTIONS-FORMATTING-DATETIMEMOD-TABLE), your given example would work:`SELECT to_timestamp_safe ('06/03/2014 0:06', 'DD/MM/YYYY FMHH24:MI');` But that's far from a general solution, which I don't see either. – Erwin Brandstetter Jul 26 '14 at 03:11
  • Maybe a different language that has different date-handling characteristics would work. Python, TCL, Perl . . . – Mike Sherrill 'Cat Recall' Jul 26 '14 at 13:07
  • @ErwinBrandstetter Why is it not a general solution? – philipxy Jul 27 '14 at 00:55
  • @philipxy: This is just *one* of many input variants, we'd have to cover all or know what to deal with beforehand. – Erwin Brandstetter Jul 27 '14 at 00:59
  • What is the input format you want to allow? What about requiring successful parse(s) by `regexp_matches` followed by a string you build from extracted parts being successfully converted to a `timestamptz` and back without change? – philipxy Jul 27 '14 at 10:30
  • Thanks guys, I'll try to cook up my own concoction based on Erwin's and philipxy's ideas and I'll post my results here. – Joe Jul 28 '14 at 13:09
  • @philipxy: This is geological data that comes from many sources, flat files from machines and humans alike that can go from a few rows to millions of rows and I don't have control over the input. I grab them all with copy to a temp staging table (based on each file header), so the validations need to be fast and ideally, set-based, I also have to log where it failed and why. I could have some sort of mask template that is a regex. I'll experiment on this and post back... – Joe Jul 28 '14 at 13:18
  • @ErwinBrandstetter: I've edit my question with a possible solution for the problem using FM modifiers as you suggested, they will work as long as the column doesn't have mixed trailing zeroes. i.e.: _'06/03/2014 0:06'_ and _'06/03/2014 00:06_ and the exact right mask is applied. – Joe Jul 29 '14 at 17:21
  • @Joe: Nice. I suggest you put that into an actual *answer* (and remove it from the question. It is ok to answer you own question. – Erwin Brandstetter Jul 29 '14 at 17:38

1 Answers1

0

The FM modifier did the trick (thanks @ErwinBrandstetter) and it's indeed kind of a general solution because we came up with this idea of creating a profile for each type of csv and the date format mask can be stored there for the columns that are date/time, most part of the columns go to an hstore column anyway and we need to keep their types stored somewhere. So I was able to create something like this, where _colmask is the datetime format mask that may or may not have a FM modifier. Then I simply perform this function against my staging table (looping for each column)

CREATE OR REPLACE FUNCTION validate_column_datatype(_colvalue text, _colname text, _type text,  _colmask text)
  RETURNS void AS
$BODY$
BEGIN

declare
    -- error stack
    _returned_sqlstate text := null;
    _column_name text := null;
    _constraint_name text := null;
    _pg_datatype_name text := null; 
    _message_text text := null;
    _table_name text := null;
    _schema_name text := null;
    _pg_exception_detail text := null;
    _pg_exception_hint text := null;
    _pg_exception_context text := null;

    BEGIN


    IF _type = 'timestamptz'  then    
          IF TO_CHAR(TO_TIMESTAMP(_colvalue, _colmask), _colmask) != _colvalue THEN
        RAISE EXCEPTION 'Input date % does not match output date', _colvalue;
          END IF;

    ELSEIF _type = 'timestamp'  then
          IF TO_CHAR(TO_TIMESTAMP(_colvalue, _colmask), _colmask) != _colvalue THEN
        RAISE EXCEPTION 'Input date % does not match output date', _colvalue;
          END IF;

    ELSEIF _type = 'numeric'  then
        perform _colvalue::numeric;   

    ELSEIF _type = 'integer'  then
        perform _colvalue::integer;    

   -- other types

        END IF;   

-- exception occurs      
    EXCEPTION WHEN OTHERS THEN
         get stacked diagnostics 
        _returned_sqlstate      = RETURNED_SQLSTATE,
        _column_name        = COLUMN_NAME,
        _constraint_name    = CONSTRAINT_NAME, 
        _pg_datatype_name   = PG_DATATYPE_NAME, 
        _message_text       = MESSAGE_TEXT,
        _table_name         = TABLE_NAME,
        _schema_name        = SCHEMA_NAME,
        _pg_exception_detail    = PG_EXCEPTION_DETAIL,
        _pg_exception_hint      = PG_EXCEPTION_HINT,
        _pg_exception_context   = PG_EXCEPTION_CONTEXT;

        _message_text :=  -- write something meaningful
        _pg_exception_detail = -- write something meaningful
        _pg_exception_hint := -- write something meaningful

        -- log to something

    END;

END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100; 
Joe
  • 101
  • 1
  • 7