1

I am having an issue trying to convert multiple dates to one defined format. We are receiving the multiple dates from another DB source so I do not have control of the formatting until it reaches ours.

Here are all the formats:

YYYYMMDD

YYYY-MM-DD HH:MM:SS

MM/DD/YYYY

MM-DD-YYYY

Abrieviated Day Month DD HH:MM:SS TimeZone YYYY ('Thu Feb 02 20:49:59 MSK 2012')

Fully written Day, Month DD, YYYY HH:MM:SS AM/PM

My requirement is to set them all to the standard MM/DD/YYYY format or null. Any ideas?

Thank you.

ryan45366
  • 29
  • 2
  • 7
  • I understand that you are getting strings that represent dates in multiple formats. Is your goal to transform those strings into dates (which have no format)? Or to transform the input strings into an output string that also represents a date in a particular format? – Justin Cave Jul 28 '15 at 16:25
  • do you get multiple dates in a string? or multiple strings which have a date with different formats? – 1010 Jul 28 '15 at 16:27
  • To transform those input strings into an output string that represents a date in the MM/DD/YYYY format in a date column. The source column is a VARCHAR2(40) and the destination field is a date field. – ryan45366 Jul 28 '15 at 16:30
  • is there anything useful that tells you which format the row will be in, or will you have to guess? – Boneist Jul 28 '15 at 16:41
  • also, how are you expecting to handle the information with timezones? eg. what would you expect to be inserted for 'Thu Feb 02 20:49:59 MSK 2012'? just 2nd Feb 2012? or does it need to be converted to a specific timezone first (eg. UTC)? – Boneist Jul 28 '15 at 16:43
  • No there isn't, but I was thinking of trying to do this in a CASE statement and going by the length(install_date). Ex. CASE length(install_date) when 8 then to_date(install_date, 'MM/DD/YYYY'). I'm not sure how to handle the ones with timezones. I keep receiving various ORA errors on those. I dont need to convert it into a specific timezone just 02/02/2012 – ryan45366 Jul 28 '15 at 16:44
  • Possibly a duplicate of [this question](http://stackoverflow.com/q/4077959/266304); but you'd need to define the priority you use to attempt the conversions carefully. – Alex Poole Jul 28 '15 at 17:22
  • Thanks for the link; I'll check it out Alex. Sorry for the duplicate post. – ryan45366 Jul 28 '15 at 17:41

3 Answers3

3

You may define a transformation function, basically processing sequentially each format:

create or replace function translate_date(i_date_string VARCHAR2) return date as
begin
-- you may optimize to not to go in all blocks based on the string format
-- order the blocks on the expected frequency
 begin
   return to_date(i_date_string,'yyyymmdd');
 EXCEPTION
   WHEN OTHERS  THEN NULL;
 end;  
 begin
   return to_date(i_date_string,'yyyy/mm/dd');
 EXCEPTION
   WHEN OTHERS  THEN NULL;
 end; 
 begin
   return to_date(i_date_string,'yyyy-mm-dd');
 EXCEPTION
   WHEN OTHERS  THEN NULL;
 end; 
 begin
   return to_date(i_date_string,'yyyy-mm-dd hh24:mi:ss');
 EXCEPTION
   WHEN OTHERS  THEN NULL;
 end; 
 begin
 -- transform to local timestamp and than to date
   return cast(cast(to_timestamp_tz(i_date_string,'dy month dd hh24:mi:ss tzr yyyy') as TIMESTAMP WITH LOCAL TIME ZONE) as date);
 EXCEPTION
   WHEN OTHERS  THEN NULL;
 end; 
 begin
   return to_date(i_date_string,'dy, month dd, yyyy hh:mi:ss am');
 EXCEPTION
   WHEN OTHERS  THEN NULL;
 end;  
 return NULL;
end;
/

for example for sample data

 TSTMP                                                                                              
 ------------------------
 20150101                                                                                             
 2015-01-01 23:59:59                                                                                  
 2015/01/01                                                                                           
 2015-01-01                                                                                           
 Thu Feb 02 20:49:59 Europe/Moscow 2012                                                               
 Thu, Feb 02, 2012 10:49:59 AM                                                                        
 Thu, Feb 02, 2012 10:49:59 PM  

you get

TSTMP                                       RESULT_DATE       
------------------------------------------  -------------------
20150101                                    01.01.2015 00:00:00 
2015-01-01 23:59:59                         01.01.2015 23:59:59 
2015/01/01                                  01.01.2015 00:00:00 
2015-01-01                                  01.01.2015 00:00:00 
Thu Feb 02 20:49:59 Europe/Moscow 2012      02.02.2012 17:49:59 
Thu, Feb 02, 2012 10:49:59 AM               02.02.2012 10:49:59 
Thu, Feb 02, 2012 10:49:59 PM               02.02.2012 22:49:59 

Note that I skipped the case with time zone abbraviation (MSK), see possible solution in the answer from @Sentinel, but check Conversion of String with Abbreviated Timezone to Timestamp that this may be ambiguous.

Community
  • 1
  • 1
Marmite Bomber
  • 19,886
  • 4
  • 26
  • 53
  • Thanks for the help answer as well Marmite. I am trying to implement Sentinel's solution, but may find it more useful to add the transformation into a function. Should be able to select two responses as answers, but regardless, appreciate the help! – ryan45366 Jul 29 '15 at 15:58
  • @ryan45366 you are welcome! And no problem, I was inspired by @Sentinel s solution as well. But please check my comment on the conversion of teh abbreviated time zone. Taking `min(tzname)` could lead to other result as taking `max(tzname)` – Marmite Bomber Jul 29 '15 at 16:04
  • 1
    @ryan45366, I think the function is a better solution. More durable and easier to reuse in queries. – davejal Jul 08 '19 at 18:40
2

I'd suggest using a case statement with regexp_like conditions to detect likely formats and return dates using the appropriate date mask in the then clauses e.g.:

with tz as (
SELECT distinct tzabbrev
     , first_value(min(tzname)) over (partition by tzabbrev order by count(*) desc) tzname
  FROM v$timezone_names 
 group by tzabbrev
     , TZ_OFFSET(tzname)
), dta as (
select yt.install_date
     , regexp_replace(yt.install_date,tzabbrev,tzname,1,1,'i') install_date2
  from your_table yt
  left join tz
    on regexp_like(install_date, tz.TZABBREV,'i')
)
select install_date, install_date2
     , to_timestamp_tz( install_date2
              , case 
                  when regexp_like(install_date2,'^[A-Z]{3,} [A-Z]{3,} [0-9]{1,2} [0-9]{1,2}(:[0-9]{2}){1,2} [[:print:]]{5,} [0-9]{2,4}','i') then 'DY MON DD HH24:MI:SS TZR YYYY'
                  when regexp_like(install_date2,'^[A-Z]{4,},? [A-Z]{3,},? [0-9]{1,2},? [0-9]{2,4}','i') then 'DAY MONTH DD YYYY'
                  when regexp_like(install_date2,'^[A-Z]{3},? [A-Z]{3,},? [0-9]{1,2},? [0-9]{2,4}','i') then 'DY MONTH DD YYYY'
                  when regexp_like(install_date2,'^[0-9]{1,2}[-/][0-9]{1,2}[-/]([0-9]{2}){1,2}') then 'MM-DD-RRRR'
                  when regexp_like(install_date2,'^[0-9]{1,2}[-/ ][A-Z]{3,}[-/ ]([0-9]{2}){1,2}','i') then 'DD-MON-RRRR'
                  when regexp_like(install_date2,'^[A-Z]{3,}[-/ ][0-9]{1,2},?[-/ ]([0-9]{2}){1,2}','i') then 'MON-DD-RRRR'
                  when regexp_like(install_date2,'^(19|20)[0-9]{6}') then 'RRRRMMDD'
                  when regexp_like(install_date2,'^[23][0-9]{5}') then 'DDMMRR'
                  when regexp_like(install_date2,'^[0-9]{6}') then 'MMDDRR'
                  when regexp_like(install_date2,'^[01][0-9]{7}') then 'MMDDRRRR'
                  when regexp_like(install_date2,'^[23][0-9]{7}') then 'DDMMRRRR'
                  ELSE NULL
                end
              ||case
                  when regexp_like(install_date2, '[0-9]{1,2}(:[0-9]{2}){1,2}$') then ' HH24:MI:SS'
                  when regexp_like(install_date2, '[0-9]{1,2}(:[0-9]{2}){1,2} ?(am|pm)$','i') then ' HH:MI:SS AM'
                  else null
                end
              )
              Install_Time_Stamp
  from dta;

I had issues with the time zone abbreviations so I added a step to replace them with time zone regions first.

Sentinel
  • 6,379
  • 1
  • 18
  • 23
  • Once you've got the time stamp, you can change the time zone, truncate off the time component, format it as a string etc. – Sentinel Jul 28 '15 at 19:14
  • Thank you so much Sentinel for this reply! This does exactly what I need it to do. I just have to add a few more regexp_like when cases because I am discovering there are values of [[todaysDate]] and 'now' populated in the install_date column..haha – ryan45366 Jul 29 '15 at 15:56
  • I've updated the above query with an improved time zone region lookup. Now it grabs the first region name for the most common timezone offset associated with a given time zone abbreviation. So for example it not returns `'Europe/Chisinau'` instead of `'Europe/Moscow'` for `'MSK'`. `'MSK'` occurs 11 times for offset +03:00 but only twice for offset +04:00. It's still just a best guess, but it avoids the `ORA-01857: not a valid time zone` and/or `ORA-01882: timezone region not found` errors I was encountering without the conversion. This is thanks to Marmite Bomber's comments on his answer. – Sentinel Jul 29 '15 at 20:25
0

The function can be simplified a little bit as to_date tolerates minor deviations (different separators, missing separators, missing time components, 2/4-digit year). For instance to_date(:str,'rrrr-mm-dd hh24:mi:ss') will cover

2020/09/18 01.02
2020.09.18 01
20200918010203
2020-0901
202009-01
20/09/18

To_timestamp_tz will also tolerate missing milliseconds and time zone (including missing TZ elements TZM and TZD). So we basically need to take care of major variations only like hh24/hh, mm/mon, elements order, "T" separator in ISO 8601 (2020-01-01T01:02:03), TZ designators (UTC offset TZH:TZM / region name TZR), and day of week (DY).