1

I'm looking for the REGEXP_SUBSTR code that gets dates like format '06-11-2014 - 05-12-2014' or format '01/11/2019 - 30/11/2019' from a string. The first date being the startdate and the second date being the enddate. It would be extremely helpful to understand how the REGEXP_SUBSTR works in this case and also why. I want to get the string with the two dates, but then I want both dates to be in their own column.

A record look likes this:

Medium - nl (06-11-2014 - 05-12-2014) ruimte: Standaard (5.000 MB).

Although text can be shorter or longer the two dates between brackets are always there.

The code below gets the first one, but only if it's with '-'. I want both '-' and '/' variants displayed.

REGEXP_SUBSTR(description, '[0-9][0-9][-[0-9][0-9]-[0-9][0-9][0-9][0-9]')

Thanks a lot for any and all help.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • The code `REGEXP_SUBSTR(ii.description, '[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9].-.[0-9][0-9]-[0-9][0-9]-[0-9][0-9][0-9][0-9]')` gets the whole string if it's with the '-' character. – vvanasperen Aug 21 '20 at 08:01

1 Answers1

0

Since you are using MySQL 8+, it means you also have access to the REGEXP_REPLACE function, which is suitable for isolating the portion of the string which contains the two dates. In the CTE below, I isolate the date string, then in a subquery on that CTE, I fish out the two dates in separate columns using SUBSTRING_INDEX.

WITH cte AS (
    SELECT
        text,
        REGEXP_REPLACE(text, '^.*\(([0-9]{2}-[0-9]{2}-[0-9]{4} - [0-9]{2}-[0-9]{2}-[0-9]{4})\).*$', '$1') AS dates
    FROM yourTable
)

SELECT
    text,
    SUBSTRING_INDEX(dates, ' - ', 1) AS first_date,
    SUBSTRING_INDEX(dates, ' - ', -1) AS second_date
FROM cte;

Demo

Here is an explanation of the regex pattern used:

^                                   from the start of the string
    .*                              match any content, until hitting
    \(                              '(' which is followed by
    (                               (capture what follows)
        [0-9]{2}-[0-9]{2}-[0-9]{4}  a single date
         -                          -
        [0-9]{2}-[0-9]{2}-[0-9]{4}  another single date
    )                               (stop capture)
    \)                              ')'
    .*                              match the remainder of the content
$                                   end of the string

Note that we include a pattern which matches the entire input, which is a requirement since we want to use a capture group. Also, note that REGEXP_SUBSTR might have been viable here, but it could run the risk that you get false positives, in the event that a date could appear elsewhere besides the terms in parentheses.

Tim Biegeleisen
  • 502,043
  • 27
  • 286
  • 360
  • Hi Tim, thansk for the answer. Could you explain regexp_replace code a bit? What does ^.*\ do? and \).*$', '$1'? I want to understand the logic behind it aswell. :-) – vvanasperen Aug 21 '20 at 08:15
  • 1
    Thanks so much for describing the logic. I think you've helped not only me, but others that come after me. – vvanasperen Aug 21 '20 at 08:53