0

I need to extract text which has two possible formats for the preceding text. To be used in Postgres function

In each of the two examples below the desired result is Request successful

[May 08, 12:06AM] Request successful
[Apr 18, 12:10AM] Request req_wofjfiufmjs: Request successful

so the prefix can be the \[.*\]\s or \[.*\]\sRequest\sreq_.*:\s

i have tried

(?<=\s*\[.*\]\s|\s*\[.*\]\s*Request\s*req_.*:\s).*

this works for the first case, but not the second case.

thehill
  • 93
  • 11
  • 1
    I think you can use an optional group `^\s*\[.*\]\s*(Request\s*req_.*:\s)?` https://regex101.com/r/k8bclI/1 Btw, I don't think POSIX supports an infinite quantifier in a lookbehind assertion like this `(?<=\s*` – The fourth bird May 08 '23 at 12:48
  • If you extract, you are not actually using a POSIX regex, it is an ARE regex, isn't it? `^\[.*\](?:\sRequest\sreq_.*:)?\s*(\S.*)` must work with `REGEX_MATCHES`. – Wiktor Stribiżew May 08 '23 at 12:50
  • I thought I was increasing detailby specifying POSIX. I have edited the question to clarify the usage case in Postgres. @WiktorStribiżew I was using substring but can see your answer requires regexp_matches. i am looking at the documentation to understand the differences. – thehill May 08 '23 at 15:29

1 Answers1

1

You get the right result for the first example string but not for the second example string because the assertion looking to the left is true earlier for this part \s*\[.*\]\s than for this part |\s*\[.*\]\s*Request\s*req_.*:\s

You could write it using regexp_match and a lookbehind assertion to get a match only, but in that case the pattern will not be very nice as it has to match the first part making sure there is no match for the second part to the right.

SELECT regexp_match(MyColumn, '(?<=\s*\[.*\]\s*Request\s*req_.*:\s|\s*\[.*\]\s(?!\s*.*req_.*:\s)).*', '') FROM MyTable;

Result

Request successful
Request successful

You could also replace the match with an empty string. Matching either \[.*\]\s or \[.*\]\sRequest\sreq_.*:\s can be done with an optional part without any lookarounds.

SELECT regexp_replace(MyColumn, '^\s*\[.*\]\s*(?:Request\s*req_.*:\s)?', '') FROM MyTable;

Result

Request successful
Request successful
The fourth bird
  • 154,723
  • 16
  • 55
  • 70