I have multiple strings in formats like the following:

A='AB.224-QW-2018'

B='AB.876-5-LS-2018'

C='AB.26-LS-18'

D='AB-123-6-LS-2017'

E='IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L'

F='ZX-ss-12L-AB-123-6-LS-2017-BC-22'

G='AB.224-2018'

H='AB.224/QW/2018'

I='AB/224/2018'

J='AB-10-HDB-231-NCLT-1-2017 AD-42-HH-2019'

K='AB-1-HDB-NCLT-1-2016 AD-42-HH-2020'

L='AB-1-HDB-NCLT-1-2016/(AD-42-HH-2020)'

I want a regex pattern that captures the letters at the start (AB), the number that follows them, and the year at the end of that reference. Strings B and D contain 876-5 and 123-6 respectively; I don't want the single digit that appears after the -.

My code :

d = re.search(r"\D*\d*\D*(AB)\D*(\d+)\D*(20)?(\d{2})\D*\d*\D*", s)  # s is one of the strings above

Another attempt:

d = re.search(r"\D*\d*\D*(AB)\D*(\d+)\D*\d?\D*(20)?(\d{2})\D*\d*\D*", s)  # s is one of the strings above

Neither attempt works for all of the strings. Is there a pattern that matches all of them?

I have created groups in the regex pattern and extract them as d.group(1)+"/"+d.group(2)+"/"+d.group(4). If a pattern matches all of the strings, the expected output is as follows.

Expected Output

A='AB/224/18'

B='AB/876/18'

C='AB/26/18'

D='AB/123/17'

E='AB/224/18'

F='AB/123/17'

G='AB/224/18'

H='AB/224/18'

I='AB/224/18'

J='AB/10/17'

K='AB/1/16'

L='AB/1/16'



edox741
  • This is now a completely different question to what you had originally asked. Please don't do this! We need to know exactly what you need, we cannot guess. What output do you expect from these completely different, new examples you have added? – terdon Jul 22 '22 at 10:56
  • The expected outputs are similar to the previous ones. I have added the expected outputs for all of them. – edox741 Jul 22 '22 at 13:14

2 Answers

You could use 3 capture groups:

\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b
  • \b A word boundary to prevent a partial word match
  • (AB) Capture AB in group 1
  • \D* Match optional non digits
  • (\d+) Capture 1+ digits in group 2
  • \S*? Optionally match non-whitespace characters, as few as possible
  • (?:20)? Optionally match 20
  • (\d\d) Capture 2 digits in group 3
  • \b A word boundary
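
The same breakdown can be kept next to the pattern itself. The following is a minimal sketch, assuming Python's re.VERBOSE flag is acceptable in your code base; the name PATTERN is only illustrative and the pattern is unchanged from the one above.

```python
import re

# Same pattern as above, written with re.VERBOSE so each part carries its comment.
# PATTERN is an illustrative name, not something from the original answer.
PATTERN = re.compile(r"""
    \b(AB)      # group 1: the literal prefix AB, starting at a word boundary
    \D*         # optional non-digits such as . - /
    (\d+)       # group 2: the number that follows the prefix
    \S*?        # as few non-whitespace characters as possible
    (?:20)?     # optional century, not captured
    (\d\d)\b    # group 3: two-digit year, ending at a word boundary
""", re.VERBOSE)

m = PATTERN.search("B='AB.876-5-LS-2018'")
print("/".join(m.groups()))  # AB/876/18
```

In verbose mode, whitespace inside the pattern is ignored and # starts a comment, so the layout does not change what is matched.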

For example, you can use re.finditer, which yields Match objects that each hold the group values.

Using enumerate you can loop over the matches. Every iteration yields a tuple, where the first value is the count (which you don't need here) and the second value is the Match object.

import re

pattern = r"\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b"

s = ("A='AB.224-QW-2018'\n"
     "B='AB.876-5-LS-2018'\n"
     "C='AB.26-LS-18'\n"
     "D='AB-123-6-LS-2017'\n"
     "E='IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L'\n"
     "F='ZX-ss-12L-AB-123-6-LS-2017-BC-22'\n"
     "G='AB.224-2018'\n"
     "H='AB.224/QW/2018'\n"
     "I='AB/224/2018'")

matches = re.finditer(pattern, s)

for _, m in enumerate(matches, start=1):
    print(m.group(1) + "/" + m.group(2) + "/" + m.group(3))

Output

AB/224/18
AB/876/18
AB/26/18
AB/123/17
AB/224/18
AB/123/17
AB/224/18
AB/224/18
AB/224/18
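
As a cross-check, the pattern can be run against every labelled string from the question, including J, K and L, which are not part of the sample above. This is only a sketch: the names ANSWER, VARIANT, CASES, extract and mismatches are made up for illustration, and VARIANT is a hypothetical tweak (an extra \b before the year) that is not part of the pattern above.

```python
import re

# Pattern from this answer.
ANSWER = re.compile(r"\b(AB)\D*(\d+)\S*?(?:20)?(\d\d)\b")
# Hypothetical variant (an assumption, not from the answer):
# the year must also start at a word boundary.
VARIANT = re.compile(r"\b(AB)\D*(\d+)\S*?\b(?:20)?(\d\d)\b")

# Inputs and expected outputs as listed in the question.
CASES = {
    "A": ("AB.224-QW-2018", "AB/224/18"),
    "B": ("AB.876-5-LS-2018", "AB/876/18"),
    "C": ("AB.26-LS-18", "AB/26/18"),
    "D": ("AB-123-6-LS-2017", "AB/123/17"),
    "E": ("IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L", "AB/224/18"),
    "F": ("ZX-ss-12L-AB-123-6-LS-2017-BC-22", "AB/123/17"),
    "G": ("AB.224-2018", "AB/224/18"),
    "H": ("AB.224/QW/2018", "AB/224/18"),
    "I": ("AB/224/2018", "AB/224/18"),
    "J": ("AB-10-HDB-231-NCLT-1-2017 AD-42-HH-2019", "AB/10/17"),
    "K": ("AB-1-HDB-NCLT-1-2016 AD-42-HH-2020", "AB/1/16"),
    "L": ("AB-1-HDB-NCLT-1-2016/(AD-42-HH-2020)", "AB/1/16"),
}


def extract(pattern, text):
    """Return 'AB/number/year' for the first match, or None if nothing matches."""
    m = pattern.search(text)
    return "/".join(m.groups()) if m else None


def mismatches(pattern):
    """Map each label whose result differs from the expected output to that result."""
    return {label: extract(pattern, text)
            for label, (text, expected) in CASES.items()
            if extract(pattern, text) != expected}


print(mismatches(ANSWER))   # {'J': 'AB/10/31'}
print(mismatches(VARIANT))  # {}
```

With the original pattern, J yields 31 because the lazy \S*? stops at the first two digits that end at a word boundary, which is the tail of 231. Requiring a boundary in front of the year as well is one possible way around that, under the assumption that the year is always a separate hyphen- or slash-delimited token.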
The fourth bird
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/246745/discussion-between-pratham-bhatia-and-the-fourth-bird). – edox741 Jul 25 '22 at 17:20
Can't you just look for the last two digits, irrespective of dashes and "20" prefix? Like

(AB)[.-](\d+).*(\d\d)

I've tested in Sublime Text - works for me, it returns the same output you mentioned as desired.
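
For reference, here is a small sketch of how this pattern behaves in Python, assuming re.search is used as in the question (the name GREEDY and the helper last_two are illustrative only). Because .* is greedy, group 3 is always the last pair of digits in the string, and [.-] does not accept a / after AB.

```python
import re

# Pattern from this answer; GREEDY is an illustrative name.
GREEDY = re.compile(r"(AB)[.-](\d+).*(\d\d)")


def last_two(text):
    """Return 'AB/number/digits' for the first match, or None if nothing matches."""
    m = GREEDY.search(text)
    return "/".join(m.groups()) if m else None


print(last_two("AB.224-QW-2018"))                      # AB/224/18
print(last_two("IA-Mb-22L-AB.224-QW-2018-IA-Mb-22L"))  # AB/224/22
print(last_two("AB/224/2018"))                         # None
```

So the pattern gives the desired result when the year is the last thing in the string, but not when more digits follow it or when the separator after AB is a slash.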

Alex
  • I can't use the last two digits because I have many strings; these were just examples. In some of the strings there are more characters and numbers after the last two digits. – edox741 Jul 22 '22 at 10:29
  • Then please add more examples that help us understand the quirks that are there :) – Alex Jul 22 '22 at 10:30
  • Okay let me add more – edox741 Jul 22 '22 at 10:30
  • I have added more. Basically, the part I need starts with AB, and I want the number that follows it and the year. I added \D*\d* at the start and end of the pattern to skip the extra text before and after. – edox741 Jul 22 '22 at 10:36