I am trying to use a regex to do the following to a string:

  • If there is a hyphen - between two letters, remove it:
    • Example: A-BA should become ABA, and A-B-BAB should become ABBAB
  • If a letter and a digit are next to each other, insert a hyphen - between them:
    • Example: 9AHYA7 should become 9-AHYA-7, and 977AB99T5 should become 977-AB-99-T-5

These are just simple examples. The string could be more complicated, like these:

  • HS98743YVJUHGF78BF8HH3JHFC83438VUN5498FCNG
  • 7267-VHSBVH8737HHC8C-HYHFWYFHH-7Y84743YR8437G

The same rules should apply to the strings above.

I tried the following code to convert 8T into 8-T:

    re.sub(r'\dab-d', '\d-ab-d', s)

Unfortunately it does not work. I am not sure how to do it.

bad_coder
Arun Kumar

2 Answers

If you want to use re.sub, then here is one way, using capture groups:

import re

inp = "8T-ENI-A2"
output = re.sub(r'^(.)(.)-([^-]+)-(.)(.)$', '\\1-\\2\\3\\4-\\5', inp)
print(output)

This prints:

8-TENIA-2
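
Note that this pattern is anchored with ^ and $ and expects exactly two hyphens, so it only rewrites strings with the same shape as the sample input. The short sketch below illustrates this; the helper name fixed_shape_sub is hypothetical and used only for the demonstration. When the pattern does not match, re.sub returns the input unchanged.

```python
import re

# Same pattern and replacement as above; the replacement is written
# as a raw string here, which is equivalent to the escaped form.
PATTERN = r'^(.)(.)-([^-]+)-(.)(.)$'
REPLACEMENT = r'\1-\2\3\4-\5'

def fixed_shape_sub(s):
    # Hypothetical helper: applies the anchored substitution to one string.
    return re.sub(PATTERN, REPLACEMENT, s)

for inp in ["8T-ENI-A2", "9AHYA7", "977AB99T5"]:
    # Only the first string has the two-hyphen shape the pattern expects.
    print(inp, "->", fixed_shape_sub(inp))
```

So for the more general strings in the question, a pattern that works on each letter/digit boundary is probably a better fit.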
Tim Biegeleisen

You might use two capturing groups with lookaheads, and in the replacement use a lambda to check which group matched.

If group 1 matched, remove its last character (the hyphen). If group 2 matched, append a hyphen.

([A-Z]-(?=[A-Z]))|([A-Z](?=[0-9])|[0-9](?=[A-Z]))

Explanation

  • ( Capture group 1
    • [A-Z]-(?=[A-Z]) Match A-Z and - and assert what is on the right is A-Z
  • ) Close group
  • | Or
  • ( Capture group 2
    • [A-Z](?=[0-9]) Match A-Z and assert what is on the right is a digit
    • | Or
    • [0-9](?=[A-Z]) Match 0-9 and assert what is on the right is A-Z
  • ) Close group
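
As a possible alternative to the single pattern with a lambda, the same two rules can be sketched as two separate re.sub passes that use only zero-width lookarounds. This is a minimal sketch, assuming uppercase A-Z letters as in the examples and Python 3.7 or later; the function name normalize is hypothetical.

```python
import re

def normalize(s):
    # Pass 1: delete a hyphen that sits directly between two letters.
    s = re.sub(r"(?<=[A-Z])-(?=[A-Z])", "", s)
    # Pass 2: insert a hyphen at every position where a letter
    # directly touches a digit (in either order).
    s = re.sub(r"(?<=[A-Z])(?=[0-9])|(?<=[0-9])(?=[A-Z])", "-", s)
    return s

for s in ["A-B-BAB", "977AB99T5"]:
    print(normalize(s))
```

Hyphens that already separate a letter from a digit are left alone by both passes, because neither pattern matches them.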

Example code

import re

pattern = r"([A-Z]-(?=[A-Z]))|([A-Z](?=[0-9])|[0-9](?=[A-Z]))"
strings = [
    "A-BA",
    "A-B-BAB",
    "9AHYA7",
    "977AB99T5",
    "HS98743YVJUHGF78BF8HH3JHFC83438VUN5498FCNG",
    "7267-VHSBVH8737HHC8C-HYHFWYFHH-7Y84743YR8437G"
]

for s in strings:  # avoid shadowing the built-in str
    result = re.sub(
        pattern,
        # Group 1: drop the trailing hyphen. Group 2: append a hyphen.
        lambda m: m.group(1)[:-1] if m.group(1) else m.group(2) + "-",
        s
    )
    print(result)

Output

ABA
ABBAB
9-AHYA-7
977-AB-99-T-5
HS-98743-YVJUHGF-78-BF-8-HH-3-JHFC-83438-VUN-5498-FCNG
7267-VHSBVH-8737-HHC-8-CHYHFWYFHH-7-Y-84743-YR-8437-G
The fourth bird