Remove consecutively repeated substring in a string using regex

Asked Feb 19 '23 at 00:40

Active Feb 19 '23 at 00:40

import re

input_text = "((PERS)Yo), ((PERS)Yo)      ((PERS)yo) hgasghasghsa ((PERS)Yo) ((PERS)Yo) ((PERS)Yo) ((PERS)yo) jhsjhsdhjsdsdh ((PERS)Yo) jhdjfjhdffdj ((PERS)ella) ((PERS)Ella) ((PERS)ellos) asassaasasasassaassaas ((PERS)yo) ssdsdsd"

pattern = re.compile(r'\(\(PERS\)\s*yo\s*\)(?:\(\(PERS\)\s*yo\s*\))+', flags = re.IGNORECASE)
modified_text = re.sub(pattern, '((PERS)yo)', input_text)

print(modified_text)

Why doesn't this code eliminate consecutive repeated occurrences of the character sequence ((PERS)\s*yo\s*)?

This should be the correct output:

"((PERS)Yo), ((PERS)yo) hgasghasghsa ((PERS)yo)  jhsjhsdhjsdsdh ((PERS)yo) jhdjfjhdffdj ((PERS)ella) ((PERS)Ella) ((PERS)ellos) asassaasasasassaassaas ((PERS)yo) ssdsdsd"

asked Feb 19 '23 at 00:40

Matt095

It does not match because of the spaces between ) and ( – Emanuel P Feb 19 '23 at 00:51
@EmanuelP What are you talking about? Where is that error? – Matt095 Feb 19 '23 at 01:16

In the middle of the pattern. The first part ends with `\)` and the repeated group starts with `\(`, but in the input the repeated ((PERS)yo) tokens are separated by spaces. So you need to add `\s*` in the middle of the pattern too. – Emanuel P Feb 19 '23 at 01:19
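A minimal sketch that checks the diagnosis above, using made-up sample strings: the question's original pattern finds nothing when two tokens are separated by a space, matches when they are adjacent, and matches once `\s*` is placed at the start of the repeated group. The names `original` and `fixed` are illustrative only.

```python
import re

# The question's original pattern: nothing allows whitespace between tokens.
original = re.compile(r'\(\(PERS\)\s*yo\s*\)(?:\(\(PERS\)\s*yo\s*\))+', flags=re.IGNORECASE)

# Tokens separated by a space, as in the question's input: no match.
spaced = original.search("((PERS)Yo) ((PERS)Yo)")
print(spaced)  # None

# The same tokens with nothing between them: the pattern matches.
adjacent = original.search("((PERS)Yo)((PERS)Yo)")
print(adjacent.group())  # ((PERS)Yo)((PERS)Yo)

# Adding \s* at the start of the repeated group lets the space through.
fixed = re.compile(r'\(\(PERS\)\s*yo\s*\)(?:\s*\(\(PERS\)\s*yo\s*\))+', flags=re.IGNORECASE)
print(fixed.search("((PERS)Yo) ((PERS)Yo)").group())  # ((PERS)Yo) ((PERS)Yo)
```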
I tried modifying the pattern like this: `pattern = re.compile(r'\(\(PERS\)\s*(?:[Yy][Oo])\s*\)\s*(?:\(\(PERS\)\s*[Yy][Oo]\s*\))+', flags = re.IGNORECASE)`, but it still doesn't remove all consecutive `((PERS)\s*yo\s*)` or `((PERS)\s*Yo\s*)` occurrences. – Matt095 Feb 19 '23 at 01:24
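A sketch of why the modified pattern in the comment above still leaves repeats, on a made-up sample string: its `\s*` sits before the `(?:...)+` group rather than inside it, so only the gap before the first repetition may contain whitespace. On three consecutive tokens it collapses the first two and leaves the third; moving `\s*` inside the group collapses the whole run. The names `tried`, `moved`, and `run` are illustrative only.

```python
import re

# Pattern from the comment above: \s* is outside the repeated group.
tried = re.compile(r'\(\(PERS\)\s*(?:[Yy][Oo])\s*\)\s*(?:\(\(PERS\)\s*[Yy][Oo]\s*\))+', flags=re.IGNORECASE)

# Same pattern with \s* moved to the start of the repeated group.
moved = re.compile(r'\(\(PERS\)\s*[Yy][Oo]\s*\)(?:\s*\(\(PERS\)\s*[Yy][Oo]\s*\))+', flags=re.IGNORECASE)

run = "((PERS)Yo) ((PERS)Yo) ((PERS)Yo)"

# The group cannot repeat past the second space, so the third token survives.
print(tried.sub('((PERS)yo)', run))  # ((PERS)yo) ((PERS)Yo)

# With \s* inside the group, the entire run is one match.
print(moved.sub('((PERS)yo)', run))  # ((PERS)yo)
```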

Regexes are nasty things. I don't think this is really the place to troubleshoot individual cases, but here you go: `pattern = re.compile(r'\(\(PERS\)\s*[Yy]o\s*\)(?:\s*\(\(PERS\)\s*[Yy]o\s*\))+')` – Emanuel P Feb 19 '23 at 01:39
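A sketch that runs the pattern from the comment above on the question's `input_text`. Both runs of consecutive tokens collapse to `((PERS)yo)`, and the comma after the first `((PERS)Yo)` breaks the run, so that token is kept. Isolated tokens are left untouched, which means the lone `((PERS)Yo)` before `jhdjfjhdffdj` keeps its capital Y and only one space remains before `jhsjhsdhjsdsdh`; this differs slightly from the expected output given in the question.

```python
import re

input_text = "((PERS)Yo), ((PERS)Yo)      ((PERS)yo) hgasghasghsa ((PERS)Yo) ((PERS)Yo) ((PERS)Yo) ((PERS)yo) jhsjhsdhjsdsdh ((PERS)Yo) jhdjfjhdffdj ((PERS)ella) ((PERS)Ella) ((PERS)ellos) asassaasasasassaassaas ((PERS)yo) ssdsdsd"

# Pattern from the comment: \s* at the start of the repeated group allows
# whitespace between consecutive ((PERS)yo) / ((PERS)Yo) tokens.
pattern = re.compile(r'\(\(PERS\)\s*[Yy]o\s*\)(?:\s*\(\(PERS\)\s*[Yy]o\s*\))+')

# Each run of two or more consecutive tokens becomes a single ((PERS)yo);
# single tokens do not match because the group needs at least one repeat.
modified_text = pattern.sub('((PERS)yo)', input_text)
print(modified_text)
```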

See https://ideone.com/nK5RtD – Wiktor Stribiżew Feb 19 '23 at 09:48
