Python re.sub always returns the original string value and ignores given pattern

Question

My code is below:

import re

old = """
B07K6VMVL5
B071XQ6H38
B0B7F6Q9BH
B082KTHRBT
B0B78CWZ91
B09T8TJ65B
B09K55Z433
"""
duplicate = """
B0B78CWZ91
B09T8TJ65B
B09K55Z433
"""
final = re.sub(r"\b{}\b".format(duplicate),"",old)
print(final)

final always prints the unchanged value of old. I want the values in duplicate to be removed from old.

First of all, why not `old.replace(duplicate,'')`? Next, you need to `strip` the `duplicate` - `re.sub(r"\b{}\b".format(duplicate.strip()),"",old)`, or at least `rstrip` it, as there is no word boundary between a newline and the end of the string. — Wiktor Stribiżew, Oct 23 '22 at 14:14
To further spell out what @Wiktor is saying, the final `\b` does not match because there is no word boundary after the final newline. — tripleee, Oct 23 '22 at 14:18
I have changed the code as follows: `final = re.sub(r"{}".format(duplicate),"",old)` followed by `print(final)`. I still get the unchanged value of `old`. `old.replace(duplicate,'')` also prints only the original value of `old`. — Aravindh Arun, Oct 23 '22 at 14:36
And now, do you have any issues? What are you actually trying to achieve? — Wiktor Stribiżew, Oct 23 '22 at 14:37
Actually, I need to get `duplicate` as input from a user, check it against `old` (which is already-stored data), and remove the duplicates. — Aravindh Arun, Oct 23 '22 at 14:40
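The word-boundary problem described in the first two comments can be checked with a short script that uses the question's own data. This is a minimal sketch; the name `unchanged` is only for illustration, and the second call is the `strip()` fix suggested in the first comment.

```python
import re

# Strings exactly as in the question: each triple-quoted literal
# begins and ends with a newline character.
old = """
B07K6VMVL5
B071XQ6H38
B0B7F6Q9BH
B082KTHRBT
B0B78CWZ91
B09T8TJ65B
B09K55Z433
"""
duplicate = """
B0B78CWZ91
B09T8TJ65B
B09K55Z433
"""

# Original call: the final \b would have to match between the last "\n"
# and the end of the string. Neither side is a word character, so it fails
# and re.sub returns the input unchanged.
unchanged = re.sub(r"\b{}\b".format(duplicate), "", old)
print(unchanged == old)  # True

# Suggested fix: after strip(), the search text starts with "B" and ends
# with "3", so both \b anchors sit next to word characters and match.
final = re.sub(r"\b{}\b".format(duplicate.strip()), "", old)
print(repr(final))  # '\nB07K6VMVL5\nB071XQ6H38\nB0B7F6Q9BH\nB082KTHRBT\n\n'
```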

cards · Answer 1 · answered Oct 23 '22 at 14:26

The triple-quoted string should not start or end with a newline, since that introduces an extra \n character. Try:

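import re

# The last three entries of old are also in duplicate.
old = """B07K6VMVL5
B071XQ6H38
B0B7F6Q9BH
B082KTHRBT
B0B78CWZ91
B09T8TJ65B
B09K55Z433"""

duplicate = """B0B78CWZ91
B09T8TJ65B
B09K55Z433"""

final = re.sub(r"\b{}\b".format(duplicate), "", old)
print(final)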

and the result will no longer equal old.

Output

B07K6VMVL5
B071XQ6H38
B0B7F6Q9BH
B082KTHRBT

Alternatively, keep the line layout and escape the first and last newlines with a backslash:

"""\
B0B78CWZ91
B09T8TJ65B
B09K55Z433\
"""
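A quick check of the backslash form, as a minimal sketch: inside a regular (non-raw) string literal, a backslash at the end of a line is a line continuation, so the backslash and the newline after it are both dropped. The names `escaped` and `compact` are illustrative only.

```python
# Layout from the answer: the backslashes swallow the first and last newlines.
escaped = """\
B0B78CWZ91
B09T8TJ65B
B09K55Z433\
"""

# The same text written without surrounding line breaks.
compact = """B0B78CWZ91
B09T8TJ65B
B09K55Z433"""

print(escaped == compact)  # True
print(repr(escaped))       # 'B0B78CWZ91\nB09T8TJ65B\nB09K55Z433'
```

The inner newlines are kept, so the text still reads one ID per line in the source.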

score 0 · Answer 2 · answered Oct 23 '22 at 17:02

It seems you can use

final = re.sub(r"(?!\B\w){}(?<!\w\B)".format(re.escape(duplicate.strip())),"",old)

Note several things here:

duplicate.strip() - leading and trailing whitespace may prevent a match, so strip() removes it from the search text
re.escape(...) - any regex special characters in the input are escaped, so they are matched literally
(?!\B\w) and (?<!\w\B) are dynamic adaptive word boundaries - they require a word boundary only where the search text starts or ends with a word character, so text that begins or ends with a non-word character can still match.
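To see what the adaptive boundaries do, here is a minimal sketch that wraps the answer's pattern in a helper. `remove_block` is a hypothetical name (not part of `re`), and the sample strings are made up for illustration.

```python
import re


def remove_block(old: str, duplicate: str) -> str:
    """Remove `duplicate` from `old` using the answer's adaptive word boundaries."""
    # strip(): surrounding whitespace would otherwise become part of the pattern.
    # re.escape(): regex metacharacters in the input are matched literally.
    pattern = r"(?!\B\w){}(?<!\w\B)".format(re.escape(duplicate.strip()))
    return re.sub(pattern, "", old)


old = "\nB07K6VMVL5\nB082KTHRBT\nB0B78CWZ91\nB09K55Z433\n"

# User input that starts and ends with newlines, like the question's strings.
print(repr(remove_block(old, "\nB0B78CWZ91\nB09K55Z433\n")))
# '\nB07K6VMVL5\nB082KTHRBT\n\n'

# A word character at an edge still requires a word boundary there,
# so the ID is not removed from inside the longer token "xB0B78CWZ91".
print(repr(remove_block("xB0B78CWZ91 B0B78CWZ91", "B0B78CWZ91")))
# 'xB0B78CWZ91 '

# A non-word character at an edge needs no boundary...
print(repr(remove_block("IDs: (B0B78CWZ91)", "(B0B78CWZ91)")))
# 'IDs: '

# ...whereas a plain \b before "(" cannot match after a space.
plain = re.sub(r"\b{}\b".format(re.escape("(B0B78CWZ91)")), "", "IDs: (B0B78CWZ91)")
print(plain)
# IDs: (B0B78CWZ91)
```

The last comparison shows why a fixed `\b{}\b` wrapper is not enough once the user-supplied text can begin or end with a non-word character.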
