Set search pattern by setting a constraint on how a substring should not start and another on how a substring should not end

Question

import re, datetime

input_text = "Por las mañanas de verano voy a la playa, y en la manana del 22-12-22 16:22 pm o quizas mañana en la mañana hay que estar alli y no 2022-12-22 a la manana"

today = datetime.date.today()
tomorrow = str(today + datetime.timedelta(days = 1))

input_text = re.sub(r"\b(?:las|la)\b[\s|]*(?:mañana|manana)\bs\s*\b", tomorrow, input_text)

print(repr(input_text))  # --> output

Why does the restriction I set fail?

The goal is that the pattern (?:mañana|manana) must not be preceded by either of the options (?:las|la), and must not be followed by a letter 's' and then zero or more spaces, s\s*

This is the expected output after making the replacements only where appropriate:

"Por las mañanas de verano voy a la playa, y en la manana del 22-12-22 16:22 pm o quizas 22-12-23 en la mañana hay que estar alli y no 2022-12-22 a la manana"

NB: It seems you keep bad habits that were already pointed out at your [previous questions](https://stackoverflow.com/questions/74802101/set-alphanumeric-regex-pattern-not-accepting-certain-specific-symbols): `[\s|]*` will allow a literal pipe symbol. You need `\s*`. This is not the answer to your question here, but just wanted to highlight this again. — trincot, Dec 22 '22 at 20:16
@trincot The thing about placing [\s|]* was done more for reasons of readability, since it assumes the presence of many or no spaces. — Matt095, Dec 22 '22 at 20:18
Zero or more - `*`, one or more - `+`. No need to complicate regexps. — Wiktor Stribiżew, Dec 22 '22 at 20:19
I think you want `re.sub(r"\b(las?\s+)?ma[ñn]ana\b", lambda x: x.group() if x.group(1) else tomorrow, input_text)`, see [this Python demo](https://ideone.com/hlO01S). — Wiktor Stribiżew, Dec 22 '22 at 20:20
@WiktorStribiżew Of course, in the case that there is nothing between the last letter 'a' and a letter 's', there should not be a pattern match. — Matt095, Dec 22 '22 at 20:20
There is no sense adding this restriction since `\b` disallows any letters after `a`. — Wiktor Stribiżew, Dec 22 '22 at 20:22
*"`[\s|]*` was done more for reasons of readability"*: It is readable, but it reads like "allow zero or more combinations of white spaces and pipe symbols". That you feel this conveys the message of "presence of many or no spaces" means that you misunderstand how character classes work. The most readable way to do this is `\s*`. — trincot, Dec 22 '22 at 20:22
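The difference between the two spellings can be checked directly. This is a minimal sketch; the sample strings and the names `with_pipe` and `spaces_only` are made up for illustration and do not come from the question.

```python
import re

# Inside a character class, "|" is a literal pipe, not alternation.
with_pipe = re.compile(r"la[\s|]*manana")
spaces_only = re.compile(r"la\s*manana")

# Illustrative inputs: the last one contains a literal pipe.
samples = ["la manana", "la   manana", "lamanana", "la | manana"]

for s in samples:
    print(repr(s), bool(with_pipe.fullmatch(s)), bool(spaces_only.fullmatch(s)))
```

Both patterns accept the first three strings, but only `[\s|]*` also accepts the one with the pipe.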
When indicating the clause or restriction using `ma[ñn]ana\b` I would be blocking that after the letter a there cannot be any letter, however the restriction I am looking for is that it only be restricted if there is a letter `'s'` after `(?:mañana|manana)` — Matt095, Dec 22 '22 at 20:25
I still think `re.sub(r"\b(las?\s+)?ma[ñn]ana\b", lambda x: x.group() if x.group(1) else tomorrow, input_text)` will work. Please show when it fails. — Wiktor Stribiżew, Dec 22 '22 at 22:38
@WiktorStribiżew The pattern works almost perfectly. The only small error I found with `re.sub(r"\b(las?\s+)?ma[ñn]ana\b", lambda x: x.group() if x.group(1) else tomorrow, input_text)` is that it wrongly excludes sequences such as `"mañanabsas"` or `"manana simples"`; it should only exclude sequences where `"mañana"` or `"manana"` is immediately followed by a letter `'s'`, as in `"mañanas "` or `"mananas"`. — Matt095, Dec 22 '22 at 23:58
I see, so you need `re.sub(r"\b(las?\s+)?ma[ñn]ana(?!s\b)", lambda x: x.group() if x.group(1) else tomorrow, input_text)` — Wiktor Stribiżew, Dec 23 '22 at 00:27

score 1 · Accepted Answer · answered Dec 23 '22 at 08:35

You can use

re.sub(r"\b(las?\s+)?ma[ñn]ana(?!s\b)", lambda x: x.group() if x.group(1) else tomorrow, input_text)

The regex matches

- `\b`: a word boundary
- `(las?\s+)?`: an optional `la` or `las` followed by one or more whitespace characters, captured in Group 1
- `ma[ñn]ana`: `mañana` or `manana`
- `(?!s\b)`: a negative lookahead that fails the match if the next character is an `s` that ends the word (so `mañanas` is skipped, while `mañanabsas` is not).
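To see what each piece contributes, the pattern can be probed against a few short strings. This is a minimal sketch: the sample strings are taken from the question and the comments, and `PATTERN`, `ORIGINAL` and `samples` are just illustrative names. It also suggests why the pattern from the question never matched anything: `\bs` demands a word boundary between the final `a` and the `s`, and two adjacent letters never have one.

```python
import re

PATTERN = re.compile(r"\b(las?\s+)?ma[ñn]ana(?!s\b)")
# The pattern from the question, kept verbatim for comparison.
ORIGINAL = re.compile(r"\b(?:las|la)\b[\s|]*(?:mañana|manana)\bs\s*\b")

samples = [
    "mañana",
    "la manana",
    "las mañanas",
    "mananas",
    "mañanabsas",
    "manana simples",
]

for s in samples:
    m = PATTERN.search(s)
    if m is None:
        print(f"{s!r}: no match")
    else:
        # group(1) is None when no "la "/"las " prefix was consumed.
        print(f"{s!r}: matched {m.group()!r}, group 1 = {m.group(1)!r}")
    # ORIGINAL needs "\b" between "a" and "s", which cannot occur.
    print("   original pattern matches:", ORIGINAL.search(s) is not None)
```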

If Group 1 matches, the callback returns the matched text unchanged, so no replacement occurs; if it does not match, the text is replaced with `tomorrow`.
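Put together with the input from the question, the approach might look like the sketch below. The helper names `replace_bare_manana` and `pick` are hypothetical, introduced only to make the callback easier to read. Note that `str()` on a `date` yields ISO format (`YYYY-MM-DD`); if the two-digit-year form shown in the question's expected output is wanted, `strftime("%y-%m-%d")` is one way to get it.

```python
import datetime
import re

PATTERN = re.compile(r"\b(las?\s+)?ma[ñn]ana(?!s\b)")


def replace_bare_manana(text: str, replacement: str) -> str:
    """Replace mañana/manana with `replacement` unless preceded by la/las."""

    def pick(match):
        # Group 1 is set only when "la " or "las " was consumed:
        # in that case return the matched text unchanged.
        return match.group() if match.group(1) else replacement

    return PATTERN.sub(pick, text)


input_text = (
    "Por las mañanas de verano voy a la playa, y en la manana del 22-12-22 "
    "16:22 pm o quizas mañana en la mañana hay que estar alli y no "
    "2022-12-22 a la manana"
)

# ISO format, e.g. "2022-12-23" when run on 2022-12-22.
tomorrow = str(datetime.date.today() + datetime.timedelta(days=1))

print(repr(replace_bare_manana(input_text, tomorrow)))
```

Only the bare `mañana` after `quizas` is replaced; `las mañanas`, `la manana` and `la mañana` are left as they are.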
