import re, datetime
#operation function
def add_or_subtract_days(days, operation):
    today = datetime.date.today()
    # today is already a date object, so the timedelta is added to it directly
    # (strptime() only accepts strings and would raise a TypeError here)
    if operation == "add":
        input_text = (today + datetime.timedelta(days=int(days))).strftime('%Y-%m-%d')
    elif operation == "subtract":
        input_text = (today - datetime.timedelta(days=int(days))).strftime('%Y-%m-%d')
    return input_text
#recognition patterns function
def relative_indicated_dates(input_text):
    # Both patterns below start their capture after the last line break within their capture range
    # ((?:\w+)?) ---> as a capturing group, this pattern captures a run of word characters (letters, digits, underscore) and stops at the first non-word character, such as a space, a comma or a dot
    # ((?:\w\s*)+) ---> similar to the previous one, but it does not stop at spaces
    some_text_withot_separations = "" # <--- HERE IS THE PROBLEM: THIS IS WHERE I NEED THE RESTRICTION
    date_format = r"\d*-\d{2}-\d{2}"
    #date_format = r"\d*-\d{2}-\[\d{2}_--_\d{2}]"
    # The lambda expects some_text_withot_separations to add one capturing group (m[1]),
    # so the number of days ends up in m[2] or m[3]
    input_text = re.sub(
        date_format + r"[\s|]*" + some_text_withot_separations
        + r" (?:(?:pasados|despues[\s|]*de|después|despues|tras)[\s|]*(?:[\s|]*unos|)?[\s|]*(\d+)[\s|]*(?:días|dias|día|dia)|(\d+)[\s|]*(?:días|dias|día|dia)[\s|]*(?:despues|luego)) "
        + some_text_withot_separations + r"[\s|]*" + date_format,
        #lambda m: print(m[1], m[2], m[3]) ,
        lambda m: m[1] + add_or_subtract_days(m[2] or m[3], "add"),
        input_text)
    return input_text
#Input string examples:
input_texts = [
    "Empezo en 1999-12-30. Seguro eso ocurrio despues de 720 dias, o quizas fue aproximadamente despues de 6 dias o menos, o incluso puede ser el 2000-01-13 mismo",
    "Seguro eso ocurrira despues de 2 dias, o quizas sea aproximadamente despues de 26 dias; actualmente luego de 2 dias ya tendre todo listo",
    "Seguro eso ocurrira despues de 2 dias, o quizas sea aproximadamente despues del 2022-12-03; actualmente luego de 2 dias ya tendre todo listo",
    """Decian muchas cosas; Seguro eso ocurrira despues de 120 dias, o quizas sea aproximadamente pasados 182 dias o quizas incluso menos.
Alla por 1996-11-02 o almenos 1800 dias despues inicio el modesto emprendimiento""",
]
for input_text in input_texts:
    print(repr(relative_indicated_dates(input_text))) # --> output
I need it to replace the matched phrase in the original string with the output of the function `add_or_subtract_days()`, as long as the conditions expressed in the regex patterns are met. That is, it must make that replacement if and only if there is no date matching `r"\d*-\d{2}-\d{2}"` before or after the phrase (the substring of word characters) that it must replace.
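One way to express the "no date after the phrase" half of this condition is a negative lookahead, since Python's `re` accepts lookaheads of any width. The following is a minimal sketch, assuming (per the semicolon/full-stop rule in this question) that a date only counts if it appears before the next `;`, `.` or newline; to stay short it only looks for `<n> dias`, not the full phrase, and the sample string is adapted from the first example. The "no date before" half cannot be written the same way with the standard `re` module, because its lookbehinds must be fixed-width.

```python
import re

DATE = r"\d*-\d{2}-\d{2}"
# Capture <n> in "<n> dias" only if no date follows before the next ";", "."
# or newline: [^;.\n]* inside the lookahead cannot cross those separators.
NO_DATE_AHEAD = re.compile(r"(\d+)\s*d[ií]as?\b(?![^;.\n]*" + DATE + r")")

text = "despues de 6 dias o menos, el 2000-01-13 mismo; luego de 2 dias listo"
print(NO_DATE_AHEAD.findall(text))  # ['2']
```

Only the `2` is captured: the `6 dias` is rejected because `2000-01-13` follows it before the `;`.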
Setting aside the particular algorithm shown here, the goal of the program is this: if a number of days n is given without an explicit reference date, those days are counted from the present day (today).
The only case in which the function should still be applied even though there is a date before or after the phrase is when a semicolon `;` or a full stop (`.[\s|]*\n*`) separates that date from the phrase.
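That rule amounts to treating the text as segments delimited by `;`, a full stop, or a newline, and only looking for dates inside the segment that holds the phrase. A minimal sketch of that segmentation with `re.split`, assuming `\.\s*` as the equivalent of `.[\s|]*\n*` (the dot is escaped so it matches a literal full stop, the pipe is dropped, and `\s` already covers `\n`). The sample text is a shortened version of the fourth example.

```python
import re

DATE = r"\d*-\d{2}-\d{2}"
# Separators: ";", a literal full stop plus any trailing whitespace/newlines,
# or a bare newline. The capturing group makes re.split keep them.
SEPARATORS = re.compile(r"(;|\.\s*|\n)")

text = ("Decian muchas cosas; Seguro eso ocurrira despues de 120 dias.\n"
        "Alla por 1996-11-02 o almenos 1800 dias despues")
parts = SEPARATORS.split(text)
segments = parts[::2]  # even indices hold text, odd indices hold separators
for segment in segments:
    print(bool(re.search(DATE, segment)), repr(segment))

# Joining every part back together restores the original text.
assert "".join(parts) == text
```

Only the last segment contains a date, so only the phrase `1800 dias despues` would be left untouched.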
I have already tried putting the pattern `((?:\w+)?)` or the pattern `((?:\w\s*)+)` in the `some_text_withot_separations` variable, but both patterns have two problems that make them unsuitable. First, neither of them lets me list which characters stop the capture (I need the capture not to cross `;` or `\n`). Second, for some reason both start their capture after the last newline within the range they span, which is incorrect.
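The first problem can be illustrated directly: `\w\s*` also consumes newlines (since `\s` includes `\n`), so a run built from it can cross lines, whereas a negated character class such as `[^;.\n]` names exactly the characters that end the run. A small comparison, using a sample string assembled from fragments of the question's examples:

```python
import re

text = "quizas incluso menos\nAlla por 1996-11-02; luego de 2 dias"

# \w\s* keeps going through "\n" but stops at "-" (not a word character).
word_runs = re.findall(r"(?:\w\s*)+", text)
# A negated character class stops only at ";", "." or a newline.
segments = re.findall(r"[^;.\n]+", text)

print(word_runs)
print(segments)
```

The first list contains `'quizas incluso menos\nAlla por 1996'`, which spans two lines and is cut at the hyphen of the date; the second list keeps each line and each `;`-delimited piece separate.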
After the replacements are made (wherever one of the conditions is met), these input examples should produce the following outputs (assuming today in my country is `'2022-12-12'`):
'Empezo en 1999-12-30. Seguro eso ocurrio despues de 720 dias, o quizas fue aproximadamente despues de 6 dias o menos, o incluso puede ser el 2000-01-13 mismo'
'Seguro eso ocurrira 2022-12-14, o quizas sea aproximadamente 2023-01-07; actualmente 2022-12-14 ya tendre todo listo'
'Seguro eso ocurrira despues de 2 dias, o quizas sea aproximadamente despues del 2022-12-03; actualmente 2022-12-14 ya tendre todo listo'
"""Decian muchas cosas; Seguro eso ocurrira 2023-04-11, o quizas sea aproximadamente 2023-06-12 o quizas incluso menos.
Alla por 1996-11-02 o almenos 1800 dias despues inicio el modesto emprendimiento"""
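The expected dates above can be checked with plain `datetime` arithmetic. This sketch fixes the reference date to 2022-12-12 (as the question assumes) instead of calling `datetime.date.today()`; `add_days` and `REFERENCE` are hypothetical names used only here. Adding a `timedelta` to the `date` directly also avoids `strptime()`, which only accepts strings.

```python
import datetime

# Hypothetical fixed reference date, taken from the question's assumption.
REFERENCE = datetime.date(2022, 12, 12)

def add_days(days, today=REFERENCE):
    """Return `today` shifted forward by `days`, formatted as YYYY-MM-DD."""
    return (today + datetime.timedelta(days=int(days))).isoformat()

for n in (2, 26, 120, 182):
    print(n, add_days(n))
```

This yields 2022-12-14, 2023-01-07, 2023-04-11 and 2023-06-12, the four dates that appear in the expected outputs.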
What regex pattern should I use to make these replacements so that the match does not cross `;` or these line breaks?
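One possible approach, sketched here rather than offered as the single-regex answer: instead of encoding the restriction in `some_text_withot_separations`, match every separator-free segment with `[^;.\n]+` and decide in a replacement callback. If the segment contains a date it is returned unchanged; otherwise the relative-day phrases inside it are replaced. Assumptions: `luego de` is added to the prefix alternation (the expected outputs replace `luego de 2 dias`, which the original pattern does not cover), word boundaries and `\s+` replace `[\s|]*`, and the `today` parameter is a hypothetical addition for reproducibility.

```python
import datetime
import re

# Date pattern from the question.
DATE = re.compile(r"\d*-\d{2}-\d{2}")
# Matches "días", "dias", "día" or "dia" as a whole word.
DAYS = r"d[ií]as?\b"
# Relative-day phrases from the question; "luego de" is an assumed addition.
PHRASE = re.compile(
    r"\b(?:(?:pasados|despu[eé]s\s+de|luego\s+de|despu[eé]s|tras)\s+(?:unos\s+)?(\d+)\s*"
    + DAYS
    + r"|(\d+)\s*" + DAYS + r"\s+(?:despu[eé]s|luego)\b)"
)
# A segment is any run of text containing no ";", "." or newline.
SEGMENT = re.compile(r"[^;.\n]+")


def add_or_subtract_days(days, operation, today=None):
    # `today` is a hypothetical parameter so results can be reproduced.
    if today is None:
        today = datetime.date.today()
    delta = datetime.timedelta(days=int(days))
    if operation == "add":
        return (today + delta).isoformat()
    if operation == "subtract":
        return (today - delta).isoformat()
    raise ValueError(f"unknown operation: {operation!r}")


def relative_indicated_dates(input_text, today=None):
    def replace_segment(seg):
        text = seg[0]
        if DATE.search(text):
            return text  # an explicit date shares this segment: leave it alone
        return PHRASE.sub(
            lambda m: add_or_subtract_days(m[1] or m[2], "add", today), text)
    # Separators (";", ".", "\n") are never matched, so they stay in place.
    return SEGMENT.sub(replace_segment, input_text)


reference = datetime.date(2022, 12, 12)
print(repr(relative_indicated_dates(
    "Seguro eso ocurrira despues de 2 dias, o quizas sea aproximadamente "
    "despues del 2022-12-03; actualmente luego de 2 dias ya tendre todo listo",
    reference)))
```

Doing the date check in Python keeps the pattern itself simple: the regex only has to describe the phrase and the segment boundaries, while "no date anywhere in this segment" becomes a single `search` call.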