I would probably write things like this:
```python
import re

def show_time_of_pid(line):
    pattern = r"^(\w{3}) \s (\d+) \s ([\d:]+) \s .[^[]+\[(\d+)]:.*"
    result = re.search(pattern, line, flags=re.VERBOSE)
    return result.groups()

print(show_time_of_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))
```
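Using `re.VERBOSE` lets us space the pattern out so it is a little easier to read. In verbose mode, unescaped whitespace in the pattern is ignored, so the actual spaces in the log line are matched by the explicit `\s` tokens. Here we have several distinct match groups: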
- `(\w{3})` matches the month name
- `(\d+)` matches the day of the month
- `([\d:]+)` matches the time
- `[^[]+\[(\d+)]` matches the PID ("a bunch of characters that are not `[`, followed by `[`, then a string of digits, then `]`")
Each group is separated by a single whitespace character (`\s`).
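The main payoff of `re.VERBOSE` is that the pattern can span several lines with `#` comments. A sketch of the same pattern written that way (the `PATTERN` and `line` names are just for illustration):

```python
import re

# Same pattern as above, spread over several lines. Under re.VERBOSE,
# unescaped whitespace in the pattern is ignored and "#" starts a comment,
# so only the escaped \s tokens match the spaces in the log line.
PATTERN = re.compile(r"""
    ^(\w{3})      \s   # month name, e.g. "Jul"
    (\d+)         \s   # day of the month
    ([\d:]+)      \s   # time, e.g. "14:01:23"
    .[^[]+             # hostname and process name, up to the "["
    \[(\d+)]           # PID inside square brackets
    :.*                # the rest of the message
""", re.VERBOSE)

line = "Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"
print(PATTERN.search(line).groups())
# ('Jul', '6', '14:01:23', '29440')
```

Compiling once with `re.compile` also avoids re-parsing the pattern when you run it over many lines.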
Running the above code produces:
```
('Jul', '6', '14:01:23', '29440')
```
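One caveat: `re.search` returns `None` when the line doesn't match, so `result.groups()` raises `AttributeError` on any line that isn't shaped like this one. A minimal sketch that returns `None` instead (moving the pattern into a module-level `PATTERN` constant is just a refactoring for illustration):

```python
import re

PATTERN = r"^(\w{3}) \s (\d+) \s ([\d:]+) \s .[^[]+\[(\d+)]:.*"

def show_time_of_pid(line):
    """Return (month, day, time, pid), or None if the line doesn't match."""
    result = re.search(PATTERN, line, flags=re.VERBOSE)
    if result is None:
        # No match: calling .groups() on None would raise AttributeError.
        return None
    return result.groups()

print(show_time_of_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))
print(show_time_of_pid("this is not a syslog line"))
```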
You could get fancier by wrapping the date and time in an outer capture group:
```python
import re

def show_time_of_pid(line):
    pattern = r"^((\w{3}) \s (\d+) \s ([\d:]+)) \s .[^[]+\[(\d+)]:.*"
    result = re.search(pattern, line, flags=re.VERBOSE)
    return result.groups()

print(show_time_of_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))
```
We get the entire date string in the first capture group:
```
('Jul 6 14:01:23', 'Jul', '6', '14:01:23', '29440')
```
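Note that the extra group shifts the numbering: groups are numbered by the position of their opening parenthesis, so the outer group is 1 and the PID is now group 5. If you only want a couple of fields, `Match.group` accepts several group numbers and returns a tuple:

```python
import re

pattern = r"^((\w{3}) \s (\d+) \s ([\d:]+)) \s .[^[]+\[(\d+)]:.*"
line = "Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"
m = re.search(pattern, line, flags=re.VERBOSE)

# Group 1 is the outer (timestamp) group; group 5 is the PID.
timestamp, pid = m.group(1, 5)
print(timestamp, pid)   # Jul 6 14:01:23 29440
```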
And of course we can get back a labeled dictionary instead of a plain tuple by using named capture groups:
```python
import re

def show_time_of_pid(line):
    pattern = r"^(?P<timestamp>(?P<month>\w{3}) \s (?P<day>\d+) \s ([\d:]+)) \s .[^[]+\[(?P<pid>\d+)]:.*"
    result = re.search(pattern, line, flags=re.VERBOSE)
    return result.groupdict()

print(show_time_of_pid("Jul 6 14:01:23 computer.name CRON[29440]: USER (good_user)"))
```
Which produces (the time group has no name, so `groupdict()` leaves it out):
```
{'timestamp': 'Jul 6 14:01:23', 'month': 'Jul', 'day': '6', 'pid': '29440'}
```
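If you run this over a real log file, keep in mind that traditional syslog timestamps pad single-digit days with a space (`Jul  6`), which a single `\s` won't match. A sketch of a variant that names every group, uses `\s+` between fields, and skips non-matching lines; `SYSLOG_LINE` and `parse_lines` are hypothetical names, and the sample lines are made up for illustration:

```python
import re

# Every group is named, and \s+ tolerates the extra padding space
# before single-digit days ("Jul  6").
SYSLOG_LINE = re.compile(r"""
    ^(?P<timestamp>
        (?P<month>\w{3}) \s+
        (?P<day>\d+)     \s+
        (?P<time>[\d:]+)
    ) \s+
    [^[]+ \[ (?P<pid>\d+) \]:   # host and process name, then [PID]:
""", re.VERBOSE)

def parse_lines(lines):
    """Yield a dict of named groups for each matching line; skip the rest."""
    for line in lines:
        m = SYSLOG_LINE.search(line)
        if m:
            yield m.groupdict()

log = [
    "Jul  6 14:01:23 computer.name CRON[29440]: USER (good_user)",
    "Jul 16 09:15:02 computer.name sshd[812]: Accepted publickey",
    "not a syslog line",
]
for record in parse_lines(log):
    print(record)
```

Since `search` only needs to find the prefix, the trailing `:.*` from the earlier patterns is dropped here.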