I'm trying to perform a sed
/awk
style regex substitution with python3
's re
module.
You can see it works fine here with a hardcoded test string:
#!/usr/bin/env python3
import re
regex = r"^(?P<time>\d+\:\d+\:\d+\.\d+)(?:\s+)(?P<func>\S+)(?:\s+)(?P<path>\S+(?: +\S+)*?)(?:\s+)(?P<span>\d+\.\d+)(?:\s+)(?P<npid>(?P<name>\S+(?: +\S+)*?)\.(?P<pid>\d+))\n"
subst = "'\\g<name>', "
line = ("21:21:54.165651 stat64 this/ 0.000012 THiNG1.12471\n"
"21:21:54.165652 stat64 /that 0.000012 2thIng.12472\n"
"21:21:54.165653 stat64 /and/the other thing.xml 0.000012 With S paces.12473\n"
"21:21:54.165654 stat64 /and/the_other_thing.xml 0.000012 without_em_.12474\n"
"21:59:57.774616 fstat64 F=90 0.000002 tmux.4129\n")
result = re.sub(regex, subst, line, 0, re.MULTILINE)
if result:
print(result)
But I'm having some trouble getting it to work the same way with the stdin:
#!/usr/bin/env python3
import sys, re
regex = r"^(?P<time>\d+\:\d+\:\d+\.\d+)(?:\s+)(?P<func>\S+)(?:\s+)(?P<path>\S+(?: +\S+)*?)(?:\s+)(?P<span>\d+\.\d+)(?:\s+)(?P<npid>(?P<name>\S+(?: +\S+)*?)\.(?P<pid>\d+))\n"
subst = "'\\g<name>', "
for line in str(sys.stdin):
#sys.stdout.write(line)
result = re.sub(regex, subst, line, 0, re.MULTILINE)
if result:
print(result,end='')
I'd like to be able to pipe input straight into it from another utility, like is common with grep
and similar CLI utilities.
Any idea what the issue is here?
Addendum
I tried to keep the question simple and generalized in the hope that answers might be more useful in similar but different situations, and useful to more people. However, the details might shed some more light on the problem, so here I will include are the exact details of my current scenario:
The desired input to my script is actually the output stream from a utility called fs_usage
, it's similar to utilities like ps
, but provides a constant stream of system calls and filesystem operations. It tells you which files are being read from, written to, etc. in real time.
From the manual:
NAME
fs_usage
-- report system calls and page faults related to filesystem activity in real-timeDESCRIPTION
The fs_usage utility presents an ongoing display of system call usage information pertaining to filesystem activity. It requires root privileges due to the kernel tracing facility it uses to operate.By default, the activity monitored includes all system processes except for:
fs_usage
,Terminal.app
,telnetd
,telnet
,sshd
,rlogind
,tcsh
,csh
,sh
,zsh
. These defaults can be overridden such that output is limited to include or exclude (-e
) a list of processes specified by the user.The output presented by
fs_usage
is formatted according to the size of your window.
A narrow window will display fewer columns. Use a wide window for maximum data display.
You may override the formatting restrictions by forcing a wide display with the-w
option.
In this case, the data displayed will wrap when the window is not wide enough.
I hack together a crude little bash script to rip the process names from the stream, and dump them to a temporary log file. You can think of it as a filter or an extractor. Here it is as a function that will dump straight to stdout (remove the comment on the last line to dump to file).
proc_enum ()
{
while true; do
sudo fs_usage -w -e 'grep' 'awk' |
grep -E -o '(?:\d\.\d{6})\s{3}\S+\.\d+' |
awk '{print $2}' |
awk -F '.' '{print $1}' \
#>/tmp/proc_names.logx
done
}