I have a log file containing lines about different users, and I'm tailing this file in real time. I want to show only the lines related to a user that I specify, e.g. 1234. The log entries look like this:

ID:101 Username=1234
ID:102 Username=1234
ID:999 UNWANTED LINE (because this ID was not assigned to user 1234)
ID:102 some log entry regarding the same user
ID:123 UNWANTED LINE (because this ID was not assigned to user 1234)
ID:102 some other text
ID:103 Username=1234
ID:103 blablabla

A dynamic ID is assigned to a user in a line like "ID:101 Username=1234". Any subsequent lines that start with that ID pertain to the same user and will need to be displayed. I need a dynamic tail that would get all IDs related to the specified user (1234) and filter the previous lines as follows:

ID:101 Username=1234
ID:102 Username=1234
ID:102 some log entry regarding the same user
ID:102 some other text
ID:103 Username=1234
ID:103 blablabla

I need to first filter the lines where "Username=1234" is found, then extract the "ID:???" from that line, then tail all lines that contain "ID:???". When another line with "Username=1234" is found, extract the new ID and use it to display the subsequent lines with this new ID.

I am able to chain greps to extract the ID when I use cat, but it doesn't work when I chain them after a tail. And even if it did, how do I "watch" for a new value of ID and dynamically update my grep pattern?

Thanks in advance!

A.H.

1 Answer

This is a task that Awk can handle with ease (and it could be handled with Perl or Python too).

awk '$2 == "Username=1234" { ids[$1]++; } $1 in ids  { print }' data

The first pattern/action pair records the ID:xxx value in the array ids for any entry where $2 is Username=1234. The second pattern/action pair checks whether the line's ID:xxx value ($1) is listed in ids; if so, it prints the line. The Username=1234 lines satisfy both criteria (at least, after the entry is added to the array), so they are printed too.
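To avoid hard-coding the user in the Awk program, the same two rules can be wrapped in a small shell function that passes the username in with -v. This is only a sketch: filter_user and the Awk variable want are hypothetical names introduced here, and it still assumes the ID is in $1 and the Username=... token is in $2.

```shell
# filter_user is a hypothetical helper name, not part of the original answer.
# Usage: filter_user USERNAME [FILE...]   (reads stdin when no file is given)
filter_user() {
    user=$1
    shift
    awk -v want="Username=$user" '
        $2 == want { ids[$1] = 1 }   # remember each ID assigned to this user
        $1 in ids  { print }         # print lines whose first field is a known ID
    ' "$@"
}
```

Running filter_user 1234 data on the sample log from the question should print the six wanted lines and drop the two ID:999 / ID:123 lines.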

How do I use it so it can act like tail (i.e. print the new lines as they're added to data)?

tail -f logfile | awk …

You'd omit the name of the data file from the awk part of the command, of course, since Awk then reads from the pipe. The only thing you'd have to watch for is that tail doesn't stall waiting to fill the pipe buffer. It probably won't be a problem, but you might have to look hard at the options to tail if it takes longer for lines to appear in the Awk input than you expected.
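If the filtered output is itself piped or redirected, Awk may hold printed lines in its own output buffer. One hedged way around that is to call fflush() after each print; it is supported by gawk, mawk and the BWK awk as far as I know, but check your implementation. The function name follow_user below is hypothetical, and the sketch keeps the assumption that the ID is $1 and the username token is $2.

```shell
# follow_user is a hypothetical helper; it filters whatever arrives on stdin.
follow_user() {
    awk -v want="Username=$1" '
        $2 == want { ids[$1] = 1 }       # remember each ID assigned to this user
        $1 in ids  { print; fflush() }   # flush so each line is written immediately
    '
}

# Intended use (runs until interrupted):
#   tail -f logfile | follow_user 1234
```

Note that tail -f begins with only the last few lines of the file, so ID assignments made earlier are never seen; tail -n +1 -f logfile starts from the first line instead. A comment on this answer also reports that mawk may need -W interactive so that it reads from the pipe without buffering.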

I realized that ID:XXX doesn't necessarily always come at position $1... is there a way to match the ID with a regular expression regardless of its position in the line ($1, $2, ...)?

Yes:

awk '$2 == "Username=1234" { ids[$1]++; }
     { for (i = 1; i <= NF; i++) if ($i in ids) { print; break } }' data

The second line matches every line and, for each field in the line, checks whether that field is present in the ids array. If it is, it prints the line and breaks out of the loop (you could use next instead of break in this context, though the two are not equivalent in general). Note that the first line still expects the ID in $1 and Username=1234 in $2 on the lines that assign an ID.
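If the assignment lines can also have the ID and the username in other positions, a variation is to scan the fields with a regular expression. The sketch below is not the method from the answer: it assumes each ID is a whole whitespace-separated field of the form ID: followed by digits, that a line carries at most one such ID (the first one found is used), and that the username token appears as a whole field. The names filter_user_anywhere, want, id and user_here are hypothetical.

```shell
# filter_user_anywhere is a hypothetical helper; reads stdin or the named files.
filter_user_anywhere() {
    user=$1
    shift
    awk -v want="Username=$user" '
        {
            id = ""          # first ID:<digits> field found on this line
            user_here = 0    # set when the Username=... token is on this line
            for (i = 1; i <= NF; i++) {
                if (id == "" && $i ~ /^ID:[0-9]+$/) id = $i
                if ($i == want) user_here = 1
            }
            if (user_here && id != "") ids[id] = 1   # new ID assigned to the user
            if (id in ids) print                     # line belongs to a known ID
        }
    ' "$@"
}
```

Comparing whole fields rather than substrings means that Username=12345 is not mistaken for Username=1234.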

Jonathan Leffler
  • Thank you so much! It seems to be exactly what I'm looking for!! But how do I use it so it can act like tail (i.e. print the new lines as they're added to data?). – A.H. Mar 07 '17 at 06:10
  • Use `tail -f logfile | awk …` to continue reading indefinitely. – Jonathan Leffler Mar 07 '17 at 06:26
  • Man you're awesome!! That's exactly it!! just one last thing... I realized that ID:XXX doesn't necessarily always come at position $1... is there a way to match the ID with a regular expression regardless of its position in the line ($1, $2, ...)? – A.H. Mar 07 '17 at 06:33
  • I recently encountered a `programthatneverexits | awk` command line, and the main problem is: if `awk` does buffered reads from pipes, you might not see output all that promptly. This doesn't seem to be an issue for `gawk`, but for `mawk`, which is the default awk on some distros, it helps to add the `-W interactive` option, so that it uses unbuffered reads – Mark Plotnick Mar 07 '17 at 14:52