
I have a log file like the following. Each line logs some string and a thread ID. Each thread belongs to a process, and a process can have N threads.

Based on the following sample, I want to extract (using bash tools: grep, sed, or whatever) all the lines of all threads that belong to a given process. Note that the process is mentioned only once, at the top of a thread sequence:

line1 thread= 150 process= 200
line2 thread= 152 whatever
line3 thread= 150 whatever
line4 thread= 150 whatever
line5 thread= 130 whatever
line6 thread= 130 process= 200
line7 thread= 150 process= 201
line8 thread= 130 whatever
line9 thread= 130 whatever

For this sample, given process 200, the output should be:

line1 thread= 150 process= 200
line3 thread= 150 whatever
line4 thread= 150 whatever
line6 thread= 130 process= 200
line8 thread= 130 whatever
line9 thread= 130 whatever


awk solution (written for the original input format, with no space after `=`):

filter_threads.awk script:

#!/usr/bin/awk -f
function get_thread(s) {              # extracts the thread number from the string
    return substr(s, index(s, "=") + 1)   # considering `=` as separator (e.g. `thread=150`)
}
BEGIN {
    target = "process=" p             # exact field to match, so p=20 does not match process=200
}
$3 == target {                        # on encountering the "process" line of process p
    thread = get_thread($2); print; next   # getting base thread number
}
$3 ~ /^process=/ {                    # "process" line of some other process
    if (get_thread($2) == thread) thread = ""   # that thread no longer belongs to p
    next
}
get_thread($2) == thread              # print lines whose thread matches the base thread number

Usage:

awk -f filter_threads.awk -v p=200 yourfile

- where p is the process number

The output:

line1 thread=150 process=200
line3 thread=150 whatever
line4 thread=150 whatever
line6 thread=130 process=200
line8 thread=130 whatever
line9 thread=130 whatever
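The script above expects the original, unspaced input. A minimal sketch, assuming the question's current format with a space after `thread=` and `process=`: `sed` first removes that space, so `$2` is again `thread=150` and `$3` is `process=200`, and the same state logic applies. The helper name `filter_process` and the file `sample.log` are hypothetical.

```shell
#!/bin/sh
# filter_process P FILE (hypothetical helper): print every line of the threads
# that belong to process P. Assumes the spaced format "thread= N process= M".
filter_process() {
    # Collapse "= " into "=" so that $2 is "thread=N" and $3 is "process=M"
    sed 's/= /=/g' "$2" |
    awk -v p="$1" '
    function get_thread(s) { return substr(s, index(s, "=") + 1) }  # text after "="
    $3 ~ /^process=/ {                                   # a thread sequence starts here
        if ($3 == "process=" p) { thread = get_thread($2); print }   # our process
        else if (get_thread($2) == thread) thread = ""   # thread reused by another process
        next
    }
    get_thread($2) == thread                             # lines of the remembered thread
    '
}

# Hypothetical sample file reproducing the question's input
cat > sample.log <<'EOF'
line1 thread= 150 process= 200
line2 thread= 152 whatever
line3 thread= 150 whatever
line4 thread= 150 whatever
line5 thread= 130 whatever
line6 thread= 130 process= 200
line7 thread= 150 process= 201
line8 thread= 130 whatever
line9 thread= 130 whatever
EOF

filter_process 200 sample.log
```

Running it prints the same six lines as the output above, in the unspaced form produced by `sed`.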

Update:

As you have changed your initial input (there is now a space after `thread=` and `process=`), the new solution would be as below:

awk -v p=200 '$4=="process="{ if ($5==p) { thread=$3; print } else if ($3==thread) thread=""; next } $3==thread' yourfile
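The one-liner remembers a single `thread`, so it follows only one thread of the process at a time: if two threads of process 200 interleave, lines of the earlier thread are dropped once the later one starts. A sketch, assuming the spaced format, that instead keeps an awk array of the threads currently owned by the process. The helper name `filter_process_all` and the file `interleaved.log` are hypothetical.

```shell
#!/bin/sh
# filter_process_all P FILE (hypothetical helper): print every line of every
# thread that currently belongs to process P. Assumes "thread= N process= M".
filter_process_all() {
    awk -v p="$1" '
    $4 == "process=" {                        # a thread sequence starts on thread $3
        if ($5 == p) { mine[$3] = 1; print }  # the thread now belongs to process p
        else delete mine[$3]                  # the thread now belongs to another process
        next
    }
    $3 in mine                                # print lines of any tracked thread
    ' "$2"
}

# Hypothetical input where two threads of process 200 interleave
cat > interleaved.log <<'EOF'
line1 thread= 150 process= 200
line2 thread= 130 process= 200
line3 thread= 150 whatever
line4 thread= 152 whatever
line5 thread= 130 whatever
line6 thread= 150 process= 201
line7 thread= 150 whatever
line8 thread= 130 whatever
EOF

filter_process_all 200 interleaved.log
```

This prints line1, line2, line3, line5 and line8; the single-variable one-liner would skip line3, because by then `thread` has already moved on to 130.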
RomanPerekhrest
Thank you. I'm marking this as the correct answer; however, I have a space after process= (edited in the original post), which the script cannot handle as-is. If you can update it, that would be most helpful. – Tsiftis Karampouzouklis Jun 14 '17 at 13:42