I am analyzing log data from a process. The data has the columns id, date, time, log code and log text. The id is unique per product, date and time record when the log was captured, log code is the code specific to the log text, and log text is a description of the process of up to 256 characters.
e.g.
ID Date time log id log text
A 01/10/18 9:00:00 bbb process begin
A 01/10/18 9:00:00 yyy dimensions not specified
A 01/10/18 9:00:30 fff failure
A 01/10/18 9:00:30 ddd dispatched
A 01/10/18 9:00:30 sss process success
B 01/10/18 9:01:01 bbb process begin
B 01/10/18 9:01:50 mmm moved to stage2
B 01/10/18 9:02:50 aaa space not allocated
B 01/10/18 9:02:50 fff failure
I want to grep (or rather, create a subset of) the above dataset and write it to a CSV or XLS file. The subset should meet conditions such as the following (the conditions may change):
- each row where log text = failure, together with the 2 rows above it
- all rows where log id is sss
So my expected output is:
ID Date time log id log text
A 01/10/18 9:00:00 bbb process begin
A 01/10/18 9:00:00 yyy dimensions not specified
A 01/10/18 9:00:30 fff failure
B 01/10/18 9:01:50 mmm moved to stage2
B 01/10/18 9:02:50 aaa space not allocated
B 01/10/18 9:02:50 fff failure
A 01/10/18 9:00:30 sss process success
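The selection described above can be sketched in plain Python with the standard csv module. This is only a minimal sketch under a few assumptions that are not in the data above: the file is comma-separated with a header row, the column names are exactly "log id" and "log text", and the helper names select_rows and to_csv are made up for illustration. It also does not check whether the two rows above a failure belong to the same ID.

```python
import csv
import io

# Sample rows from the question, written as CSV (the comma-separated layout is an assumption).
SAMPLE = """\
ID,Date,time,log id,log text
A,01/10/18,9:00:00,bbb,process begin
A,01/10/18,9:00:00,yyy,dimensions not specified
A,01/10/18,9:00:30,fff,failure
A,01/10/18,9:00:30,ddd,dispatched
A,01/10/18,9:00:30,sss,process success
B,01/10/18,9:01:01,bbb,process begin
B,01/10/18,9:01:50,mmm,moved to stage2
B,01/10/18,9:02:50,aaa,space not allocated
B,01/10/18,9:02:50,fff,failure
"""


def select_rows(rows, text="failure", code="sss", before=2):
    """Hypothetical helper: failure rows with `before` rows above them, then rows with log id `code`."""
    context = []  # indices of failure rows and the rows just above them, in file order
    seen = set()  # guards against duplicates when context windows overlap
    for i, row in enumerate(rows):
        if row["log text"] == text:
            for j in range(max(0, i - before), i + 1):
                if j not in seen:
                    seen.add(j)
                    context.append(j)
    # Rows matched by log id, skipping any already picked up as context.
    by_code = [i for i, row in enumerate(rows) if row["log id"] == code and i not in seen]
    return [rows[i] for i in context + by_code]


def to_csv(rows, fieldnames):
    """Hypothetical helper: serialize the selected rows back to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
subset = select_rows(rows)
print(to_csv(subset, reader.fieldnames))
```

To read from and write to real files, the io.StringIO objects would be replaced with open(..., newline="") handles. Since the conditions can change, they are passed as the text, code and before parameters.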
Based on the discussion in the thread "Grep for a word, and if found print 10 lines before and 10 lines after the pattern match", I tried the following code:
import subprocess
filename = "filename.csv"
string_to_search = "failure"
extract = (subprocess.getstatusoutput("grep -C 2 '%s' %s"%(string_to_search, filename)))[1]
print(extract)
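Note that grep's -C 2 option prints two lines of context on both sides of each match, while -B 2 prints only the two lines before it, which is closer to the first condition. A hedged variant of the call is sketched below. It assumes a grep that supports -B and -F (GNU and BSD grep both do), and the function name grep_before is hypothetical. Passing the arguments as a list avoids shell quoting problems with the search string.

```python
import subprocess


def grep_before(pattern, filename, before=2):
    """Hypothetical helper: lines containing `pattern` plus `before` lines of leading context."""
    result = subprocess.run(
        # -F treats the pattern as a fixed string; -e keeps it from being read as an option.
        ["grep", "-B", str(before), "-F", "-e", pattern, filename],
        capture_output=True,
        text=True,
    )
    # grep exits with 0 on a match, 1 when nothing matched, and >1 on errors.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    # Drop the "--" separator grep prints between non-adjacent groups of lines
    # (this would also drop a data line that consists only of "--").
    return [line for line in result.stdout.splitlines() if line != "--"]
```

Calling grep_before("failure", "filename.csv") returns the matching lines as a list of strings. This covers only the failure condition; the rows with log id sss would still need a second pass.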