I am writing code to extract the useful parts of a very large Source.txt file. A sample of the source file is shown below:

Test case AAA
Current Parameters:
    Some unique param : 1
    Some unique param : 2
    Some unique param :     3
    Some unique param : 4
*A line of rubbish*
*Another line of rubbish*
*Yet another line of rubbish*
*More and more rubbish*
Test AAA PASS
Test case BBB
Current Parameters:
    Some unique param : A
    Some unique param : B
    Some unique param :     C
    Some unique param : D
*A line of rubbish*
*Another line of rubbish*
*Yet another line of rubbish*
*More and more rubbish*
Test BBB PASS

This is my code so far, which extracts only the Test case and Current Parameters lines:

processed = []

def main():
    source_file = open("Source.txt","r") #Open the raw trace file in read mode
    if source_file.mode == "r":
        contents = source_file.readlines()   #Read the contents of the file
        processed_contents = _process_content(contents)
        output_file = open("Output.txt","w")
        output_file.writelines(processed_contents)
        pass

def _process_content(contents):
    for raw_lines in contents:
        if "Test case" in raw_lines:
            processed.append(raw_lines)
        elif "Current Parameters" in raw_lines:
            processed.append(raw_lines)
            #I am stuck here
        elif "PASS" in raw_lines or "FAIL" in raw_lines:
            processed.append(raw_lines)
            processed.append("\n")
    return processed

#def _process_parameters():


if __name__ == '__main__':
    main()

After the Current Parameters line, I want to grab each Some unique param line (the names will not always be the same) and append it to the processed list so that it also appears in my Output.txt.

My desired output is:

Test case AAA
Current Parameters:
    Some unique param : 1
    Some unique param : 2
    Some unique param :     3
    Some unique param : 4
    Test AAA PASS
Test case BBB
Current Parameters:
    Some unique param : A
    Some unique param : B
    Some unique param :     C
    Some unique param : D
    Test BBB PASS

As you can see, I want to remove all the rubbish lines. Note that there is a lot of rubbish in my Source.txt. I am not sure how to advance to the next raw_lines from inside the loop. I appreciate your help.

Hari
  • @Adam.Er8, it is good. But I don't know how to continue to the line Param 1 – Hari Jun 19 '19 at 07:08
  • why doesn't `elif "Param" in raw_lines` work? does it also capture some of the rubbish? – Adam.Er8 Jun 19 '19 at 07:09
  • OK, I forgot to mention: the param lines in raw_lines will not always contain the word Param. I will edit the question now – Hari Jun 19 '19 at 07:11
  • OK, are these the only indented lines after `Current Parameters:`? because what we can do is mark a flag every time we encounter `Current Parameters:`, then also add all lines that are indented, and when we find one that isn't, take the flag down and go back to normal mode – Adam.Er8 Jun 19 '19 at 07:13
  • Just edited the question, my params will be unique and not with the same name. Sorry for the confusion – Hari Jun 19 '19 at 07:15
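
The flag idea suggested in the comments above could look roughly like the sketch below. It assumes that parameter lines are the only indented, non-blank lines directly after `Current Parameters:` and that the first non-indented line ends the block. The function name `extract_test_blocks` and the `in_params` flag are hypothetical names chosen for illustration, not part of the original code.

```python
def extract_test_blocks(lines):
    """Keep test headers, indented parameter lines, and PASS/FAIL results."""
    kept = []
    in_params = False  # True while inside a "Current Parameters:" block
    for line in lines:
        if in_params:
            # Assumption: a parameter line starts with whitespace and is not blank.
            if line[:1].isspace() and line.strip():
                kept.append(line)
                continue
            in_params = False  # first non-indented (or blank) line ends the block
        if "Test case" in line:
            kept.append(line)
        elif "Current Parameters" in line:
            kept.append(line)
            in_params = True
        elif "PASS" in line or "FAIL" in line:
            kept.append(line)
    return kept


sample = [
    "Test case AAA\n",
    "Current Parameters:\n",
    "    Some unique param : 1\n",
    "    Some unique param :     3\n",
    "*A line of rubbish*\n",
    "Test AAA PASS\n",
]
result = extract_test_blocks(sample)
```

Because the function only iterates over `lines`, an open file object can be passed in place of the list, which avoids loading a very large Source.txt into memory with `readlines()`.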

4 Answers

1

It's hard to say for sure if this will work, because I don't know anything about the format of the rubbish lines, but I think you can just check to see if the line contains "Param", just like you're doing for the other lines:

def _process_content(contents):
    for raw_line in contents:
        if "Test case" in raw_line:
            processed.append(raw_line)
        elif "Current Parameters" in raw_line:
            processed.append(raw_line)
        elif "Param" in raw_line:
            processed.append(raw_line)
        elif "PASS" in raw_line or "FAIL" in raw_line:
            processed.append(raw_line)
            processed.append("\n")
    return processed
mackorone
  • If the param lines don't always contain "Param", and instead, always start with some whitespace, you can do `elif raw_line.startswith(" ")` – mackorone Jun 19 '19 at 07:12
  • I just edited my question, sorry for the confusion. – Hari Jun 19 '19 at 07:17
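
A stateless version of the whitespace check from the comment above might look like the following sketch. It assumes that no rubbish line is indented; if some are, a flag that tracks the `Current Parameters:` block would be needed instead. The helper name `keep_line` is hypothetical.

```python
def keep_line(line):
    """Return True for lines that should be copied to the output."""
    if "Test case" in line or "Current Parameters" in line:
        return True
    if "PASS" in line or "FAIL" in line:
        return True
    # Assumption: parameter lines are the only indented, non-blank lines.
    return line[:1] in (" ", "\t") and bool(line.strip())


sample = [
    "Test case BBB\n",
    "Current Parameters:\n",
    "    Some unique param : A\n",
    "*A line of rubbish*\n",
    "Test BBB PASS\n",
]
filtered = [line for line in sample if keep_line(line)]
```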
1

This is one approach using Regex.

Ex:

import re

filename = "Source.txt"  # assumed input path; adjust as needed

result = []
with open(filename) as infile:
    for raw_lines in infile:
        if "Test case" in raw_lines:
            result.append(raw_lines)
        if "Current Parameters" in raw_lines:
            result.append(raw_lines)
            # next() advances the file iterator to the following line; the ""
            # default avoids StopIteration if the file ends inside a block.
            raw_lines = next(infile, "")
            while True:
                # Matches "<word> : <word>", so only the last word before the colon is kept.
                m = re.search(r"(?P<params>\s*\w+\s*:\s*\w+\s*)", raw_lines)
                if not m:
                    break
                result.append(m.group("params"))
                raw_lines = next(infile, "")
        if "PASS" in raw_lines or "FAIL" in raw_lines:
            result.append(raw_lines)
            result.append("\n")
print(result)

Output:

['Test case AAA\n',
 'Current Parameters:\n',
 ' param : 1\n',
 ' param : 2\n',
 ' param :     3\n',
 ' param : 4\n',
 'Test AAA PASS\n',
 '\n',
 'Test case BBB\n',
 'Current Parameters:\n',
 ' param : A\n',
 ' param : B\n',
 ' param :     C\n',
 ' param : D\n',
 'Test BBB PASS',
 '\n']
Rakesh
1

You can use a regex backreference (e.g. \2) to tie each result line to the name of its test case, which splits the input into test cases (regex101):

import re

data = '''Test case AAA
Current Parameters:
    Some unique param : 1
    Some unique param : 2
    Some unique param :     3
    Some unique param : 4
*A line of rubbish*
*Another line of rubbish*
*Yet another line of rubbish*
*More and more rubbish*
Test AAA PASS
Test case BBB
Current Parameters:
    Some unique param : A
    Some unique param : B
    Some unique param :     C
    Some unique param : D
*A line of rubbish*
*Another line of rubbish*
*Yet another line of rubbish*
*More and more rubbish*
Test BBB PASS'''

for g in re.findall(r'(^Test case ([A-Za-z]+)\s+Current Parameters:(?:[^:]+:.*?$)*)+.*?(Test \2 (PASS|FAIL))', data, flags=re.DOTALL|re.M):
    print(g[0])
    print(g[2])

Prints:

Test case AAA
Current Parameters:
    Some unique param : 1
    Some unique param : 2
    Some unique param :     3
    Some unique param : 4
Test AAA PASS
Test case BBB
Current Parameters:
    Some unique param : A
    Some unique param : B
    Some unique param :     C
    Some unique param : D
Test BBB PASS
Andrej Kesely
0

You can use str.startswith() to pick out the lines you want, then write those lines back to the file. I would also split each line on ":" and check whether the resulting length is 2 to find parameters. It is also safer to convert the lines to lowercase so that matching is case-insensitive and "Test" is treated the same as "test".

Demo:

lines = []
with open("source.txt") as f:
    for line in f:
        lowercase = line.lower()
        if (
            lowercase.startswith("test")
            or lowercase.startswith("current parameters:")
            or len(lowercase.split(":")) == 2
        ):
            lines.append(line)

with open("source.txt", mode="w") as o:
    for line in lines:
        o.write(line)

source.txt:

Test case AAA
Current Parameters:
    Some unique param : 1
    Some unique param : 2
    Some unique param :     3
    Some unique param : 4
Test AAA PASS
Test case BBB
Current Parameters:
    Some unique param : A
    Some unique param : B
    Some unique param :     C
    Some unique param : D
Test BBB PASS
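
If overwriting the source file is not wanted, the same filter could stream into a separate file instead. The sketch below reuses the three conditions from the demo above; the function name `filter_to_new_file` and its return value (the number of lines kept) are assumptions added for illustration. Note that the colon check also keeps any rubbish line that happens to contain exactly one ":".

```python
def filter_to_new_file(source_path, output_path):
    """Copy the wanted lines from source_path to output_path; return how many were kept."""
    kept = 0
    # Both files are read and written line by line, so a very large source
    # file is never held in memory all at once.
    with open(source_path) as src, open(output_path, "w") as dst:
        for line in src:
            lowercase = line.lower()
            if (
                lowercase.startswith("test")
                or lowercase.startswith("current parameters:")
                or len(lowercase.split(":")) == 2
            ):
                dst.write(line)
                kept += 1
    return kept
```

A call such as `filter_to_new_file("Source.txt", "Output.txt")` would leave Source.txt untouched and write the filtered lines to Output.txt, matching the file names used in the question.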
RoadRunner