
I have the following sample text and need to split its lines into a tuple/list, starting a new entry at each line containing the phrase "ALL Banks Report". The raw text is as follows:

%Bank PARSED MESSAGE FILE
%VERSION   : PIL 98.7
%nex MODULE   : SIL 98

2018 Jan 31  16:44:53.050 ALL Banks Report SBI
name id ID = 0,  ID = 58
    Freq = 180

    conserved NEXT:
      message c1 : ABC1 : 
          {
            XYZ2
           }
2018 Jan 31  16:44:43.050 ALL Banks Report HDFC
conserved LATE:

World ::= 
{
  Asia c1 : EastAsia : 
      {
        India
       }
}

...and so on, with many repetitions. I want to build a tuple/list/array split on the phrase "ALL Banks Report", so that list[0] contains the following:

2018 Jan 31  16:44:53.050 ALL Banks Report SBI
name id ID = 0,  ID = 58
    Freq = 180

    conserved NEXT:
      message c1 : ABC1 : 
          {
            XYZ2
           }

and list[1] contains the next block, as below:

2018 Jan 31  16:44:43.050 ALL Banks Report HDFC
conserved LATE:

World ::= 
{
  Asia c1 : EastAsia : 
      {
        India
       }
}
Python Spark
  • So what have you tried so far? – Azat Ibrakov Feb 11 '18 at 05:43
  • I implemented logic that first finds the line numbers of the lines matching "ALL Banks Report" and then passes all the lines between those line numbers into a list, but I am checking whether there is a more direct method. – Python Spark Feb 11 '18 at 06:02
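The line-number approach described in the comment above can be sketched roughly as follows. This is only an illustration, not the asker's actual code: the function name `split_by_marker_indices` and the shortened `sample_lines` are made up for the example.

```python
def split_by_marker_indices(lines, marker="ALL Banks Report"):
    """Split lines into groups, each starting at a line that contains marker."""
    # positions of every header line
    starts = [i for i, line in enumerate(lines) if marker in line]
    # each group runs from one header up to (not including) the next one;
    # the last group runs to the end of the input
    ends = starts[1:] + [len(lines)]
    return [lines[start:end] for start, end in zip(starts, ends)]


# hypothetical, shortened stand-in for the file contents
sample_lines = [
    "%Bank PARSED MESSAGE FILE",
    "2018 Jan 31  16:44:53.050 ALL Banks Report SBI",
    "    Freq = 180",
    "2018 Jan 31  16:44:43.050 ALL Banks Report HDFC",
    "conserved LATE:",
]

groups = split_by_marker_indices(sample_lines)
for group in groups:
    print(group)
```

Anything before the first header (here the `%` preamble) is left out of every group, because slicing only starts at a header position.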

2 Answers


IMO, there's no particular advantage to the use of pyparsing here. It's easy to process this file using an old-fashioned algorithm.

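output_list = []   # one entry (a list of lines) per report
items = []         # lines of the report currently being collected
with open('spark.txt') as spark:
    for line in spark:
        line = line.rstrip()
        # skip blank lines and the '%' preamble
        if line and not line.startswith('%'):
            if 'ALL Banks Report' in line:
                # a header starts a new report; store the previous one as a unit
                if items:
                    output_list.append(items)
                items = [line]
            else:
                items.append(line)
# store the final report
if items:
    output_list.append(items)

for item in output_list:
    print('\n'.join(item))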

Output:

2018 Jan 31  16:44:53.050 ALL Banks Report SBI
name id ID = 0,  ID = 58
    Freq = 180
    conserved NEXT:
      message c1 : ABC1 :
          {
            XYZ2
           }
2018 Jan 31  16:44:43.050 ALL Banks Report HDFC
conserved LATE:
World ::=
{
  Asia c1 : EastAsia :
      {
        India
       }
}

Incidentally, I have avoided the use of list as an identifier, since it's the name of a Python built-in type (not a keyword, but shadowing it hides the built-in).
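If you want the same loop as a reusable function that returns one list per report, a minimal sketch could look like the following. The name `group_reports` and the shortened `sample_text` are assumptions made for the example, and unlike the loop above this version discards any non-blank lines that appear before the first header.

```python
def group_reports(lines, marker="ALL Banks Report"):
    """Return a list of groups; each group is a list of lines starting at a header."""
    groups = []
    current = None  # stays None until the first header line is seen
    for line in lines:
        line = line.rstrip()
        # skip blank lines and the '%' preamble, as the loop above does
        if not line or line.startswith("%"):
            continue
        if marker in line:
            current = [line]        # start a new group at each header
            groups.append(current)
        elif current is not None:
            current.append(line)    # body line of the current group
    return groups


# hypothetical, shortened stand-in for the file contents
sample_text = """%Bank PARSED MESSAGE FILE
%VERSION   : PIL 98.7

2018 Jan 31  16:44:53.050 ALL Banks Report SBI
name id ID = 0,  ID = 58
    Freq = 180
2018 Jan 31  16:44:43.050 ALL Banks Report HDFC
conserved LATE:
"""

groups = group_reports(sample_text.splitlines())
print(groups[0][0])
print(groups[1][0])
```

Because the function accepts any iterable of lines, an open file object can be passed in place of `sample_text.splitlines()`.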

Bill Bell

I am a huge fan of itertools.groupby, and here is an unconventional way to use it to find your groups of bank lines:

from itertools import groupby

is_header = lambda s: "ALL Banks Report" in s

# `sample` is assumed to hold the raw text from the question as a single string
lines = sample.splitlines()

# call groupby to group consecutive lines by whether or not each line is a header
group_iter = groupby(lines, key=is_header)

# skip over leading group of non-header lines if the first line is not a header
# (the `lines and` guard avoids an IndexError on empty input)
if lines and not is_header(lines[0]):
    next(group_iter)

groups = []
while True:
    head_lines = next(group_iter, None)

    # no more lines? we're done
    if head_lines is None:
        break

    # extract header lines, which is required before trying to advance the groupby iter
    head_lines = list(head_lines[1])

    # if there were multiple header lines in a row, with no bodies, create group items for them
    while len(head_lines) > 1:
        groups.append([head_lines.pop(0)])

    # get next set of lines which are NOT header lines
    body_lines = next(group_iter, (None, []))

    # extract body lines, which is required before trying to advance the groupby iter
    body_lines = list(body_lines[1])

    # we've found a head line and a body, save it as a single list
    groups.append(head_lines + body_lines)

# what did we get?
for group in groups:
    print('--------------')
    print('\n'.join(group))
    print('')

With your data set, this gives:

--------------
2018 Jan 31  16:44:53.050 ALL Banks Report SBI
name id ID = 0,  ID = 58
    Freq = 180

    conserved NEXT:
      message c1 : ABC1 : 
          {
            XYZ2
           }

--------------
2018 Jan 31  16:44:43.050 ALL Banks Report HDFC
conserved LATE:

World ::= 
{
  Asia c1 : EastAsia : 
      {
        India
       }
}
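To see why the loop above can alternate between a header group and a body group, it may help to look at what groupby yields on its own. The sketch below uses made-up, shortened `demo_lines` rather than the real data; groupby starts a new run each time the key (header or not) changes between consecutive lines.

```python
from itertools import groupby


def is_header(s):
    return "ALL Banks Report" in s


# hypothetical, shortened stand-in for the sample text
demo_lines = [
    "ALL Banks Report SBI",
    "Freq = 180",
    "XYZ2",
    "ALL Banks Report HDFC",
    "India",
]

# materialize each run immediately; a group is only valid until groupby advances
runs = [(key, list(group)) for key, group in groupby(demo_lines, key=is_header)]
for key, run in runs:
    print(key, run)
# True ['ALL Banks Report SBI']
# False ['Freq = 180', 'XYZ2']
# True ['ALL Banks Report HDFC']
# False ['India']
```

Consecutive header lines would land in a single True run, which is why the answer's loop pops extra headers off into their own groups.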
PaulMcG