-1

I just want to split this text file into lines and classify them. If a line starts with "Qty", the following lines are order items, up to the line that starts with "GST".

If a line starts with "Total Amount", it is the total amount line.

Business me . ' l
Address "rwqagePnnter Pro DemcRafifilp
Address "mfgr Eva|uat|on Only
Contact line 1
Transaction Number 10006
Issue Date 27/02/201
Time 10:36:55
Salesperson orsa orsa
Qty Description Unit Price Total
1 test $120.00 $120.00
GST $10.91
Total Amount $120.00
Cash $120.00
Please contact us for more information about
this receipt.
Thank you for your business.
d
.
test

Please show me how to do this with PegJS: http://pegjs.majda.cz/

ebohlman
Phil
  • A grammar for a mess like that seems like it'd be pretty difficult, but simply checking line-by-line with simple regular expressions seems pretty easy. – Pointy Mar 01 '13 at 15:02
  • I only want the lines in an array of text and each line is grouped by its classification. There are only 2 classifications. – Phil Mar 01 '13 at 15:04
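The line-by-line approach suggested in the comments can be sketched in plain JavaScript, without a grammar. This is only a minimal sketch: `classifyReceipt`, `inItems`, and the shape of the returned object are hypothetical names chosen for illustration, and it assumes the "Qty" header row and the "GST" row are not themselves order items.

```javascript
// Hypothetical helper: classify receipt lines with simple regular expressions.
// Returns the two groups asked for: order item lines and the total amount line.
function classifyReceipt(text) {
  const result = { orderItems: [], totalAmount: null };
  let inItems = false; // true between the "Qty" header and the "GST" line

  for (const line of text.split(/\r?\n/)) {
    if (/^Qty\b/.test(line)) {
      inItems = true; // the header row itself is not stored as an item
    } else if (/^GST\b/.test(line)) {
      inItems = false; // "GST" closes the order item block
    } else if (/^Total Amount\b/.test(line)) {
      inItems = false;
      result.totalAmount = line;
    } else if (inItems && line.trim() !== "") {
      result.orderItems.push(line);
    }
  }
  return result;
}

// A few lines taken from the sample in the question.
const receiptSample = [
  "Salesperson orsa orsa",
  "Qty Description Unit Price Total",
  "1 test $120.00 $120.00",
  "GST $10.91",
  "Total Amount $120.00",
  "Cash $120.00",
].join("\n");

console.log(classifyReceipt(receiptSample));
```

Lines that match neither classification (the address block, "Cash", the footer text) are simply skipped.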

3 Answers

6

Here's a quick and dirty sample solution:

{
  var in_quantity = false // Track whether or not we are in a quantity block
  var quantity    = []
  var gst         = null
  var total       = null
}

start =
  // look for a quantity, then GST, then a total and finally anything else
  (quantity / gst / total / line)+
  {
    return {quantity: quantity, gst: gst, total: total}
  }

chr = [^\n]
eol = "\n"?

quantity   = "Qty" chr+ eol        { in_quantity = true; }
gst        = "GST" g:chr+ eol      { in_quantity = false; gst = g.join('').trim(); }
total      = "Total Amount" t:line { in_quantity = false; total = t.trim(); }

line =
  a:chr+ eol
  {
    if( in_quantity ){
      // break quantities into columns based on tabs
      quantity.push( a.join('').split(/[\t]/) );
    }
    return a.join('');
  }
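One caveat about the tab split in the `line` rule: the sample as pasted in the question looks space-separated rather than tab-separated, in which case each row would come back as a single column. If that is how the real file looks, a regular expression could break an order row into columns instead. This is a rough sketch under that assumption; `parseOrderLine` is a hypothetical helper, and it assumes every row has the layout "quantity, description, unit price, total" with prices written like `$120.00`.

```javascript
// Hypothetical helper: split one order row into columns, assuming the layout
// "<qty> <description> <unit price> <total>" separated by whitespace.
function parseOrderLine(line) {
  // The description is matched lazily so it may contain spaces;
  // the last two whitespace-separated fields must look like prices.
  const m = /^(\d+)\s+(.+?)\s+(\$[\d,.]+)\s+(\$[\d,.]+)\s*$/.exec(line);
  if (!m) return null; // row does not match the assumed layout
  return {
    qty: Number(m[1]),
    description: m[2],
    unitPrice: m[3],
    total: m[4],
  };
}

console.log(parseOrderLine("1 test $120.00 $120.00"));
```

Something like this could be called from the `line` action in place of `split(/[\t]/)`, keeping the raw text for any row that returns `null`.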
Zak
  • I'm not sure which one is better. I can read and understand this one more easily, and that is my only basis for choosing, since they both work. Thanks! – Phil Mar 06 '13 at 04:57
3

How about the following code as another solution?

{
  var result = [];
}

start
  = (!QTY AnyLine /
      set:(Quantities TotalAmount)
        {result.push({orders:set[0], total:set[1]})}
    )+ (Chr+)?
  {return result;}

QTY = "Qty"
GST = "GST"

Quantities
  = QtyLine order:(OrderLine*) GSTLine {return order;}

QtyLine
  = QTY Chr* _

OrderLine
  = !GST ch:(Chr+) _ {return ch.join('');}

GSTLine
  = GST Chr* _

TotalAmount
  = "Total Amount" total:(Chr*) _ {return total.join('');}

AnyLine
  = ch:(Chr*) _ {return ch.join('');}

Chr
  = [^\n]
_
  = "\n"
0

You could use XML, or you could end every line with a "/" and then split the text on that character using the split function.

mytext = mytext.split("/");

Then work with the resulting array. I don't know why you wouldn't just use SQL or something similar.

Zachrip