
I have some sample data extracted from a PDF, and I need to write a parser that extracts the text and numbers of each line into an array for further manipulation. I think I should use JFlex, but I have no idea how to start.

The data looks like this:

Manager Salary 615/12/4129  2,200.00  2,300.00  100.00  4.35  2,200.00  2,300.00  100.00  4.35  27,600.00
Maintenance Payroll 615/12/4139 1,107.99  1,100.00 -7.99 -0.73  1,107.99  1,100.00 -7.99 -0.73 13,200.00
Payroll Taxes 615/12/4149  689.27  685.00 -4.27 -0.62 689.27  685.00 -4.27 -0.62  4,550.00
Workmen's Comp Insur 615/12/4159  360.49  905.00  544.51  60.17  360.49  905.00  544.51 60.17  4,590.00
Health Insur / Benefits 615/12/4169  485.70  845.00  359.30 42.52  485.70  845.00  359.30  42.52  10,140.00
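For "further manipulation" the amounts will eventually need to be numbers. They use comma thousands separators and a leading minus for negatives, so a small helper can recognize and convert them. This is a minimal sketch, assuming every amount follows the pattern seen in the sample rows (comma-grouped digits, exactly two decimals); `AmountParser` and its methods are hypothetical names. `BigDecimal` is used rather than `double` so the cents stay exact.

```java
import java.math.BigDecimal;
import java.util.regex.Pattern;

public class AmountParser {
    // Assumed format from the samples: optional minus, comma thousands
    // separators, exactly two decimals (e.g. "2,200.00", "-7.99").
    private static final Pattern AMOUNT =
            Pattern.compile("-?\\d{1,3}(?:,\\d{3})*\\.\\d{2}");

    static boolean isAmount(String token) {
        return AMOUNT.matcher(token).matches();
    }

    // Strips the thousands separators; BigDecimal keeps the cents exact.
    static BigDecimal parseAmount(String token) {
        if (!isAmount(token)) {
            throw new IllegalArgumentException("not an amount: " + token);
        }
        return new BigDecimal(token.replace(",", ""));
    }

    public static void main(String[] args) {
        System.out.println(parseAmount("27,600.00")); // 27600.00
        System.out.println(parseAmount("-0.73"));     // -0.73
        System.out.println(isAmount("615/12/4129"));  // false
    }
}
```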

Sometimes the token starting with 615/ is attached to the description (no space in between). The idea is this: if a token is a number, it goes into array[1], array[2], ... according to its position; anything else goes into array[0].
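The rule above (amounts fill array[1], array[2], ... in order; every other token, including the 615/ account code, goes into array[0]) can be prototyped without a lexer generator. This is a minimal sketch in plain Java, assuming tokens are separated by whitespace and amounts look like the samples; `RowSplitter` and `splitRow` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RowSplitter {
    // Assumed amount format: optional minus, comma thousands separators, two decimals.
    private static final Pattern AMOUNT =
            Pattern.compile("-?\\d{1,3}(?:,\\d{3})*\\.\\d{2}");

    // Returns {description, n1, n2, ...}: amounts fill slots 1..n in the order
    // they appear, every other token is appended (space-separated) to slot 0.
    static String[] splitRow(String line) {
        StringBuilder text = new StringBuilder();
        List<String> cells = new ArrayList<>();
        cells.add(""); // placeholder for slot 0, filled in at the end
        for (String token : line.trim().split("\\s+")) {
            if (AMOUNT.matcher(token).matches()) {
                cells.add(token);
            } else {
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(token);
            }
        }
        cells.set(0, text.toString());
        return cells.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "Payroll Taxes 615/12/4149  689.27  685.00 -4.27 -0.62"
                + " 689.27  685.00 -4.27 -0.62  4,550.00";
        System.out.println(Arrays.toString(splitRow(line)));
    }
}
```

Because splitting happens on whitespace only, an account code glued to the description (such as `Insur615/12/4159`) is a single non-number token and lands in array[0] either way, so the rule still holds for those rows.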

Any help is appreciated; JFlex syntax is not easy to get started with.
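JFlex generates a Java lexer from a spec whose rules pair regular expressions with actions. Before writing that spec, it may help to prototype the same token rules with `java.util.regex`. The sketch below is not JFlex: it uses one alternation with a named group per token class (account code, amount, word), which also splits an account code glued to a description. The account format `ddd/dd/dddd` is an assumption drawn from the sample rows, and `RowLexer`, `Kind` and `Token` are hypothetical names. One difference to keep in mind: Java's alternation takes the first alternative that matches at a position, whereas a JFlex lexer picks the longest match and, on ties, the rule listed first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RowLexer {
    // Hypothetical token classes inferred from the sample rows.
    enum Kind { ACCOUNT, AMOUNT, WORD, OTHER }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // Alternatives are tried left to right at each position, like an ordered
    // list of lexer rules. The account format ddd/dd/dddd is an assumption.
    private static final Pattern TOKEN = Pattern.compile(
            "(?<account>\\d{3}/\\d{2}/\\d{4})"
            + "|(?<amount>-?\\d{1,3}(?:,\\d{3})*\\.\\d{2})"
            + "|(?<word>[^\\s\\d]+)"  // runs of non-space, non-digit chars
            + "|(?<other>\\S)");       // fallback so no non-space char is skipped

    static List<Token> lex(String line) {
        List<Token> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) { // whitespace matches no alternative, so find() steps over it
            Kind kind;
            if (m.group("account") != null) {
                kind = Kind.ACCOUNT;
            } else if (m.group("amount") != null) {
                kind = Kind.AMOUNT;
            } else if (m.group("word") != null) {
                kind = Kind.WORD;
            } else {
                kind = Kind.OTHER;
            }
            tokens.add(new Token(kind, m.group()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : lex("Workmen's Comp Insur615/12/4159  360.49  905.00")) {
            System.out.println(t.kind + "\t" + t.text);
        }
    }
}
```

Each named group corresponds roughly to one rule a JFlex spec would list, so once the token classes behave as expected on real rows, the same regular expressions are a reasonable starting point for the spec.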

Thanks in advance

Seki
Pascal DeMilly
  • Do you need to repeat the text processing in the future? Maybe a simple Perl script would be an easier way to transform the data. – Seki Mar 31 '16 at 08:12
  • That's the route I've taken since posting this question. I tokenized each line and reversed the order, since my lines always end with a series of numbers. Thanks – Pascal DeMilly Apr 01 '16 at 19:15
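The approach described in the last comment (tokenize, then read from the end, because every line ends in a run of numbers) might look roughly like this in Java. It is a sketch of that idea under the same assumed amount format, not the poster's actual code; `ReverseRowParser` and `parse` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.regex.Pattern;

public class ReverseRowParser {
    // Assumed amount format: optional minus, comma thousands separators, two decimals.
    private static final Pattern AMOUNT =
            Pattern.compile("-?\\d{1,3}(?:,\\d{3})*\\.\\d{2}");

    // Walks the tokens from the right: the trailing run of amounts becomes
    // slots 1..n, and everything up to the last non-amount is joined into slot 0.
    static String[] parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        Deque<String> amounts = new ArrayDeque<>();
        int i = tokens.length - 1;
        while (i >= 0 && AMOUNT.matcher(tokens[i]).matches()) {
            amounts.addFirst(tokens[i]); // addFirst restores left-to-right order
            i--;
        }
        List<String> row = new ArrayList<>();
        row.add(String.join(" ", Arrays.copyOfRange(tokens, 0, i + 1)));
        row.addAll(amounts);
        return row.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "Health Insur / Benefits 615/12/4169  485.70  845.00  359.30 42.52"
                + "  485.70  845.00  359.30  42.52  10,140.00";
        System.out.println(Arrays.toString(parse(line)));
    }
}
```

Scanning from the right means only the trailing run of amounts is treated as columns, so the account code, whether separate or glued to the description, simply stops the scan and stays in slot 0.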
