I am looking into options for parsing data from text files. We receive invoices from different clients, and the format is not predefined. Typically each file contains a table-like structure with varying columns, as shown below, and the data needs to be extracted from it.
Right now we have an IExtractor interface with a Parse method that is implemented by a parser class for each client. Depending on the file, the appropriate class is instantiated, and the logic to retrieve the data is hard-coded in that class.
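To make the current design concrete, here is a rough sketch of it. It is written in Python only for brevity and is not our actual code: ClientAExtractor, the EXTRACTORS lookup, the extract helper, and the "five fields starting with a digit" rule are all hypothetical stand-ins for the kind of hard-coded logic each client class contains.

```python
from typing import Dict, List, Protocol


class IExtractor(Protocol):
    """Common interface; every client parser provides its own parse()."""

    def parse(self, text: str) -> List[Dict[str, str]]: ...


class ClientAExtractor:
    """Hypothetical client parser with the layout rules hard-coded."""

    def parse(self, text: str) -> List[Dict[str, str]]:
        rows = []
        for line in text.splitlines():
            parts = line.split()
            # Hard-coded assumption: a data row has exactly 5 fields
            # and starts with a date such as 21.03.2014.
            if len(parts) == 5 and parts[0][:2].isdigit():
                rows.append({
                    "date": parts[0],
                    "document": parts[1],
                    "invoice": parts[2],
                    "deductions": parts[3],
                    "paid_amount": parts[4],
                })
        return rows


# Hypothetical lookup from a client identifier to its parser class.
EXTRACTORS = {"client_a": ClientAExtractor}


def extract(client_id: str, text: str) -> List[Dict[str, str]]:
    """Instantiate the parser registered for the client and run it."""
    return EXTRACTORS[client_id]().parse(text)
```

Every new client means another class like ClientAExtractor, which is the part that does not scale well.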
Since the number of clients is increasing, we are looking for a more robust and easier-to-code way to extract the information from these text files.
Is it advisable to use one regular expression to identify the header and footer and another to extract the information from each row? I would appreciate it if anyone could suggest better alternatives.
<addition text>.....
Date Document Invoice Deductions Paid Amount
--------------------------------------------------------------------------------------------
21.03.2014 9289 9280 0.00 48,000.00
10.01.2013 21389 9402 3.00 4,000.00
21.03.2014 9289 9280 0.00 48,000.00
10.01.2013 21389 9402 3.00 4,000.00
Sum Total
Please ....<text>
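For reference, the regular-expression idea I am asking about could be sketched roughly as follows. Again this is Python for brevity and only a sketch under assumptions: the LAYOUT dictionary, its patterns, and the parse_invoice function are hypothetical names, and the patterns are guessed from the sample above (dates as dd.mm.yyyy, amounts with a comma as thousands separator). The point is that only the patterns would change per client, not the parsing code.

```python
import re
from datetime import datetime
from decimal import Decimal

# Hypothetical per-client layout: three patterns describe one file format.
LAYOUT = {
    "header": re.compile(r"^Date\s+Document\s+Invoice\s+Deductions\s+Paid\s+Amount$"),
    "footer": re.compile(r"^Sum\s+Total"),
    "row": re.compile(
        r"^(?P<date>\d{2}\.\d{2}\.\d{4})\s+"
        r"(?P<document>\d+)\s+"
        r"(?P<invoice>\d+)\s+"
        r"(?P<deductions>[\d,]+\.\d{2})\s+"
        r"(?P<paid_amount>[\d,]+\.\d{2})$"
    ),
}


def parse_invoice(text, layout=LAYOUT):
    """Return the rows found between the header and footer lines."""
    rows = []
    in_table = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not in_table:
            # Skip everything until the header line is seen.
            in_table = bool(layout["header"].match(line))
            continue
        if layout["footer"].match(line):
            break  # end of the table
        match = layout["row"].match(line)
        if match is None:
            continue  # separator line or other noise inside the table
        rows.append({
            "date": datetime.strptime(match["date"], "%d.%m.%Y").date(),
            "document": match["document"],
            "invoice": match["invoice"],
            # Assumes "," is a thousands separator and "." the decimal point.
            "deductions": Decimal(match["deductions"].replace(",", "")),
            "paid_amount": Decimal(match["paid_amount"].replace(",", "")),
        })
    return rows


SAMPLE = """Some text before the table
Date Document Invoice Deductions Paid Amount
--------------------------------------------
21.03.2014 9289 9280 0.00 48,000.00
10.01.2013 21389 9402 3.00 4,000.00
Sum Total
Some text after the table"""

if __name__ == "__main__":
    for row in parse_invoice(SAMPLE):
        print(row)
```

With this shape, adding a client would mean adding another set of patterns rather than another class, but I am not sure how well it holds up when columns are missing or wrap across lines.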