I have several 1.5 GB CSV files that contain billing information for multiple client accounts from a service provider. I am trying to split each large CSV file into smaller chunks so that I can process and format the data inside it.
I do not want to roll my own CSV parser, but this is something I haven't seen done yet, so please correct me if I am wrong. Each 1.5 GB file contains information in the following order: account information, account number, Bill Date, transactions, Ex gst, Inc gst, type and other lines.
Note that BillDate here means the date the invoice was created, so occasionally there are more than two bill dates in the same CSV.
Bills are grouped by: Account Number > Bill Date > Transactions.
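Because the file is already ordered, each bill (one account number plus one bill date) should form one consecutive run of lines, so the groups can be counted in a single streaming pass without holding the file in memory. A minimal sketch, assuming a plain comma-separated layout with the account number in column 1 and the bill date in column 2 (zero-based) and no quoted commas inside fields; `BillGroup`, `BillGrouper` and `CountGroups` are hypothetical names. If fields can contain quoted commas, `Microsoft.VisualBasic.FileIO.TextFieldParser` is an existing .NET parser that could replace the `string.Split` call.

```csharp
using System;
using System.Collections.Generic;

// One bill = one consecutive run of lines with the same account number and bill date.
public class BillGroup
{
    public string Account;
    public string BillDate;
    public int LineCount;
}

public static class BillGrouper
{
    // ASSUMPTION: zero-based column positions; adjust to the real file layout.
    const int AccountColumn = 1;
    const int BillDateColumn = 2;

    // Reads the lines once and returns one BillGroup per consecutive
    // (account, bill date) run. Lines with too few fields (blank or
    // "other" lines) are counted in the group that is currently open.
    public static List<BillGroup> CountGroups(IEnumerable<string> lines)
    {
        var groups = new List<BillGroup>();
        BillGroup current = null;
        foreach (string line in lines)
        {
            string[] fields = line.Split(',');   // ASSUMPTION: no quoted commas
            bool hasKey = fields.Length > BillDateColumn;
            string account = hasKey ? fields[AccountColumn].Trim() : null;
            string billDate = hasKey ? fields[BillDateColumn].Trim() : null;

            // Start a new group on the first line, or when the key changes.
            bool startsNew = current == null ||
                (hasKey && (account != current.Account || billDate != current.BillDate));
            if (startsNew)
            {
                current = new BillGroup { Account = account, BillDate = billDate, LineCount = 0 };
                groups.Add(current);
            }
            current.LineCount++;
        }
        return groups;
    }
}
```

For example, `BillGrouper.CountGroups(File.ReadLines(path))` enumerates the file lazily, and the `LineCount` of each group gives the per-bill line counts.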
Some accounts have 10 lines of transaction details; others have over 300,000. A 1.5 GB CSV file contains around 8 million lines of data. I previously used UltraEdit to cut and paste it into smaller chunks, but this has become a very inefficient and time-consuming process.
I just want to load a large CSV file in my WinForm and click a button that splits it into chunks of, say, no more than 250,000 lines. Some bills are actually bigger than 250,000 lines; in that case the bill should be kept in one piece, and accounts should not be split across multiple files (the data is already ordered anyway). Also, I do not want an account with multiple bill dates in one CSV; in that case the splitter can create an additional split.
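The splitting rules above can be expressed as a small packing step over the per-bill line counts. This sketch is one reading of the requirements: bills are never split, a bill larger than the limit gets a file of its own, and a new file is started whenever the next bill belongs to the same account as the previous one, so no output file holds two bill dates of one account. `ChunkPlanner.PlanChunks` is a hypothetical name, and the inputs are assumed to be in file order.

```csharp
using System;
using System.Collections.Generic;

public static class ChunkPlanner
{
    // accounts[i] and lineCounts[i] describe bill i (one account + one bill date),
    // in file order. Returns the number of lines in each output chunk.
    //  - a bill is never split; a bill larger than maxLines fills a chunk alone;
    //  - a new chunk starts when the next bill has the same account as the
    //    previous bill, so no chunk holds two bill dates of one account.
    public static List<int> PlanChunks(IList<string> accounts, IList<int> lineCounts, int maxLines)
    {
        if (accounts.Count != lineCounts.Count)
            throw new ArgumentException("accounts and lineCounts must have the same length");

        var chunks = new List<int>();
        int currentLines = 0;
        for (int i = 0; i < lineCounts.Count; i++)
        {
            bool sameAccountAsPrevious = i > 0 && accounts[i] == accounts[i - 1];
            bool wouldOverflow = currentLines + lineCounts[i] > maxLines;
            if (currentLines > 0 && (sameAccountAsPrevious || wouldOverflow))
            {
                chunks.Add(currentLines);   // close the current chunk
                currentLines = 0;
            }
            currentLines += lineCounts[i];  // an oversized bill lands in an empty chunk
        }
        if (currentLines > 0)
            chunks.Add(currentLines);
        return chunks;
    }
}
```

With bills for accounts `A, A, B, C, D` of `100, 50, 300, 120, 90` lines and a limit of 250, this returns `100, 50, 300, 210`: the two `A` bills go to separate files, the 300-line bill stays whole, and `C` and `D` share a file. With `maxLines` set to 250,000, the result is the array of integers a file splitter can consume.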
I already have a WinForm application, written in VS C# 2010, that automatically formats the smaller CSV files.
Is it actually possible to process CSV files this large? I have been trying to load them, but the application crashes with an OutOfMemoryException every time and I don't know how to fix it. I am open to suggestions.
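A likely cause of the OutOfMemoryException, assuming the whole file is currently read in one go (for example with `File.ReadAllLines` or into a `DataTable`), is that every line becomes a .NET string. Strings are UTF-16, so a 1.5 GB file of mostly single-byte characters needs roughly twice that in character data alone, which a 32-bit process (the x86 target used by many VS 2010 WinForms projects) generally cannot allocate. `File.ReadLines` (available since .NET 4) returns a lazily evaluated `IEnumerable<string>`, so only the current line needs to be in memory. A minimal sketch; `LargeCsv.CountLines` is a hypothetical helper.

```csharp
using System.IO;

public static class LargeCsv
{
    // Counts the lines of an arbitrarily large file. File.ReadLines yields one
    // line at a time, so memory use stays roughly constant regardless of file
    // size; File.ReadAllLines would instead build a string[] of every line.
    public static long CountLines(string path)
    {
        long count = 0;
        foreach (string line in File.ReadLines(path))
        {
            count++;   // process or inspect 'line' here instead of storing it
        }
        return count;
    }
}
```

The same `foreach` pattern works for any per-line processing, as long as each line is written out or summarised rather than collected into a list.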
Here is what I think I should be doing:
- Load the large CSV file (this currently fails with an OutOfMemoryException). How do I solve this?
- Group the data by account number and bill date, and count the number of lines in each group.
- Build an array of integers from those line counts.
- Pass this array to a file-splitter process that writes out the corresponding blocks of data.
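The last step in the list (writing the blocks) can also stream: read the input once and switch to a new output file whenever the current chunk has reached its planned line count. A minimal sketch, assuming `chunkSizes` holds the number of lines per output file in file order and sums to the file's line count; the class name `CsvSplitter`, the `part_0001.csv` naming and the choice to put any leftover lines into the last file are hypothetical. It does not copy a header row into every file.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class CsvSplitter
{
    // Streams inputPath once and writes consecutive blocks of lines to
    // outputDir, one file per entry in chunkSizes (part_0001.csv, ...).
    // Lines left over after the last planned chunk go into the last file.
    // Returns the paths of the files written.
    public static List<string> Split(string inputPath, IList<int> chunkSizes, string outputDir)
    {
        if (chunkSizes.Count == 0)
            throw new ArgumentException("chunkSizes must not be empty");

        Directory.CreateDirectory(outputDir);
        var written = new List<string>();
        int chunkIndex = 0;
        int linesInChunk = 0;
        StreamWriter writer = null;
        try
        {
            foreach (string line in File.ReadLines(inputPath))
            {
                // Close the current file once its chunk is full,
                // unless it is already the last planned chunk.
                if (writer != null && linesInChunk >= chunkSizes[chunkIndex]
                    && chunkIndex < chunkSizes.Count - 1)
                {
                    writer.Dispose();
                    writer = null;
                    chunkIndex++;
                    linesInChunk = 0;
                }
                if (writer == null)
                {
                    string outPath = Path.Combine(outputDir,
                        string.Format("part_{0:D4}.csv", chunkIndex + 1));
                    writer = new StreamWriter(outPath);
                    written.Add(outPath);
                }
                writer.WriteLine(line);
                linesInChunk++;
            }
        }
        finally
        {
            if (writer != null) writer.Dispose();
        }
        return written;
    }
}
```

Only one input line and one open output file are held at a time, so memory use does not grow with the size of the input; the grouping and planning passes only keep one small record per bill.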
Any suggestions will be greatly appreciated.
Thanks.