I have folders where approximately 3000 new CSV files come in on a daily basis, each containing between 50 and 2000 lines of information.
Currently, a process picks these files up one at a time, reads each line individually, and sends it to a stored procedure that inserts the contents into a database.
This means that over the course of a day, it can struggle to get through the 3000 files before the next 3000 come in!
I'm looking to improve this process and have had the following ideas:
- Use the new parallel features of C# 4.0 (the Task Parallel Library) to process multiple files at once, still passing the lines one by one to the stored proc (first sketch below)
- Create a temporary database table into which all the rows of a file can be bulk-inserted at once, then call the stored procedure on the newly added rows in the temp table (second sketch below)
- Split the process into two jobs: one reads data from the files into the temporary database table, the other processes the rows in the temporary table (third sketch below)
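
Roughly what I had in mind for the first idea. The folder path, the degree-of-parallelism cap, and the `InsertRow` helper are placeholders standing in for my real setup:

```csharp
using System.IO;
using System.Threading.Tasks;

class ParallelImporter
{
    static void Main()
    {
        // Hypothetical incoming folder; substitute the real path.
        var files = Directory.EnumerateFiles(@"C:\Incoming", "*.csv");

        // Cap the degree of parallelism so the database isn't flooded
        // with too many concurrent connections at once.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 8 };

        Parallel.ForEach(files, options, file =>
        {
            foreach (var line in File.ReadLines(file))
            {
                // The existing per-line stored-proc call would go here,
                // e.g. InsertRow(line) -- a hypothetical helper.
            }
        });
    }
}
```

My worry with this is that it only parallelises the outer loop; each file is still processed line by line, so the per-row round trips to the stored proc remain.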
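
For the second idea, SqlBulkCopy seems like the obvious way to push a whole file's rows into a staging table in one round trip. The table name and single raw-line column below are assumptions; they'd need to match my real schema:

```csharp
using System.Data;
using System.Data.SqlClient;
using System.IO;

class BulkLoader
{
    public static void BulkLoad(string file, string connectionString)
    {
        // Build an in-memory table whose columns mirror the staging table.
        // One raw-line column is assumed here; split into real columns
        // if the staging table is already shaped like the CSV.
        var table = new DataTable();
        table.Columns.Add("RawLine", typeof(string));

        foreach (var line in File.ReadLines(file))
            table.Rows.Add(line);

        using (var bulk = new SqlBulkCopy(connectionString))
        {
            bulk.DestinationTableName = "dbo.StagingRows"; // hypothetical staging table
            bulk.BatchSize = 5000;
            bulk.WriteToServer(table);
        }

        // A single stored-proc call can then process the staged rows
        // as one set-based operation instead of one call per row.
    }
}
```

Turning up to 2000 stored-proc calls per file into one bulk copy plus one set-based proc call is where I'm hoping the big win is.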
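
For the third idea, one in-process way I can see to decouple the two stages is a producer/consumer pipeline with BlockingCollection (also new in .NET 4.0), though the same split could run as two separately scheduled jobs instead. Paths and helper names are again placeholders:

```csharp
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class TwoStagePipeline
{
    static void Main()
    {
        // Bounded queue so the file reader can't race far ahead of the loader.
        var queue = new BlockingCollection<string>(boundedCapacity: 100);

        // Stage 1: discover files and hand them off.
        var reader = Task.Factory.StartNew(() =>
        {
            foreach (var file in Directory.EnumerateFiles(@"C:\Incoming", "*.csv"))
                queue.Add(file);
            queue.CompleteAdding(); // tell the consumer no more files are coming
        });

        // Stage 2: bulk-load each file, then run the set-based proc.
        var loader = Task.Factory.StartNew(() =>
        {
            foreach (var file in queue.GetConsumingEnumerable())
            {
                // BulkLoader.BulkLoad(file, connectionString); // sketch above
                // ProcessStagedRows();                         // hypothetical proc call
            }
        });

        Task.WaitAll(reader, loader);
    }
}
```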
Any other ideas on how I could approach this? Currently it can take up to 20 seconds per file; I'd really like to improve performance considerably.