I'm currently researching the best approach for processing a large file in C#. We have a file with 10 million+ lines of data. Originally, my client said the file would contain tens of thousands of lines, so we wrote each line out to a new file and had it picked up by our interface engine for processing. Now, however, these files are coming in much larger than expected, and processing takes an entire weekend.

I'm trying to optimize our logic and am researching the best way to go about it. I looked into having multiple threads read from a single file, but the mechanical bottleneck of disk I/O doesn't leave much room for improvement there. The next option is to read the file on a single thread and process each line (or group of lines) on separate worker threads. This should give us some speedup, since the processing of each line can be done concurrently; a rough sketch of the idea is below.

I know some people here have extensive experience processing very large files, and I was hoping to get some feedback on my approach, or perhaps some alternative ways to tackle this problem.
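Here's a minimal sketch of the single-reader / multi-worker idea, using `File.ReadLines` (which streams the file lazily rather than loading it all into memory) fanned out to worker threads via `Parallel.ForEach`. The path and `ProcessLine` body are placeholders for our actual work:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Placeholder path; in reality this comes from our file watcher.
        const string inputPath = @"C:\data\bigfile.txt";

        // File.ReadLines enumerates lines lazily on a single reader thread,
        // while Parallel.ForEach hands batches of lines to a pool of workers.
        Parallel.ForEach(
            File.ReadLines(inputPath),
            new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
            line =>
            {
                ProcessLine(line); // stand-in for the real per-line work
            });
    }

    static void ProcessLine(string line)
    {
        // Placeholder for the actual transformation each line needs.
    }
}
```

My understanding is that the default partitioner already hands lines to workers in chunks, which should keep the per-line synchronization overhead down, but I'd welcome corrections if that's wrong.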
Any thoughts and comments are appreciated.