I want to use Parallel.ForEach to loop over all the lines of a large file (about 2 GB).

I am currently using foreach like this:

var lines = File.ReadLines(fileName);
foreach (var line in lines) {
  // Process line
}

Is it possible to convert this to Parallel.ForEach while still using File.ReadLines, so that the whole file is not loaded into memory at once?

Any help would be appreciated.

Thanks!

Mario
  • What have you tried doing and how has it failed? – UnholySheep Aug 23 '18 at 11:55
  • @UnholySheep - I've tried to read the whole file into a List then use it in the `Parallel.ForEach` loop. But this is weird. – Mario Aug 23 '18 at 11:57
  • 1
    @MatrixCow08 Be sure to profile the operation afterwards. `Parallel.ForEach` sometimes makes things slower. You may want to share `// Process line` with us as well - in case you are doing something that isn't thread-safe. – mjwills Aug 23 '18 at 12:00

1 Answer

How about

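// File.ReadLines is lazy: lines are streamed from disk as they are consumed.
var lines = File.ReadLines(fileName);

// The declared type is IEnumerable<string>; the cast yields null (and 'using'
// then does nothing) if the concrete iterator is not IDisposable.
using (lines as IDisposable)
{
    Parallel.ForEach(lines, line => {
        //Process(line);
    });
}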

Note that the `using` + `IDisposable` combination is there as a matter of best practice. The underlying line stream implements `IDisposable`, so it is prudent to make use of that. If we omit `using` and the call to `Process()` throws an exception, the stream is not guaranteed to be disposed right away. That can cause problems later, because the file stays locked by the OS while the handle is open (so we won't be able to delete it, for example).

Footnote: If you are using C# 8.0 or later (the default language version on .NET Core 3.x and newer), a `using` declaration lets you drop one level of nesting:

var lines = File.ReadLines(fileName);
// Disposed automatically at the end of the enclosing scope.
using var linesDisposable = lines as IDisposable;
Parallel.ForEach(lines, line => {
    //Process(line);
});
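
Regarding the thread-safety point raised in the comments: if processing a line has to feed a shared result, one option is the `Parallel.ForEach` overload with thread-local state. Below is a minimal sketch, not a definitive implementation. `ParallelLineDemo`, `SumOverLines`, and the `Process` stand-in (which just returns the line length) are hypothetical names used for illustration only, since the real `// Process line` body was never posted.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelLineDemo
{
    // Hypothetical stand-in for "// Process line": returns the line's length.
    private static long Process(string line) => line.Length;

    public static long SumOverLines(string fileName)
    {
        long total = 0;

        // Lazy enumeration: the file is streamed, not loaded into memory at once.
        var lines = File.ReadLines(fileName);

        using (lines as IDisposable)
        {
            Parallel.ForEach(
                lines,
                () => 0L,                                                 // initial subtotal for each worker
                (line, loopState, subtotal) => subtotal + Process(line),  // no shared state touched here
                subtotal => Interlocked.Add(ref total, subtotal));        // merge once per worker, atomically
        }

        return total;
    }

    public static void Main()
    {
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(path, new[] { "alpha", "beta", "gamma" });
            Console.WriteLine(SumOverLines(path)); // prints 14
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```

Each worker accumulates into its own subtotal, so the only synchronized operation is the final `Interlocked.Add`. Lines are not processed in file order, so this pattern only fits work whose result does not depend on ordering. As the comment above suggests, profile it against the plain `foreach` before adopting it.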
XDS
fubo