
Quick-read question: are there other techniques I've overlooked, perhaps a P/Invoke call into some library (be it WinAPI or a third-party one)? All advice is welcome.

The full context of the question: for a given use case I need to read text files into memory, which I can then manipulate. The problem lies not in the manipulation, though, but in the I/O. I'm currently using the following techniques within C#:

1) The ReadAllText() method of File

var content = File.ReadAllText(file.FullName);

2) The ReadToEnd() method of StreamReader

string content;
using (var streamReader = File.OpenText(file.FullName)) {
    content = streamReader.ReadToEnd();
}

3) I also tried using a BufferedStream in conjunction with method 2
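
For reference, a minimal sketch of what that combination can look like; the 64 KB buffer size is just an illustrative choice, not a recommendation:

// using System.IO;
string content;
using (var fileStream = new FileStream(file.FullName, FileMode.Open, FileAccess.Read))
using (var bufferedStream = new BufferedStream(fileStream, 64 * 1024))
using (var streamReader = new StreamReader(bufferedStream)) {
    content = streamReader.ReadToEnd();
}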

All had roughly the same performance for files between 5 and 20 MB. So, to restate the question: are there other techniques I've overlooked, perhaps a P/Invoke call into some library (be it WinAPI or a third-party one)? All advice is welcome.

Yves Schelpe
  • Isn't that fast enough? The files are not that large, so I wonder why you bother with WinAPI if the current approach is readable and efficient. If they were really large you could use a [`MemoryMappedFile`](http://msdn.microsoft.com/en-us/library/system.io.memorymappedfiles.memorymappedfile%28v=vs.100%29.aspx) (a sketch follows these comments). – Tim Schmelter Dec 11 '14 at 08:00
  • @TimSchmelter While I understand your remark, it's irrelevant to the initial question asked; maybe I don't need it, but others may. I understand your intent, of course: WinAPI or third-party calls can be messy, but that's something to be taken into account when implementing, or not. For my case specifically it does matter, otherwise I wouldn't have resorted to asking. If no solution is present, though, then so be it; I'll accept that :). – Yves Schelpe Dec 11 '14 at 08:05
  • The only other way is to either: 1) find better I/O rates in your hardware (replace your hardware with something better) or 2) find a better I/O driver for the device. There is nothing else that can really be done. – Ahmed ilyas Dec 11 '14 at 08:08
  • @TimSchmelter I'll look into MemoryMappedFile (MSDN). @Ahmedilyas That's a valid point indeed; we might test that as well. – Yves Schelpe Dec 11 '14 at 08:11
  • @Tim A mapping would not be any faster at getting the entire file into managed memory. – David Heffernan Dec 11 '14 at 08:17
  • The only other thing I can think of is using `ReadAllBytes` instead and operating only on that (or on small decoded chunks). `ReadAllText` has to go through all the bytes and decode them into strings. – Mike Zboray Dec 11 '14 at 08:17
  • Profiling both ways down to the WinAPI calls would be interesting. I am also interested in whether memory mapping would be faster or not. Since you are then operating entirely on a much slower HDD/SSD instead of RAM, you might find that mapping is quick, but the rest is not :/ – Samuel Dec 11 '14 at 08:17
  • @DavidHeffernan: a `MemoryMappedFile` [has its advantages](http://stackoverflow.com/questions/192527/what-are-the-advantages-of-memory-mapped-files), it allows random access and you don't need to read the entire file into view. – Tim Schmelter Dec 11 '14 at 08:19
  • @YvesSchelpe: (acc. to your first comment) maybe you have to use a database instead of text files. A text file is the worst way to store a large amount of data that you need to manipulate often; all the more so if you have already identified it as a bottleneck. – Tim Schmelter Dec 11 '14 at 08:22
  • @Tim The question wants the entire file in managed memory. Clearly if you can get away without doing that then mapping can be powerful. – David Heffernan Dec 11 '14 at 08:28
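
For completeness, a minimal sketch of the MemoryMappedFile approach Tim Schmelter mentions above, assuming a UTF-8 file; note David Heffernan's caveat that mapping does not make loading the entire file into managed memory any faster, its strength is random access without reading the whole file:

// using System.IO; using System.IO.MemoryMappedFiles; using System.Text;
using (var mmf = MemoryMappedFile.CreateFromFile(file.FullName, FileMode.Open))
using (var stream = mmf.CreateViewStream())
using (var reader = new StreamReader(stream, Encoding.UTF8)) {
    // Read only what you need; calling ReadToEnd() here would defeat
    // the point of mapping the file instead of loading it outright.
    var firstLine = reader.ReadLine();
}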

3 Answers


The bottleneck for all of the variants that you list will be the I/O. Any method to read a complete file from disk into memory will meet the same bottleneck.

Therefore, it is reasonable to conclude that no alternative approach will yield significant gains. You will certainly find slight differences in performance between these and other methods, but you are never going to see order-of-magnitude gains.
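
If you want to verify this on your own machine, a rough Stopwatch comparison along these lines can be used; `path` is a placeholder, and bear in mind the disk-cache caveat from the comments below (a file that was just read is served from cache, so alternate the run order or use separate files):

// using System; using System.Diagnostics; using System.IO;
var sw = Stopwatch.StartNew();
var viaReadAllText = File.ReadAllText(path);
sw.Stop();
Console.WriteLine("File.ReadAllText:       {0} ms", sw.ElapsedMilliseconds);

sw.Restart();
string viaStreamReader;
using (var reader = File.OpenText(path)) {
    viaStreamReader = reader.ReadToEnd();
}
sw.Stop();
Console.WriteLine("StreamReader.ReadToEnd: {0} ms", sw.ElapsedMilliseconds);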

David Heffernan
  • Does this mean the methods presented are already the fastest and most performant available in .NET (or even in third-party solutions) for reading a text file? Just asking to be sure, because I was wondering why something like Notepad++ can read such files faster :). – Yves Schelpe Dec 11 '14 at 08:08
  • @YvesSchelpe I suspect Notepad++ is not reading the entirety of the file into memory in one go. It is reading chunks as needed. – Mike Zboray Dec 11 '14 at 08:11
  • Valid point indeed; I need it in one go in my case. But I'll accept that this is just a hardware limit, as I suspected; I thought it couldn't hurt to ask. – Yves Schelpe Dec 11 '14 at 08:12
  • Does Notepad++ read the entire file into memory and then display it? Or does it take a different approach? You also need to beware of the disk cache, which can confuse timings; it's always far quicker to read a file if it's already in cache. Finally, it is plausible that Notepad++ won't have to convert from 8-bit encoded text to 16-bit as happens in .NET. – David Heffernan Dec 11 '14 at 08:13
  • @DavidHeffernan Indeed. With further testing (on my machine only), Notepad++ wasn't all that much faster, to be fair. I guess, when reading a full file, the options listed in my question are the ones that will have to do the trick. – Yves Schelpe Dec 11 '14 at 08:21

I found this article on the topic (http://cc.davelozinski.com/c-sharp/fastest-way-to-read-text-files), and it might be interesting for you.

The article states that:

  • Reading each line into a string (buffered or unbuffered) is almost always faster than reading the whole text at once, and almost always faster than using a StringBuilder.

  • Lots of people state that using a BufferedReader is always the fastest way, which is somewhat wrong according to his tests. I have also had good experiences using a BufferedReader, but that is just a feeling; his tests show it is not always the fastest way. For more information, check out the article.

The article contains example code and test results for nine different techniques for reading a text file. Even if it does not show you "the fastest way", it may be interesting and helpful for you.
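
For illustration, a minimal sketch of the line-by-line style the article benchmarks; ProcessLine is a hypothetical placeholder for your own per-line logic:

// using System.IO;
// Lazy: File.ReadLines decodes one line at a time as you iterate.
foreach (var line in File.ReadLines(file.FullName)) {
    ProcessLine(line);
}

// Eager: File.ReadAllLines materializes the whole file as a string[] up front.
var allLines = File.ReadAllLines(file.FullName);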

Paul Weiland
File.ReadAllLines()

Gives faster performance, but it depends on machine configuration and file size. Please see this link for a good comparison: http://cc.davelozinski.com/c-sharp/fastest-way-to-read-text-files

Mahesh Malpani
  • I posted and read the same article, and indeed it was not the "best" technique; reading line by line is almost always faster than reading the whole text at once. – Paul Weiland Dec 11 '14 at 08:18
  • In my case it didn't turn out to be a winner either, just as in the article, as @DavidHeffernan pointed out. The article itself was an interesting read, though; it essentially confirmed that there are no alternatives to what I had already tried :). – Yves Schelpe Dec 11 '14 at 08:18
  • @MeAndSomeRandoms I know, but I need the whole file before I can start applying logic to the contents. Line by line is not an option here; otherwise, indeed. – Yves Schelpe Dec 11 '14 at 08:20
  • Just read the follow-up [article](http://cc.davelozinski.com/c-sharp/the-fastest-way-to-read-and-process-text-files), where reading AND processing files is tested. Here we have two clear winners: you might want to consider using ReadAllLines() for reading and a parallel for loop for processing. Mahesh Malpani was somewhat right with his post above! – Paul Weiland Dec 11 '14 at 08:25
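
A minimal sketch of the read-then-process pattern that follow-up article describes, assuming the per-line work (again a hypothetical ProcessLine) is safe to run concurrently:

// using System.IO; using System.Threading.Tasks;
var lines = File.ReadAllLines(file.FullName);
Parallel.For(0, lines.Length, i => {
    ProcessLine(lines[i]); // must be thread-safe
});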