
I am writing an application to read and parse files which may be 1 KB to 200 MB in size.

I have to parse each file twice:

  1. Extract an image contained in the file.

  2. Parse that image to extract its contents.

I generally use `FileStream`, `BufferedStream`, `BinaryReader`, and `BinaryWriter` to read and write the contents.

Now, I want to know the fastest and most efficient way to read the file and extract the contents...

Is there a good method or a good class library?

NOTE: Unsafe code is OK!

Writwick
  • The biggest performance improvement here would be gained by parsing the file in a single pass. That would avoid scanning the image twice. – undefined Apr 20 '12 at 01:52
  • @Luke Actually, the image is contained in chunks, and also some of the bytes in the image should be removed [recorded] before parsing. – Writwick Apr 20 '12 at 01:56
  • Yes, in terms of using the .NET file objects, there shouldn't be much performance difference in the raw speed at which you're reading the file. Is there some reason you're looking to optimize this? – David Z. Apr 20 '12 at 01:57
  • The file reading and writing is taking a lot of time... an 81 MB file takes 30-40 seconds to read and extract... so I decided to optimize it so that the files are extracted faster. – Writwick Apr 20 '12 at 01:58
  • If it's taking 40 seconds to extract data from an 80 MB file, it's highly unlikely that file I/O speed is the problem. Some sample code that shows what you're doing would be very helpful. You might consider reading the entire file into memory, creating a `MemoryStream`, and connecting your `BinaryReader` to that. You can then profile your code and see exactly where the bottleneck is. – Jim Mischel Apr 20 '12 at 03:09
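
The suggestion in the last comment is straightforward to sketch in C#. In this rough illustration the file name and the 4-byte header-length field are assumptions for demonstration, not details from the question:

```
using System;
using System.IO;

class TwoPassDemo
{
    static void Main()
    {
        // One trip to the disk; all further work happens in memory.
        byte[] data = File.ReadAllBytes("input.dat");

        using (var ms = new MemoryStream(data))
        using (var reader = new BinaryReader(ms))
        {
            // Pass 1: read a (hypothetical) header length and skip past it
            // to reach the embedded image.
            int headerLength = reader.ReadInt32();
            ms.Seek(headerLength, SeekOrigin.Current);

            // ... extract the image bytes here ...

            // Pass 2: rewind and parse again without re-reading the file.
            ms.Position = 0;
        }
    }
}
```

Profiling this version against the original should make it clear whether disk I/O or the parsing itself is the bottleneck.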

1 Answer


The fastest and simplest way to read the file is simply:

var file = File.ReadAllBytes(fileName);

That will read the entire file as a byte array into memory. You can then go through it looking for what you need at memory array access speed (which is to say, extremely fast). This will almost certainly be faster than trying to process the file as you read it.
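
For illustration, here is a minimal sketch of such an in-memory scan; the file name and the 4-byte marker are made-up placeholders, not part of the answer:

```
using System;
using System.IO;

class MarkerScan
{
    static void Main()
    {
        byte[] file = File.ReadAllBytes("input.dat");

        // Hypothetical chunk marker; substitute the real bytes
        // from your file format.
        byte[] marker = { 0x42, 0x43, 0x48, 0x4B };

        for (int i = 0; i <= file.Length - marker.Length; i++)
        {
            bool match = true;
            for (int j = 0; j < marker.Length; j++)
            {
                if (file[i + j] != marker[j]) { match = false; break; }
            }
            if (match)
            {
                Console.WriteLine("Marker found at offset {0}", i);
            }
        }
    }
}
```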

However, if the file will not comfortably fit in memory (an 81 MB file will), then you will need to process it in chunks. If that is not needed, you can safely avoid this tricky discussion. The solutions in that case will be either:

  1. If you are using .NET 4.0, use memory-mapped files (more in What are the advantages of memory-mapped files?); see the sketch after this list.

  2. If not, you'll need to read in chunks, caching and keeping around what you think you'll need in memory (for efficiency), or re-reading parts if you simply can't keep it all in memory. This can become messy and slow.
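
For option 1, here is a minimal sketch using the `MemoryMappedFile` API introduced in .NET 4.0; the file name and offset are illustrative only:

```
using System;
using System.IO.MemoryMappedFiles;

class MmfDemo
{
    static void Main()
    {
        // Map the file; the OS pages data in on demand instead of
        // loading the whole file up front.
        using (var mmf = MemoryMappedFile.CreateFromFile("input.dat"))
        using (var accessor = mmf.CreateViewAccessor())
        {
            // Random access at any offset within the mapping.
            byte first = accessor.ReadByte(0);
            Console.WriteLine("First byte: 0x{0:X2}", first);
        }
    }
}
```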

yamen
  • Actually, the raw file contains a HEADER, plus ACHUNKs and BCHUNKs [two types of blocks]; the image is contained in the BCHUNKs, so I have to read those chunks... I may be using the memory-mapped file [I had ideas for using it before starting to code this library], but I am not sure about its reliability. Now I may have to use it to increase the performance. I am not marking this as the answer, but it really helped me. – Writwick Apr 20 '12 at 11:52
  • Well, you could vote it up. Regardless, nothing about your question screams memory-mapped files. `ReadAllBytes` will do everything you need, and it's fast and simple. Anyway. – yamen Apr 20 '12 at 13:23
  • I can't vote up as I have less than 15 reputation, and I should not use `ReadAllBytes`, as array functions are slower than FileStream functions. – Writwick Apr 20 '12 at 17:01
  • OK, that is flat out not true (under the covers, how do you think `ReadAllBytes` works?), and you can attach a reader to both. If you are going to be reading the data twice, you need it in memory anyway. But all said and done, it's your call. – yamen Apr 20 '12 at 20:59