
I'm trying to read data and store it in an array as fast as possible, and the fastest method I've found so far is this:

var filePath = "data.dat";
FileStream fs = new FileStream(filePath, FileMode.Open);
bool[] buffer = new bool[fs.Length];

TimeSpan[] times = new TimeSpan[500000];
Stopwatch sw = new Stopwatch();
for (int r = 0; r < 500000; r++)
{
    sw.Start();

    int stackable = 0;
    int counter = 0;
    while ((stackable = fs.ReadByte()) != -1)
    {
        buffer[counter] = (stackable == 1);
        counter++;
    }

    sw.Stop();
    Console.WriteLine($"Elapsed: {sw.Elapsed.TotalMilliseconds}ms");
    times[r] = sw.Elapsed;
    sw.Reset();
}

Console.WriteLine($"Longest iteration: {times.Max().TotalMilliseconds}ms");

This manages to read and process about 9000 bytes in under 3 ms. The idea is to check whether each byte is 1 or 0 (true or false) and store the result in an array.

So my question is: is there a faster way of achieving this? What should I keep in mind when trying to process data fast? Is it a matter of working with smaller data types so that you don't allocate unnecessary memory?

What the data looks like.

https://hatebin.com/dcldbvrbdm

JohnA
  • For review of working code, try [Code Review](https://codereview.stackexchange.com/). – Tu deschizi eu inchid Jan 19 '22 at 03:42
  • Note that if you decide to post to CR remove all that hand-written performance measurement code and perform one using well known tools - any profiler or library like https://github.com/dotnet/BenchmarkDotNet. Also for performance of I/O methods you probably would want to get lower level OS-specific tools to check what numbers to expect from your I/O subsystem - consider including that information into this or potential future CR question too so it is clear how far your code is from maximum speed. – Alexei Levenkov Jan 19 '22 at 03:49
  • Have you tried reading a larger buffer, e.g. 1024 bytes, and then processing the bytes? Have you considered [memory-mapped files](https://learn.microsoft.com/en-us/dotnet/standard/io/memory-mapped-files)? – HABO Jan 19 '22 at 04:08
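The memory-mapped approach suggested in the last comment could look roughly like the sketch below. The class and method names (`MappedReader`, `ReadFlags`) are made up for illustration and are not from the question. Whether this beats a plain read for a file of about 9000 bytes is not certain, so it is worth measuring before adopting it.

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;

public static class MappedReader
{
    // Hypothetical helper: maps the file into memory and turns each byte into a flag.
    public static bool[] ReadFlags(string path)
    {
        long length = new FileInfo(path).Length;

        // An empty file cannot be mapped, so handle it up front.
        if (length == 0)
        {
            return Array.Empty<bool>();
        }

        // Capacity 0 means "use the current size of the file on disk".
        using MemoryMappedFile mmf = MemoryMappedFile.CreateFromFile(
            path, FileMode.Open, null, 0, MemoryMappedFileAccess.Read);

        // Limit the view to the real file length; a default view may be rounded up to a page boundary.
        using MemoryMappedViewAccessor view =
            mmf.CreateViewAccessor(0, length, MemoryMappedFileAccess.Read);

        bool[] flags = new bool[length];
        for (long i = 0; i < length; i++)
        {
            flags[i] = view.ReadByte(i) == 1;
        }
        return flags;
    }
}
```

Calling `MappedReader.ReadFlags("data.dat")` would return one `bool` per byte in the file, `true` only where the byte equals 1.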

1 Answer

`FileStream` does buffered I/O, so iterating byte by byte isn't that bad. But reading the data into a buffer once (if you can) is generally faster, because it takes a single I/O call. Below, I first run your code; I had to add `fs.Seek(0, SeekOrigin.Begin)` in the loop so that every iteration starts again from the beginning of the file.

In the second block I read all the data in with one call and iterate over it using `AsSpan()`, which exposes the array as a `Span<byte>` and is a fast way to iterate an array.

using System;
using System.Diagnostics;
using System.IO;

namespace test_con
{
    class Program
    {
        static void Main(string[] args)
        {
            makedata();
            var filePath = "data.dat";
            var loop_cnt = 5000;
            using FileStream fs = new FileStream(filePath, FileMode.Open);
            bool[] buffer = new bool[fs.Length];
   
            Stopwatch sw = new Stopwatch();
            sw.Start();

            for (int r = 0; r < loop_cnt; r++)
            {
                int stackable = 0;
                int counter = 0;
                while ((stackable = fs.ReadByte()) != -1)
                {
                    buffer[counter] = (stackable == 1);
                    counter++;
                }
                fs.Seek(0, SeekOrigin.Begin);
            }

            Console.WriteLine($"avg iteration: {sw.Elapsed.TotalMilliseconds/loop_cnt}");

            var byte_buf = new byte[fs.Length];
            sw.Restart();

            for (int r = 0; r < loop_cnt; r++)
            {
                fs.Seek(0, SeekOrigin.Begin);
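                // Note: Read may return fewer bytes than requested; a robust version would loop until the buffer is full.
                fs.Read(byte_buf);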
                int counter = 0;
                foreach(var b in byte_buf.AsSpan()) {
                    buffer[counter] = (b == 1);
                    counter++;
                }
            }

            Console.WriteLine($"buf avg iteration: {sw.Elapsed.TotalMilliseconds / loop_cnt}");
        }

        static void makedata()
        {
            var filePath = "data.dat";
            if (!File.Exists(filePath))
            {
                Random rnd = new Random();

                using FileStream fs = new FileStream(filePath, FileMode.CreateNew);
                for (int n = 0; n < 100000; n++)
                {
                    // x % 1 is always 0, so use % 2 to get a random mix of 0 and 1 bytes.
                    if (rnd.Next() % 2 == 1)
                        fs.WriteByte(0);
                    else
                        fs.WriteByte(1);
                }
            }
        }
    }
}

The output on my 2012 MacBook is:

avg iteration: 1.01832286
buf avg iteration: 0.6913623999999999

So the buffered version takes only about 68% of the time of the byte-by-byte stream version.
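If the file fits comfortably in memory, the "read it all once" idea can also be written without managing the stream by hand. This is only a sketch: `WholeFileReader` and `ReadFlags` are illustrative names, and it uses `File.ReadAllBytes` instead of the `fs.Read(byte_buf)` call from the benchmark above, so the numbers above do not apply to it directly.

```csharp
using System;
using System.IO;

public static class WholeFileReader
{
    // Hypothetical helper: loads the whole file with one call, then maps each byte to a flag.
    public static bool[] ReadFlags(string path)
    {
        // ReadAllBytes returns the complete file contents, so no partial-read handling is needed here.
        byte[] bytes = File.ReadAllBytes(path);

        bool[] flags = new bool[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
        {
            // Same rule as in the question: only the value 1 counts as true.
            flags[i] = bytes[i] == 1;
        }
        return flags;
    }
}
```

Usage would be `bool[] flags = WholeFileReader.ReadFlags("data.dat");`. The trade-off is one extra `byte[]` allocation per call, which the benchmark above avoids by reusing `byte_buf`.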

bmiller
  • Super useful information! Thank you! :-) – JohnA Jan 19 '22 at 05:40
  • @JohnA Try different size buffers for `byte_buf`. You may find that a buffer of 8k or 64k may be faster. Also you can use `Span.CopyTo` instead of copying manually – Charlieface Jan 19 '22 at 09:41
  • Ultimately there's still an IO in the loop, so I don't think there's much time to save anywhere else. The time taken to read the file dwarfs the time to iterate the buffer. – bmiller Jan 19 '22 at 17:08
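To experiment with the buffer sizes mentioned in the comments, something like the following sketch could be used. `ChunkedReader`, `ReadFlags` and the `chunkSize` parameter are assumptions made for this example; the 64k default is just one of the sizes suggested above, not a measured optimum.

```csharp
using System;
using System.IO;

public static class ChunkedReader
{
    // Hypothetical helper: reads the file in fixed-size chunks and maps each byte to a flag.
    public static bool[] ReadFlags(string path, int chunkSize = 64 * 1024)
    {
        if (chunkSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkSize));
        }

        using FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read);
        bool[] flags = new bool[fs.Length];
        byte[] chunk = new byte[chunkSize];

        int written = 0;
        int read;
        // Read returns how many bytes it actually delivered, and 0 at end of file.
        while (written < flags.Length && (read = fs.Read(chunk, 0, chunk.Length)) > 0)
        {
            // Guard against the file growing while it is being read.
            int n = Math.Min(read, flags.Length - written);
            for (int i = 0; i < n; i++)
            {
                flags[written + i] = chunk[i] == 1;
            }
            written += n;
        }
        return flags;
    }
}
```

Calling `ChunkedReader.ReadFlags("data.dat", 8 * 1024)` and `ChunkedReader.ReadFlags("data.dat", 64 * 1024)` under a benchmark tool would show whether the chunk size matters for a given file and disk.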