
I have a data array double[8,2000000] (8 rows and 2 million columns) and I want to save it to a file on disk every second, in either text or binary format. I tried StreamWriter to write the file, but it needs more than 1 second. File.Write doesn't accept doubles directly, so I have to convert them to another type first, and that also needs more than 1 second. BinaryWriter is the same speed as StreamWriter. The most important requirement is that the saving time must be less than 1 second.

Do you have any solution to help me? Thanks!

StreamWriter WriteFile = new StreamWriter(Folder_name + @"RecordFile\" + fileName + ".txt", true);
//record data
for (int i = 0; i < 2000000; i++)
{
    int j;
    for (j = 0; j < 7; j++)                 // first 7 values, tab-separated
    {
        WriteFile.Write(dataWrite[j, i]);
        WriteFile.Write("\t");
    }
    WriteFile.WriteLine(dataWrite[j, i]);   // 8th value ends the line
}
WriteFile.Close();              //finish record
  • Show us your attempts – TheGeneral Jun 18 '18 at 08:16
  • As @TheGeneral mentioned, please post the code you have tried. Also, are you appending the data in existing file or writing a new file every 1 sec. Writing the file is IO operation so it would also depends what hw spec that is used to host the process? – user1672994 Jun 18 '18 at 08:18
  • 2
    Ok so you wanna dump a min of 64MB a second to disk. How is the array updated? Can you not just write changes? – BugFinder Jun 18 '18 at 08:21
  • 1
    As BugFinder wrote, it is 64mb of data. It is quite much. An SSD can do that kind of work. But a classical hdd can't normally. And even an SSD could have a problem sometimes and couldn't guarantee the speed (normally there are other processes that read/write) – xanatos Jun 18 '18 at 08:27
  • Here is my code. Create new file or add to existing file is not important. StreamWriter WriteFile = new StreamWriter(fileName + ".txt", true); for (int i = 0; i < 2000000; i++) { int j = 0; for (j = 0; j < 8; j++) { WriteFile.Write(dataWrite[j, i]); WriteFile.Write("\t"); } WriteFile.WriteLine(dataWrite[j, i]); } – Dong Thin Jun 18 '18 at 08:32
  • I am sorry, I don't know how to insert the total code here. I used 2 for loop to write each number in my data array. I try saving with new file and adding to existing file, but the time always more than 1 sec. – Dong Thin Jun 18 '18 at 08:39
  • Edit your question and just paste it in with each line prefixed with 4 spaces / a tab. Given how much you're writing, I'm not really surprised. The only faster way to do it would be to write the entire initial file in a way that each value can be randomly accessed, and then only update changes in the file when you write to it subsequently. – ProgrammingLlama Jun 18 '18 at 08:39
  • Also, what is your end goal? What are you writing this file for? – ProgrammingLlama Jun 18 '18 at 08:41
  • @BugFinder I have an SSD 1TB and I think the type of HDD is not a problem. I try to save double[2,2000000] with about 800ms but it not stable, sometime it is 1200ms. I have an equipment which send this data array to computer by usb port. I already received the data every sec. – Dong Thin Jun 18 '18 at 08:57
  • Another key point is does the format matter? are you sharing this data with other apps? – BugFinder Jun 18 '18 at 08:59
  • No, I received data from my equipment and I only want to save it to my disk. – Dong Thin Jun 18 '18 at 09:10

1 Answer


For these benchmarks I used a scale, which is the FileStream buffer size, to determine whether it makes any significant difference. I ran each test 10 times and averaged the results, in Release (64-bit) mode on .NET Framework 4.7.1. I used a randomized buffer new double[8, 2000000]; to generate the test data.
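The timing loop itself isn't shown; a minimal sketch of what such a harness might look like, using Stopwatch (the file names and run count here are illustrative, not the actual test rig):

```csharp
// Illustrative harness sketch: times one Write implementation over several runs.
// "Write" stands for any of the variants below; the real benchmark rig is not shown.
var data = new double[8, 2000000];
var rnd = new Random(42);
for (var i = 0; i < 8; i++)
    for (var j = 0; j < 2000000; j++)
        data[i, j] = rnd.NextDouble();

const int runs = 10;
var sw = new System.Diagnostics.Stopwatch();
double totalMs = 0;
for (var run = 0; run < runs; run++)
{
    sw.Restart();
    Write(data, 102400, $"bench_{run}.bin");   // separate files to reduce cache effects
    sw.Stop();
    totalMs += sw.Elapsed.TotalMilliseconds;
}
Console.WriteLine($"Average: {totalMs / runs:F3} ms");
```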

Results

Mode            : Release (64Bit)
Test Framework  : .NET Framework 4.7.1
Benchmarks Runs : 10 times (averaged)

Scale : 4,096, Test Data : Standard input
Value                     |    Average |    Fastest |   StDv |        Cycles | Pass |     Gain |
------------------------------------------------------------------------------------------------
WriteFile Unsafe          | 384.122 ms | 275.073 ms |  58.90 | 1,317,292,586 | Pass |  42.84 % |
BlockCopy                 | 389.389 ms | 305.094 ms |  57.68 | 1,335,451,612 | Pass |  42.05 % |
WriteFile Pinned          | 422.704 ms | 341.646 ms |  67.66 | 1,418,871,963 | Pass |  37.09 % |
BinaryWriter              | 671.966 ms | 608.900 ms |  58.63 | 2,260,807,206 | Base |   0.00 % |
BitConverter              | 784.722 ms | 668.788 ms | 139.98 | 2,607,901,414 | Pass | -16.78 % |


Scale : 32,768, Test Data : Standard input
Value                     |    Average |    Fastest |  StDv |        Cycles | Pass |    Gain |
----------------------------------------------------------------------------------------------
WriteFile Unsafe          |  97.254 ms |  88.318 ms |  5.38 |   339,330,780 | Pass | 83.49 % |
WriteFile Pinned          | 110.047 ms |  90.279 ms | 18.80 |   346,777,096 | Pass | 81.32 % |
BlockCopy                 | 115.805 ms | 106.119 ms |  7.40 |   403,209,891 | Pass | 80.34 % |
BinaryWriter              | 589.168 ms | 530.255 ms | 60.64 | 1,985,585,629 | Base |  0.00 % |
BitConverter              | 593.952 ms | 506.482 ms | 73.93 | 1,983,475,740 | Pass | -0.81 % |


Scale : 102,400, Test Data : Standard input
Value                     |    Average |    Fastest |  StDv |        Cycles | Pass |    Gain |
----------------------------------------------------------------------------------------------
WriteFile Unsafe          |  73.071 ms |  69.885 ms |  1.77 |   255,008,411 | Pass | 85.95 % |
WriteFile Pinned          |  73.523 ms |  71.073 ms |  1.98 |   256,062,331 | Pass | 85.86 % |
BlockCopy                 |  82.068 ms |  78.838 ms |  1.79 |   286,872,838 | Pass | 84.22 % |
BinaryWriter              | 519.943 ms | 471.578 ms | 46.01 | 1,778,713,946 | Base |  0.00 % |
BitConverter              | 559.842 ms | 497.743 ms | 39.83 | 1,946,616,118 | Pass | -7.67 % |


Scale : 1,048,576, Test Data : Standard input
Value                     |    Average |    Fastest |  StDv |        Cycles | Pass |     Gain |
-----------------------------------------------------------------------------------------------
WriteFile Pinned          |  59.993 ms |  56.088 ms |  1.73 |   209,025,613 | Pass |  87.46 % |
WriteFile Unsafe          |  61.783 ms |  56.266 ms |  8.09 |   206,988,059 | Pass |  87.08 % |
BlockCopy                 |  64.105 ms |  61.066 ms |  1.52 |   224,205,049 | Pass |  86.60 % |
BinaryWriter              | 478.376 ms | 442.570 ms | 34.63 | 1,671,203,569 | Base |   0.00 % |
BitConverter              | 550.557 ms | 493.186 ms | 42.27 | 1,916,031,041 | Pass | -15.09 % |

BlockCopy

private static void Write(double[,] ary, int chunkSize, string fileName)
{
   var h = ary.GetLength(0);
   var w = ary.GetLength(1);
   var totalSize = h * w * sizeof(double);

   using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None, chunkSize))
   {
      var buffer = new byte[chunkSize];

      for (var i = 0; i < totalSize; i += chunkSize)
      {
         var size = Math.Min(chunkSize, totalSize - i);
         Buffer.BlockCopy(ary, i, buffer, 0, size);
         fs.Write(buffer, 0, size);
      }
   }
}
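Note that Buffer.BlockCopy measures offsets and counts in bytes, not array elements, even when the source is a double[,]; that is why the loop above steps i in bytes. A tiny illustration (the values are just examples):

```csharp
// Buffer.BlockCopy treats both arrays as raw byte buffers.
var src = new double[2, 2] { { 1.0, 2.0 }, { 3.0, 4.0 } };
var dst = new byte[src.Length * sizeof(double)];      // 4 doubles = 32 bytes
Buffer.BlockCopy(src, 0, dst, 0, dst.Length);         // offset and count are byte counts
// dst now holds the doubles' raw bytes in row-major order
```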

BinaryWriter

private static void Write(double[,] ary, int chunkSize, string fileName)
{
   var h = ary.GetLength(0);
   var w = ary.GetLength(1);

   using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None, chunkSize))
      using (var bw = new BinaryWriter(fs))
         for (var i = 0; i < h; i++)
            for (var j = 0; j < w; j++)
                bw.Write(ary[i, j]);

}

WriteFile Pinned

private static unsafe void Write(double[,] ary, int chunkSize, string fileName)
{
   var h = ary.GetLength(0);
   var w = ary.GetLength(1);
   var totalSize = h * w;
   var s = chunkSize / sizeof(double);
   using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None, chunkSize))
   {
      var handle = default(GCHandle);

      try
      {
         handle = GCHandle.Alloc(ary, GCHandleType.Pinned);
         var p = (long*)handle.AddrOfPinnedObject()
                              .ToPointer();
         var fileHandle = fs.SafeFileHandle.DangerousGetHandle();

         for (var i = 0; i < totalSize; i += s)
         {
            var size = Math.Min(s, totalSize - i);
            var p2 = p + i;
            Kernel32.WriteFile(fileHandle, (IntPtr)p2, size * sizeof(double), out var n, IntPtr.Zero);
         }
      }
      finally
      {
         if (handle.IsAllocated)
         {
            handle.Free();
         }
      }
   }
}
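Both WriteFile variants rely on a Kernel32.WriteFile P/Invoke wrapper that isn't shown in the answer; a declaration along these lines is assumed:

```csharp
using System;
using System.Runtime.InteropServices;

internal static class Kernel32
{
    // P/Invoke declaration for the Win32 WriteFile function (kernel32.dll).
    [DllImport("kernel32.dll", SetLastError = true)]
    internal static extern bool WriteFile(
        IntPtr hFile,
        IntPtr lpBuffer,
        int nNumberOfBytesToWrite,
        out int lpNumberOfBytesWritten,
        IntPtr lpOverlapped);
}
```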

WriteFile Unsafe

private static unsafe void Write(double[,] ary, int chunkSize, string fileName)
{
   var h = ary.GetLength(0);
   var w = ary.GetLength(1);

   var totalSize = h * w;
   var s = chunkSize / sizeof(double);
   using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None, chunkSize))
   {
      var fileHandle = fs.SafeFileHandle.DangerousGetHandle();
      fixed (double* p = ary)
      {
         for (var i = 0; i < totalSize; i += s)
         {
            var size = Math.Min(s, totalSize - i);
            var p2 = p + i;
            Kernel32.WriteFile(fileHandle, (IntPtr)p2, size * sizeof(double), out var n, IntPtr.Zero);
         }
      }
   }
}

BitConverter

This just uses FileStream and BitConverter.GetBytes.

private static void Write(double[,] ary, int chunkSize, string fileName)
{
   var h = ary.GetLength(0);
   var w = ary.GetLength(1);

   using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write, FileShare.None, chunkSize))
      for (var i = 0; i < h; i++)
         for (var j = 0; j < w; j++)
            fs.Write(BitConverter.GetBytes(ary[i, j]), 0, 8);

}

Summary

This was extremely fiddly to test and get right; however, all of the solutions are tested, and they all write the whole array to the file continuously as the raw bytes of the doubles.

At first xanatos's version using a pinned array seemed really slow (not shown here), and it took me a while to figure out what was actually going on. It turns out that writing the whole array to the file in a single call seems to be the slowest approach. Maybe the flushing isn't happening incrementally but only when the file closes; I'm not sure, but I suspect it tries to write everything at once.

However, when I tweaked it to write in chunks, it turned out to be the most consistent. Once again this was really hard to test: we are fighting various caches that are not easy to overcome. In the end I had to write to separate files, and even then it seems the OS was caching results.
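One option for fighting the OS write cache (not used in the benchmarks above, so treat it as an untested suggestion) is to open the stream with FileOptions.WriteThrough, which asks the OS to skip its write cache:

```csharp
// Untested variation: FileOptions.WriteThrough bypasses the OS write cache,
// which usually slows writes down but gives more honest disk timings.
using (var fs = new FileStream(fileName, FileMode.Create, FileAccess.Write,
                               FileShare.None, chunkSize, FileOptions.WriteThrough))
{
   // ... write chunks as in the BlockCopy variant above ...
}
```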

Update

If you want to read the data back you can use the following:

private static void Read(double[,] ary, int chunkSize, string fileName)
{
   var h = ary.GetLength(0);
   var w = ary.GetLength(1);
   var totalSize = h * w * sizeof(double);

   using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.None, chunkSize))
   {
      var buffer = new byte[chunkSize];

      for (var i = 0; i < totalSize; i += chunkSize)
      {
         var size = Math.Min(chunkSize, totalSize - i);

         // Stream.Read may return fewer bytes than requested, so loop until the chunk is full
         var read = 0;
         while (read < size)
         {
            var n = fs.Read(buffer, read, size - read);
            if (n == 0)
               throw new EndOfStreamException();
            read += n;
         }

         Buffer.BlockCopy(buffer, 0, ary, i, size);
      }
   }
}
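For example, a round trip with the BlockCopy writer above (the file name is just an example):

```csharp
var original = new double[8, 2000000];
// ... fill original with data ...
Write(original, 1048576, "data.bin");     // BlockCopy variant

var restored = new double[8, 2000000];
Read(restored, 1048576, "data.bin");
// restored now contains the same values as original
```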