
I'm trying to find a way to read a file into an array with "gaps", so that the data read ends up in the byte array buffer at positions buffer[0], buffer[2], ..., buffer[2*i], without any significant speed loss.

More specifically, I want to read it int-wise (i.e. b[0], b[4], ..., b[i * 4]).

Is that possible in any way (C#, C++), or should I look for another approach?

A bit more background:
I'm trying to speed up a hash algorithm (it hashes the file blockwise, concatenates the block hashes, hashes that concatenation, and takes the result as the final hash).
The idea is to use SSE3 to process 4 blocks in "parallel", which is why I need the data laid out this way: it lets me load it into the registers easily.
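For context, the block-hash scheme described above can be sketched in C++ roughly as follows. This is only an illustration: the question does not name the actual hash, so 64-bit FNV-1a is used as a stand-in, and the names `fnv1a` and `hash_blockwise` are hypothetical. It assumes `blockSize` is greater than zero.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in hash (64-bit FNV-1a). ASSUMPTION: the real algorithm is not
// named in the question; this only illustrates the structure.
std::uint64_t fnv1a(const std::uint8_t* data, std::size_t len) {
    std::uint64_t h = 14695981039346656037ULL;  // FNV offset basis
    for (std::size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 1099511628211ULL;                  // FNV prime
    }
    return h;
}

// Hash each block, concatenate the block hashes (little-endian bytes),
// then hash the concatenation. Assumes blockSize > 0.
std::uint64_t hash_blockwise(const std::vector<std::uint8_t>& file,
                             std::size_t blockSize) {
    std::vector<std::uint8_t> concat;
    for (std::size_t off = 0; off < file.size(); off += blockSize) {
        std::size_t len = std::min(blockSize, file.size() - off);
        std::uint64_t h = fnv1a(file.data() + off, len);
        for (int b = 0; b < 8; ++b) {
            concat.push_back(static_cast<std::uint8_t>((h >> (8 * b)) & 0xFF));
        }
    }
    return fnv1a(concat.data(), concat.size());
}
```

The block hashes are independent of each other, which is what makes processing 4 blocks at once attractive.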

The (pinvokable) lib I wrote in C++ gives nice results (i.e. 4 times as fast), but reordering the data eats up the speed gains.

Currently I'm reading the file blockwise and then reordering the ints (C#):

unsafe {
    uint* b = (uint*)buffer.ToPointer() + chunkIndex;
    fixed(byte* blockPtr = chunk) {
        uint* blockIntPtr = (uint*)blockPtr;

        for(int i = 0; i < 9500 * 1024 / 4; i += 4) {
            *(b + 00) = blockIntPtr[i + 0];
            *(b + 04) = blockIntPtr[i + 1];
            *(b + 08) = blockIntPtr[i + 2];
            *(b + 12) = blockIntPtr[i + 3];
            b += 16;
        }
    }
}

chunk is a byte array and chunkIndex is an int, both passed as method parameters.
buffer is a uint32_t* pointer which is allocated by my C++ code.
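To make the target layout explicit: word j of chunk k ends up at buffer[4 * j + k]. A minimal C++ sketch of the same reordering, done for all four chunks in one pass so that the destination is written sequentially instead of with a stride, could look like this. The function name `interleave4` and its parameters are hypothetical, it assumes all four chunks have the same length, and it has not been benchmarked against the loop above.

```cpp
#include <cstddef>
#include <cstdint>

// Interleave four equally sized chunks of 32-bit words so that word j of
// chunk k lands at out[4 * j + k]. `out` must have room for
// 4 * wordsPerChunk words. (Hypothetical helper, for illustration only.)
void interleave4(const std::uint32_t* c0, const std::uint32_t* c1,
                 const std::uint32_t* c2, const std::uint32_t* c3,
                 std::size_t wordsPerChunk, std::uint32_t* out) {
    for (std::size_t j = 0; j < wordsPerChunk; ++j) {
        // Four sequential writes per iteration; each source is read linearly.
        out[4 * j + 0] = c0[j];
        out[4 * j + 1] = c1[j];
        out[4 * j + 2] = c2[j];
        out[4 * j + 3] = c3[j];
    }
}
```

With this layout, each group of four consecutive words in `out` holds the same word position from four different chunks, which is the shape a 128-bit register load expects.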

The problem is that this takes too long: calling the above code 4 times takes around 90 ms, while the hashing takes 3 ms.
The big discrepancy strikes me as a bit odd, but the code produces correct hashes.

Arokh

1 Answer


In C++ I would do something like:

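uint32_t* b = (uint32_t*)buffer;
// blockIntPtr points at the source block, as in the question.
// The block holds 9500 * 1024 bytes, i.e. 9500 * 1024 / 4 ints.
for (int i = 0; i < 9500 * 1024 / 4; i += 4) {
    // copy 4 ints
    *(b + 0) = blockIntPtr[i + 0];
    *(b + 1) = blockIntPtr[i + 1];
    *(b + 2) = blockIntPtr[i + 2];
    *(b + 3) = blockIntPtr[i + 3];
    // move past these 4 ints and skip the next 12
    b += 16;
}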
Pandrei
  • Do you mean to say the whole reordering would be faster in C++? I just tested it and it takes as long as the C# equivalent. – Arokh Nov 21 '13 at 18:48
  • If optimization is your goal, I think you are better off using unsigned longs instead of ints: read 1 long and skip the next 3. You can also do some loop unrolling if that helps. – Pandrei Nov 21 '13 at 22:05
  • Well, what I asked for is something that doesn't need to reorder the ints, or something that doesn't incur a significant speed loss. The reordering takes 30 times longer than the hashing, which defeats the purpose. I tried unrolling it, but I'm getting the same times. As for using longs, I can't, since I need the int boundaries (I made a mistake in the description there (will update shortly), but the code in my question does what I want, albeit slowly). – Arokh Nov 21 '13 at 22:30