-1

I am side-checking some code that was translated from C to C#. I have a question on the original C:

...
#define getblock(p, i) (p[i])
...
void MurmurHash3_x86_32 ( const void * key, int len,
                          uint32_t seed, void * out )
{
  const uint8_t * data = (const uint8_t*)key;
  const int nblocks = len / 4;
  int i;

  uint32_t h1 = seed;

  uint32_t c1 = 0xcc9e2d51;
  uint32_t c2 = 0x1b873593;

  const uint32_t * blocks = (const uint32_t *)(data + nblocks*4);

  for(i = -nblocks; i; i++)
  {
    uint32_t k1 = getblock(blocks,i);
...

The part for(i = -nblocks; i; i++) ... is this looping through the data backwards? I've never seen data referred to with a negative index.

IamIC
  • Why not debug it or just print out `i` at the start of each loop to see what happens? That should make the behavior pretty clear. – Servy Jun 10 '13 at 17:43
  • I am writing in C# under Visual Studio. I believe the source is GNU C. – IamIC Jun 10 '13 at 17:44
  • So you have no means at all of executing the original code? If you're doing a rewrite I'd find a way of fixing that; you should be able to run snippets of the code that you're translating to help you better understand what it's doing. Not being able to execute any C code at all is worse still, and shouldn't be too hard to remedy. – Servy Jun 10 '13 at 17:46
  • You're right. I was just wondering if there was a simple explanation of this function. – IamIC Jun 10 '13 at 17:47
  • There is, but my point is you should be able to figure this out on your own by executing it, even if you don't understand what it does just by reading it, or at the very least you should be able to confirm your guess as to what it does by running it. If you can't do that here it'll just make the less complex aspects of the translation that much harder to solve. – Servy Jun 10 '13 at 17:48
  • I agree, but actually this is the only part that was unclear. – IamIC Jun 10 '13 at 17:51
  • It uses negative values because the *blocks* pointer points to the end of the object. This is otherwise a *very* dangerous algorithm and quite unsuitable for managed code. It assumes that all of the bytes in the object are part of fields of the object. That isn't true in native code and not true (and undiscoverable) in managed code. It is only useful for simple native types, the .NET ones already have a very good hash algorithm. – Hans Passant Jun 10 '13 at 18:07
  • @HansPassant I will simply loop forwards in .Net, which is correct for the platform. – IamIC Jun 10 '13 at 18:10

3 Answers

3

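The blocks pointer is initialized to point nblocks*4 bytes past data, that is, nblocks 32-bit words ahead of it (assuming sizeof(uint32_t) == 4). The for loop starts with i = -nblocks, so the first block read, blocks[-nblocks], is the first word of data. It then counts i up towards 0 and stops just before the position blocks itself points to. The indices are negative only because blocks marks the end of the full blocks rather than the start. So it is not looping through the data backwards, but forwards.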
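To make the direction concrete, here is a minimal sketch that mimics the loop from the question and records where each visited block sits relative to the start of the input. The helper name record_block_offsets and its offsets parameter are made up for illustration. Taking the key as a const uint32_t * instead of const void * is an assumption that keeps the reads properly aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of MurmurHash3): runs the same loop shape as
   the question and stores, per iteration, the byte offset of the visited
   block from the start of key. Returns the number of blocks visited.
   Assumption: key is passed as uint32_t* so every block is aligned. */
static int record_block_offsets(const uint32_t *key, int len, ptrdiff_t *offsets)
{
    const uint8_t *data = (const uint8_t *)key;
    const int nblocks = len / 4;
    /* Same setup as the question: blocks points just past the last full block. */
    const uint32_t *blocks = (const uint32_t *)(data + nblocks * 4);
    int n = 0;

    assert(len >= 0);
    for (int i = -nblocks; i; i++) {
        /* blocks[i] means *(blocks + i); a negative i is an earlier address. */
        const uint8_t *p = (const uint8_t *)&blocks[i];
        offsets[n++] = p - data;
    }
    return n;
}
```

For a 12-byte key this should record the offsets 0, 4, 8 in that order, so the first iteration reads the first word of the input and the last iteration reads the word just before blocks.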

jxh
3

No, it's not looping through the data backwards. It starts at the beginning of data and indexes up.

As you can see, the pointer "blocks" has already been advanced past the start of "data". It points "nblocks" 4-byte blocks (nblocks*4 bytes) beyond the beginning of data.

const uint32_t * blocks = (const uint32_t *)(data + nblocks*4);

So, you need a negative index to get to the beginning of data (-nblocks). The start of data is precisely at "blocks[-nblocks]". The "for" loop simply starts there, and counts up.

for(i = -nblocks; i; i++)
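
As a rough check that the negative-index loop is just a forward walk, the sketch below folds the same words twice: once with the end-pointer style from the question, once with an ordinary 0-based loop such as a C# port might use. The functions mix, fold_negative_index and fold_forward_index are illustrative names, and mix is a simple order-sensitive stand-in, not the real MurmurHash3 mixing step.

```c
#include <assert.h>
#include <stdint.h>

/* Order-sensitive stand-in for the real mixing step (an assumption for this
   demo): visiting the words in a different order changes the result. */
static uint32_t mix(uint32_t acc, uint32_t k)
{
    return acc * 31u + k; /* unsigned arithmetic wraps, which is well defined */
}

/* End-pointer style, as in the question: index runs from -nblocks up to -1. */
static uint32_t fold_negative_index(const uint32_t *words, int nblocks)
{
    const uint32_t *blocks = words + nblocks; /* one past the last block */
    uint32_t acc = 0;

    assert(nblocks >= 0);
    for (int i = -nblocks; i; i++)
        acc = mix(acc, blocks[i]);
    return acc;
}

/* Conventional style: index runs from 0 up to nblocks - 1. */
static uint32_t fold_forward_index(const uint32_t *words, int nblocks)
{
    uint32_t acc = 0;

    assert(nblocks >= 0);
    for (int i = 0; i < nblocks; i++)
        acc = mix(acc, words[i]);
    return acc;
}
```

Both functions read words[0], words[1], ... in the same order, so they should return the same value for any input; for the words {1, 2, 3} that value is 1026 with this stand-in mix.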
Ziffusion
0

In fact, it's an algorithm used for hashing (https://en.wikipedia.org/wiki/MurmurHash); your source might be this one: https://github.com/JeffBezanson/libsupport/blob/master/MurmurHash3.c ;)

gtonic
  • I have a different source, but I believe this algorithm is quite standard. The wiki article implies it's going forwards (I saw it earlier), but you're confirming it is going backwards, with the remainder being at the beginning of the input, correct? – IamIC Jun 10 '13 at 17:49
  • What the hell is the `#define` for? Why didn't they just say `p[i]`? – Robert Harvey Jun 10 '13 at 17:52
  • Interesting that the 128 bit hash runs backwards, but the 32 bit one runs forwards. – IamIC Jun 10 '13 at 17:53