I have to compare 16-byte memory blocks for equality in a very performance-sensitive place.

The blocks are always perfectly aligned and they are always exactly 16 bytes. It seems to me that I should be able to utilize this knowledge and come up with something that works better than byte-by-byte comparison.

In fact, I believe most memcmp implementations do this, but obviously it costs them some time to analyze the addresses and the size. In my case the generated code does no such analysis (it is a plain byte-by-byte repz cmpsb), and yet it is still faster than my own attempts:

...
mov    $0x10,%ecx
mov    -0x4c(%ebp),%esi
repz cmpsb %es:(%edi),%ds:(%esi)

I tried to optimize it by implementing 32-bit checks myself, but that does not perform better, probably because memcmp uses processor instructions that my custom C++ code does not.

Any ideas? Is there something faster than memcmp for such a case?

gsf
  • See what your compiler generates. SSE4.2 has an instruction for comparing 16 byte strings. – Mysticial Dec 07 '14 at 04:55
  • Compare to find what? (Difference, bit difference, negation?) – iamgopal Dec 07 '14 at 04:56
  • Many compilers will inline memcmp. And if the size argument is a constant, the compiler can be pretty smart about it. – brian beuning Dec 07 '14 at 04:58
  • Why not just use memcmp? A good compiler will optimize this function as well as possible. – Dmitry Dec 07 '14 at 04:58
  • It is not just the size; the starting addresses have to be considered too. It seems strange that the compiler would be able to do this analysis at compile time. – gsf Dec 07 '14 at 05:01
  • Given that an L1 cache read is ~3 cycles, an L2 cache read ~20 cycles, and a memory read ~100 cycles, I think your time is better spent making sure the data is cached. – brian beuning Dec 07 '14 at 05:01
  • @Dmitry there is no reason not to use memcmp. But I am looking for a better way. If I find one, voilà, that is a reason to switch to it. – gsf Dec 07 '14 at 05:03

1 Answer

You can try something like this, just to see what difference it makes compared with memcmp (assuming you have a 64-bit processor):

#include <stdint.h>

// True when both 16-byte blocks are equal, compared as two 64-bit words.
#define MY_CMP(B1, B2) (((int64_t *) (B1))[0] == ((int64_t *) (B2))[0] && ((int64_t *) (B1))[1] == ((int64_t *) (B2))[1])

if (MY_CMP(array1, array2)) {
    // something
}
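
The pointer casts in the macro above can run into strict-aliasing trouble if the blocks are not really int64_t objects. A minimal sketch of the same two-word comparison that goes through memcpy instead is shown here; the name equal16 is hypothetical, and whether the copies compile down to plain 64-bit loads depends on your compiler and optimization level, so check the generated assembly.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the original post): compares two 16-byte
// blocks as two 64-bit words each, without casting the pointers.
inline bool equal16(const void* a, const void* b) {
    std::uint64_t a0, a1, b0, b1;
    // Fixed-size memcpy calls are typically turned into single loads
    // by optimizing compilers (an assumption worth verifying).
    std::memcpy(&a0, a, 8);
    std::memcpy(&a1, static_cast<const unsigned char*>(a) + 8, 8);
    std::memcpy(&b0, b, 8);
    std::memcpy(&b1, static_cast<const unsigned char*>(b) + 8, 8);
    // XOR is zero only for identical words; OR-ing both results gives
    // one test instead of two separate comparisons.
    return ((a0 ^ b0) | (a1 ^ b1)) == 0;
}
```

Usage is the same as with the macro: if (equal16(array1, array2)) { ... }.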

But if the compiler is good, you shouldn't see any difference.
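
If you want to measure an explicit SIMD version against memcmp as well, here is a rough sketch using SSE2 intrinsics. Note that this is SSE2, not the SSE4.2 string instruction mentioned in the comments. The name equal16_sse2 is hypothetical, the code assumes both pointers are 16-byte aligned as stated in the question, and it falls back to memcmp when SSE2 is not enabled for the build. It is only something to benchmark, not a guaranteed win.

```cpp
#include <cstring>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

// Hypothetical helper: both pointers are assumed to be 16-byte aligned.
inline bool equal16_sse2(const void* a, const void* b) {
#if defined(__SSE2__) || defined(_M_X64)
    const __m128i va = _mm_load_si128(static_cast<const __m128i*>(a));
    const __m128i vb = _mm_load_si128(static_cast<const __m128i*>(b));
    // Each byte of eq is 0xFF where the inputs match, 0x00 where they differ.
    const __m128i eq = _mm_cmpeq_epi8(va, vb);
    // movemask collects the top bit of every byte; 0xFFFF means all 16 matched.
    return _mm_movemask_epi8(eq) == 0xFFFF;
#else
    // Fallback for builds without SSE2.
    return std::memcmp(a, b, 16) == 0;
#endif
}
```

On 32-bit x86 with GCC or Clang, SSE2 usually has to be enabled explicitly (for example with -msse2); otherwise the fallback branch is compiled.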

Dmitry