I'll count bits starting from 0, from the right.

Short answer:

Adding `11111111` to a zero byte (`00000000`) produces no carry out of that byte, so the overflow bit (bit 8 for the lowest byte) stays 0; adding `11111111` to a non-zero byte carries out, so the overflow bit becomes 1. `value + 0x7EFEFEFF` performs that addition on every byte at once, and the program just checks those overflow bits.
Long answer:
This is the mathematical representation of the code (`a` is the value):

    result = ((a + magic) ^ !a) & !magic

where

- `magic` is `0x7EFEFEFF`
- `^` means bitwise xor
- `&` means bitwise and
- `!` means bitwise NOT, i.e. xor'd with `0xFFFFFFFF`
To understand the role of `0x7EFEFEFF`, look at its binary representation:

    01111110 11111110 11111110 11111111

The `0`s are the magic overflow bits: bits number 8, 16, 24 and 31.
Let's look at a few examples.
Example 1: eax = 0x00000000

    a       = 00000000 00000000 00000000 00000000
    a+magic = 01111110 11111110 11111110 11111111
    !a      = 11111111 11111111 11111111 11111111

When we xor `a+magic` with `!a` we get:

    result  = 10000001 00000001 00000001 00000000

Look at the magic bits: they are all `1`. Then we simply clear the rest of the bits (which are all `0` here) by `and`ing the result with `10000001 00000001 00000001 00000000`, aka `!magic`. As you know, `and`ing with 0 clears that bit, and `and`ing with 1 leaves it unchanged.

Final result:

    10000001 00000001 00000001 00000000
Example 2: eax = 0x00000001

    a       = 00000000 00000000 00000000 00000001
    a+magic = 01111110 11111110 11111111 00000000
    !a      = 11111111 11111111 11111111 11111110

When we xor `a+magic` with `!a` we get:

    result  = 10000001 00000001 00000000 11111110

Look at the magic bits: bits number 16, 24 and 31 are 1; bit 8 is 0.

- Bit 8 represents the first byte. If the first byte is not zero, the addition carries into bit 8 of `a+magic`, so after the xor bit 8 of `result` ends up `0`. If the first byte is zero, it ends up `1`.
- Bit 16 represents the second byte. Same logic.
- Bit 24 represents the third byte.
- Bit 31 represents the fourth byte.

Then again we clear the non-magic bits by `and`ing the result with `!magic`.

Final result:

    10000001 00000001 00000000 00000000
Example 3: eax = 0x34003231

    a       = 00110100 00000000 00110010 00110001
    a+magic = 10110010 11111111 00110001 00110000
    !a      = 11001011 11111111 11001101 11001110

When we xor `a+magic` with `!a` we get:

    result  = 01111001 00000000 11111100 11111110

Only bit 24 is `1`. After clearing the non-magic bits the final result is:

    00000001 00000000 00000000 00000000
Example 4: eax = 0x34323331

    a       = 00110100 00110010 00110011 00110001
    a+magic = 10110011 00110001 00110010 00110000
    !a      = 11001011 11001101 11001100 11001110

When we xor `a+magic` with `!a` we get:

    result  = 01111000 11111100 11111110 11111110

After clearing the non-magic bits the final result is:

    00000000 00000000 00000000 00000000 (zero)
I wrote a test case for demonstration:

    #include <stdint.h> // uint32_t
    #include <stdio.h>  // printf

    // assumes little endian
    void printBits(size_t const size, void const * const ptr)
    {
        unsigned char *b = (unsigned char *) ptr;
        unsigned char byte;
        int i, j;

        for (i = size - 1; i >= 0; i--) {
            for (j = 7; j >= 0; j--) {
                byte = b[i] & (1 << j);
                byte >>= j;
                printf("%u", byte);
            }
            printf(" ");
        }
    }

    int main()
    {
        uint32_t a = 0;
        uint32_t d = 0;
        const uint32_t magic = 0x7EFEFEFF;
        const uint32_t magicRev = magic ^ 0xFFFFFFFF;
        const uint32_t numbers[] = {
            0x00000000, 0x00000001, 0x34003231,
            0x34323331, 0x01010101
        };

        for (int i = 0; i != sizeof(numbers) / sizeof(numbers[0]); i++) {
            a = numbers[i];
            d = magic;
            printf("a:       ");
            printBits(sizeof(a), &a);
            printf("\n");
            d = a + d;
            printf("a+magic: ");
            printBits(sizeof(d), &d);
            printf("\n");
            a = a ^ 0xFFFFFFFF;
            printf("!a:      ");
            printBits(sizeof(a), &a);
            printf("\n");
            a = a ^ d;
            printf("result:  ");
            printBits(sizeof(a), &a);
            printf("\n");
            a = a & magicRev;
            printf("         ");
            printBits(sizeof(a), &a);
            if (a == 0) {
                printf(" (zero)\n");
            } else {
                printf(" (at least one)\n");
            }
            printf("\n");
        }
        return 0;
    }