I have two different implementations of a 64-bit add in HLSL. To compute A += B, where al, ah, bl, and bh are the low and high 32 bits of A and B respectively, I use either

(1):

#define pluseq64(al, ah, bl, bh) do {\
    uint tadd0 = al >> 1;\
    uint tadd1 = bl >> 1;\
    tadd0 += al & bl & 0x00000001;\
    tadd0 += tadd1;\
    tadd0 >>= 31;\
    al += bl;\
    ah += bh;\
    ah += tadd0; } while(0)

or (2):

#define pluseq64(al, ah, bl, bh) do {\
    uint t = al;\
    al += bl;\
    ah += bh;\
    if (al < t) { \
        ah += 1; \
    } } while(0)

Interestingly, (1) always produces the correct output, whereas (2) does not. Since (1) is a mess of operations (3 shifts and 5 adds for a single 64-bit +=), I'd much prefer something along the lines of (2), if only it worked properly.

As an alternative to (2), I've tried:

#define pluseq64(al, ah, bl, bh) do {\
    uint t = al;\
    al += bl;\
    ah += bh;\
    ah += (al < t); } while(0)

This doesn't quite work either, probably for the same reason, whatever that reason is.

Why doesn't (2) work properly? Bonus: is there a better way to do a 64-bit add in HLSL?

Thank you!

MNagy
  • What was the input that it went wrong on? – harold Jul 27 '15 at 22:56
  • It may take me a bit of time to find exactly where it deviates (no breakpoint ability inside of the GPU, blah). Interestingly, when I put in some test cases where the carry bit would be present, both versions produced the correct output... but the issue remains that the cumulative output is different when I use version 1 as opposed to version 2. It's just so bizarre. – MNagy Jul 27 '15 at 23:39

2 Answers

In my testing, the three versions produce equivalent output in C++, so this is odd. Did you test on the CPU side, and did it work for you there? One thing you could try is to skip the macro and do/while wrapper and see if it works as a simple HLSL function:

void pluseq64(inout uint al, inout uint ah, in uint bl, in uint bh)
{
    uint t = al;
    al += bl;
    ah += bh;
    if (al < t)
    {
        ah += 1;
    }
    // or: ah += uint(al < t);
}

Functions are inlined in HLSL anyway so I don't think you gain anything from using preprocessor directives.
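
The CPU-side check mentioned above can be sketched in C++ roughly as follows. This is only an illustration: the function names `carry_shift`, `carry_compare`, `carry_reference`, and `carries_agree` are made up for this example, and the sample values are an arbitrary set of boundary cases. It compares the carry computed the way (1) does it and the way (2) does it against a native 64-bit addition.

```cpp
#include <cstdint>

// Carry out of al + bl, computed as in variant (1): add the halved operands
// plus the carry out of bit 0, then look at bit 31 of that sum.
inline uint32_t carry_shift(uint32_t al, uint32_t bl) {
    uint32_t t = (al >> 1) + (bl >> 1) + (al & bl & 1u);
    return t >> 31;
}

// Carry out of al + bl, computed as in variant (2): the 32-bit sum wrapped
// around exactly when it is smaller than one of the operands.
inline uint32_t carry_compare(uint32_t al, uint32_t bl) {
    uint32_t sum = al + bl;  // unsigned overflow wraps modulo 2^32
    return sum < al ? 1u : 0u;
}

// Reference: carry taken from a native 64-bit addition.
inline uint32_t carry_reference(uint32_t al, uint32_t bl) {
    return static_cast<uint32_t>((static_cast<uint64_t>(al) + bl) >> 32);
}

// Hypothetical helper: true if both formulas match the reference on every
// pair drawn from a handful of boundary values.
inline bool carries_agree() {
    const uint32_t samples[] = {0u, 1u, 2u, 0x7FFFFFFFu, 0x80000000u,
                                0x80000001u, 0xFFFFFFFEu, 0xFFFFFFFFu};
    for (uint32_t a : samples)
        for (uint32_t b : samples)
            if (carry_shift(a, b) != carry_reference(a, b) ||
                carry_compare(a, b) != carry_reference(a, b))
                return false;
    return true;
}
```

If a check like this passes on the CPU, the arithmetic itself is probably not the culprit, which points back at the macro expansion or the shader compiler.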

Arttu Peltonen

Perhaps your snippet triggered an older driver bug? Stepping through the disassembly with PIX could help. I've used the following without issue on Nvidia/AMD/Intel; it is basically equivalent to your alternative to (2).

struct uint64_emulated
{
    uint32_t low;
    uint32_t high;
};

inline uint64_emulated Add(uint64_emulated a, uint64_emulated b)
{
    uint64_emulated c;
    c.low = a.low + b.low;
    c.high = a.high + b.high + (c.low < a.low); // Add with carry.
    return c;
}
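
Since the question is about cumulative output diverging, a CPU-side mirror of this struct-based add can be run over many accumulations and compared with a native 64-bit total. The following C++ sketch assumes nothing about the original shader; `U64Emulated`, `add`, `to_native`, and `accumulation_matches` are names invented for this example.

```cpp
#include <cstdint>

// CPU-side mirror of the emulated 64-bit value (two 32-bit halves).
struct U64Emulated {
    uint32_t low;
    uint32_t high;
};

// Same add-with-carry as the shader version above.
inline U64Emulated add(U64Emulated a, U64Emulated b) {
    U64Emulated c;
    c.low = a.low + b.low;                                 // wraps modulo 2^32
    c.high = a.high + b.high + (c.low < a.low ? 1u : 0u);  // propagate carry
    return c;
}

// Packs the two halves into a native 64-bit integer for comparison.
inline uint64_t to_native(U64Emulated v) {
    return (static_cast<uint64_t>(v.high) << 32) | v.low;
}

// Hypothetical check: accumulate `count` copies of `step` both ways and
// report whether the running totals ever diverge.
inline bool accumulation_matches(uint64_t step, int count) {
    U64Emulated acc{0u, 0u};
    const U64Emulated inc{static_cast<uint32_t>(step),
                          static_cast<uint32_t>(step >> 32)};
    uint64_t ref = 0;
    for (int i = 0; i < count; ++i) {
        acc = add(acc, inc);
        ref += step;  // native arithmetic also wraps modulo 2^64
        if (to_native(acc) != ref) return false;
    }
    return true;
}
```

Calling `accumulation_matches` with a step such as 0xFFFFFFFF forces a carry out of the low half on nearly every iteration, which exercises the exact path in question.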

Dwayne Robinson