In CUDA, how can I determine whether my last integer arithmetic operation has overflowed/underflowed or not? Can I get the value of an overflow flag?
1 Answer
A partial answer, or what I've thought of so far:
Special Cases
These utilize some PTX instructions which are not (AFAICT) available directly in CUDA; you would need wrapper functions, implemented using inline PTX, to use them.
Signed 32-bit values
If you use both the add.s32 and add.sat.s32 instructions, or both sub.s32 and sub.sat.s32, comparing the two results tells you whether you overflowed: they differ exactly when the true result does not fit in 32 bits. There's also multiply-add, which for 32-bit signed values has a saturating form. As far as I can tell, PTX only defines saturation in .hi mode, as mad.hi.sat.s32, so the instruction to compare it against is mad.hi.s32. That comparison tells you whether adding the third operand to the upper 32 bits of the product overflowed (which you might not quite consider overflow, really). To better understand what hi and lo mean in this context, read on.
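The add/saturating-add comparison above can be sketched with inline PTX wrappers. This is only a minimal sketch: the names add_sat_s32, add_wrap_s32, add_overflows, add_overflow_kernel and add_overflows_on_gpu are made up for illustration, and the host helper exists only to make the check easy to try from CPU code.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Hypothetical wrapper: saturating signed 32-bit add (clamps to INT_MIN/INT_MAX).
__device__ __forceinline__ int add_sat_s32(int a, int b)
{
    int r;
    asm("add.sat.s32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
    return r;
}

// Hypothetical wrapper: ordinary wrapping add, done in PTX so that no
// signed overflow ever happens at the C++ level.
__device__ __forceinline__ int add_wrap_s32(int a, int b)
{
    int r;
    asm("add.s32 %0, %1, %2;" : "=r"(r) : "r"(a), "r"(b));
    return r;
}

// The wrapped and the saturated result differ exactly when a + b
// does not fit in a signed 32-bit integer.
__device__ __forceinline__ bool add_overflows(int a, int b)
{
    return add_wrap_s32(a, b) != add_sat_s32(a, b);
}

__global__ void add_overflow_kernel(int a, int b, int *flag)
{
    *flag = add_overflows(a, b) ? 1 : 0;
}

// Host-side helper for experimenting: runs a single check on the GPU.
// Returns 1 on overflow, 0 otherwise, -1 if a CUDA call failed.
int add_overflows_on_gpu(int a, int b)
{
    int *d_flag = nullptr;
    int h_flag = -1;
    if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess) return -1;
    add_overflow_kernel<<<1, 1>>>(a, b, d_flag);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(&h_flag, d_flag, sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_flag);
    return ok ? h_flag : -1;
}
```

The same pattern should carry over to subtraction by swapping in sub.s32 and sub.sat.s32. In real code you would call add_overflows from inside your own kernels rather than launching a kernel per check.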
Multiplication
For multiplication, overflow is "avoided" in PTX by treating the result as twice as wide as the operands. The PTX multiplication instructions mul and mad (the latter is multiply-and-add) let you get either just the high half or just the low half of the result or, if the operands are 16 or 32 bits wide, the entire double-width output (the .wide variant). So you can just use mul.hi.yourtype (or mad.hi.yourtype with a zero addend) and inspect the high half: for unsigned types it must be all zeros; for signed types it must match the sign of the low half, that is, all zeros when the low half is non-negative and all ones when it is negative.
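Here is a minimal sketch of the high-half check for 32-bit operands. Instead of inline PTX it uses CUDA's __mulhi and __umulhi device intrinsics, which, as far as I know, return the same upper 32 bits that mul.hi would. The function names and the host helper are hypothetical and only there for illustration.

```cuda
#include <climits>
#include <cuda_runtime.h>

// Unsigned: the product fits in 32 bits iff the high half is zero.
__device__ __forceinline__ bool umul_overflows(unsigned a, unsigned b)
{
    return __umulhi(a, b) != 0u;
}

// Signed: the product fits iff the high half is the sign extension of the
// low half (all zeros for a non-negative low half, all ones otherwise).
__device__ __forceinline__ bool smul_overflows(int a, int b)
{
    int hi = __mulhi(a, b);
    int lo = (int)((unsigned)a * (unsigned)b);  // low 32 bits, computed without signed overflow
    return hi != (lo < 0 ? -1 : 0);
}

__global__ void mul_overflow_kernel(int a, int b, bool is_signed, int *flag)
{
    *flag = is_signed ? smul_overflows(a, b)
                      : umul_overflows((unsigned)a, (unsigned)b);
}

// Host-side helper for experimenting: runs a single check on the GPU.
// Returns 1 on overflow, 0 otherwise, -1 if a CUDA call failed.
int mul_overflows_on_gpu(int a, int b, bool is_signed)
{
    int *d_flag = nullptr;
    int h_flag = -1;
    if (cudaMalloc(&d_flag, sizeof(int)) != cudaSuccess) return -1;
    mul_overflow_kernel<<<1, 1>>>(a, b, is_signed, d_flag);
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(&h_flag, d_flag, sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(d_flag);
    return ok ? h_flag : -1;
}
```

Note the signed case: INT_MIN * -1 has an all-zero high half but a negative low half, so checking for "all zeros or all ones" alone would miss it. That is why the sketch compares against the sign of the low half.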
A slow approach for the general case
A slow but general solution is to compare a rough estimate of the result to the actual result. Take addition, for example. You would take the higher half of the bits of each operand and add those up. That sum indicates either "certainly overflow", if it itself overflows into the one-past-half bit; "certainly no overflow", if it is so far from overflowing (or underflowing) that no values of the lower bits can make it overflow; or "maybe overflow", in which case you just need to make sure that the higher half of the actual result is close enough to the estimated higher half.
This is doable on any processor, but should really be avoided if you can do better.
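To make the three-way classification concrete, here is a minimal sketch for one specific case I picked for illustration: unsigned 32-bit addition, split into 16-bit halves. The names Verdict, estimate_uadd and uadd_overflows are hypothetical. The signed case follows the same idea but needs underflow handled as well.

```cuda
#include <cuda_runtime.h>

enum class Verdict { Safe, Overflows, Unsure };

// Add only the upper 16-bit halves of the operands. The lower halves can
// contribute at most one extra carry into this estimate.
__host__ __device__ inline Verdict estimate_uadd(unsigned a, unsigned b)
{
    unsigned est = (a >> 16) + (b >> 16);          // at most 0x1FFFE
    if (est > 0xFFFFu) return Verdict::Overflows;  // already carries out of the upper half
    if (est < 0xFFFFu) return Verdict::Safe;       // one extra carry still fits
    return Verdict::Unsure;                        // est == 0xFFFF: depends on the lower halves
}

__host__ __device__ inline bool uadd_overflows(unsigned a, unsigned b)
{
    switch (estimate_uadd(a, b)) {
    case Verdict::Overflows: return true;
    case Verdict::Safe:      return false;
    default: {
        // "Maybe" case: compare the upper half of the actual (wrapping)
        // result with the estimate. A carry from the lower halves would
        // have wrapped the upper half around from 0xFFFF to 0.
        unsigned est_hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
        unsigned res_hi = (a + b) >> 16;
        return res_hi != est_hi;
    }
    }
}
```

Because it uses only shifts, adds and comparisons, this version is marked both __host__ and __device__ and can be called from CPU code or from a kernel. For this particular case a plain (a + b) < a test is far cheaper, which illustrates why the estimate approach is a last resort.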
