I'm writing a program using Intel AVX2 instructions. I found a bug in my program that appears only at optimization level -O2 or higher (with -O1 it works fine). After extensive debugging, I narrowed down the buggy region. The bug now seems to be caused by the compiler incorrectly optimizing out a simple copy assignment of an __m256i variable.
Consider the following code snippet. Foo is a templated function. I test with CMP = kLess, OPT = kSet. I'm aware that the optimizer will probably optimize out the switches. It may even optimize out the variable y.
The buggy line is y = m_lt;. When compiled with -O2, this line seems to be ignored. As a result, y doesn't get the right value and the program produces a wrong result. However, the program is correct with -O1.
To verify my judgement, I replaced y = m_lt; with two alternatives:
y = avx_or(m_lt, avx_zero()); which takes the bitwise OR of m_lt and an all-zeros vector.
y = _mm256_load_si256(&m_lt); which uses the SIMD load instruction to load the data from the address of m_lt.
Both should be semantically equivalent to y = m_lt;.
My intention was to inhibit the suspected optimization by routing the assignment through a function call. The program works correctly with either replacement at all optimization levels, which makes the problem look strange. To my knowledge, direct assignment of SIMD variables is perfectly fine (I have used it a lot before). Could this be a problem with the compiler?
typedef __m256i AvxUnit;
template <Comparator CMP, Bitwise OPT>
void Foo(){
    AvxUnit m_lt;
    //...
    assert(!avx_iszero(m_lt)); //always pass
    AvxUnit y;
    switch(CMP){
        case Comparator::kEqual:
            y = m_eq;
            break;
        case Comparator::kInequal:
            y = avx_not(m_eq);
            break;
        case Comparator::kLess:
            y = m_lt; //**********Bug?*************
            //y = avx_or(m_lt, avx_zero()); //Replace with this line is good.
            //y = _mm256_load_si256(&m_lt); //Replace with this line is good too.
            break;
        case Comparator::kGreater:
            y = m_gt;
            break;
        case Comparator::kLessEqual:
            y = avx_or(m_lt, m_eq);
            break;
        case Comparator::kGreaterEqual:
            y = avx_or(m_gt, m_eq);
            break;
    }
    switch(OPT){
        case Bitwise::kSet:
            break;
        case Bitwise::kAnd:
            y = avx_and(y, bvblock->GetAvxUnit(bv_word_id));
            break;
        case Bitwise::kOr:
            y = avx_or(y, bvblock->GetAvxUnit(bv_word_id));
            break;
    }
    assert(!avx_iszero(y)); //pass with -O1, fail with -O2 or higher
    bvblock->SetAvxUnit(y, bv_word_id);
    //...
}
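
To make the "semantically equivalent" claim concrete, here is a minimal standalone sketch that compares the three assignment forms in isolation. It is not the real program: the bodies of avx_zero, avx_or and avx_iszero are assumptions (thin wrappers around the obvious intrinsics, since the actual helpers are not shown above), and avx_equal, ThreeFormsAgree, CheckAssignmentForms and the AVX2_FN macro are hypothetical names added for illustration. The target attribute is a GCC/Clang-specific convenience so the file builds without -mavx2; it is not part of the original code. This sketch most likely will not reproduce the -O2 failure by itself, it only pins down what the three lines are expected to compute.

```cpp
#include <immintrin.h>
#include <cassert>

typedef __m256i AvxUnit;

// GCC/Clang-specific: enable AVX2 per function instead of via -mavx2 (assumption).
#define AVX2_FN __attribute__((target("avx2")))

// Assumed definitions of the helpers used in the snippet above.
AVX2_FN static inline AvxUnit avx_zero() { return _mm256_setzero_si256(); }
AVX2_FN static inline AvxUnit avx_or(AvxUnit a, AvxUnit b) { return _mm256_or_si256(a, b); }
// _mm256_testz_si256(a, a) returns 1 exactly when every bit of a is 0.
AVX2_FN static inline bool avx_iszero(AvxUnit a) { return _mm256_testz_si256(a, a) != 0; }

// Hypothetical helper: true when all 32 bytes of a and b compare equal.
AVX2_FN static inline bool avx_equal(AvxUnit a, AvxUnit b) {
    return _mm256_movemask_epi8(_mm256_cmpeq_epi8(a, b)) == -1;
}

// Builds a non-zero m_lt and copies it in the three ways discussed above.
AVX2_FN bool ThreeFormsAgree() {
    AvxUnit m_lt = _mm256_set1_epi32(0x0F0F0F0F);   // arbitrary non-zero test value
    AvxUnit y_assign = m_lt;                        // plain copy assignment
    AvxUnit y_or = avx_or(m_lt, avx_zero());        // OR with an all-zeros vector
    AvxUnit y_load = _mm256_load_si256(&m_lt);      // aligned load from m_lt's address
    return !avx_iszero(y_assign)
        && avx_equal(y_assign, y_or)
        && avx_equal(y_assign, y_load);
}

// Runs the comparison only when the CPU reports AVX2 support.
bool CheckAssignmentForms() {
    if (!__builtin_cpu_supports("avx2")) return true;  // nothing to check on this CPU
    return ThreeFormsAgree();
}
```

Calling assert(CheckAssignmentForms()); from main at both -O1 and -O2 is one way to use it. If this passes at every level while Foo still fails, that would suggest the trigger lies in the surrounding code (for example, how m_lt is produced in the elided part) rather than in the assignment itself.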