I have the following code in C++. The pointers _p_s1 and _p_s2 point to slices (every second video line) of a larger memory area holding a video frame (let's call it pFrameData). As long as only the data in the memory pointed to by pFrameData changes, this code works as expected.
However, if I need to change pFrameData itself so that it points to a different buffer, the code crashes with this error:
Exception thrown: read access violation.
p2 was 0xFFFFFFFFFFFFFFFF.
I have a scalar version of the code, and it keeps working across such changes. This makes me think that the SSE registers(?) somehow retain pointers to the memory area that the old pFrameData pointed to, and since that memory has been freed, the code crashes.
Is there a way to solve this? I'm running this code in an x64 environment.
void Merge8BitSSE2(uint8_t *_p_dest, const uint8_t *_p_s1, const uint8_t *_p_s2,
                   size_t i_bytes)
{
    for (; i_bytes > 0 && ((uintptr_t)_p_s1 & 15); i_bytes--)
        *_p_dest++ = (*_p_s1++ + *_p_s2++) >> 1;
    for (; i_bytes >= 16; i_bytes -= 16)
    {
        __m128i xmm;
        __m128i *adst = (__m128i*)_p_dest;
        __m128i *p1 = (__m128i*)_p_s1;
        __m128i *p2 = (__m128i*)_p_s2;
        xmm = _mm_loadu_si128(p1);
        xmm = _mm_avg_epu8(xmm, *p2);
        *adst = _mm_loadu_si128(&xmm);
        _p_dest += 16;
        _p_s1 += 16;
        _p_s2 += 16;
    }
    for (; i_bytes > 0; i_bytes--)
        *_p_dest++ = (*_p_s1++ + *_p_s2++) >> 1;
}
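
One thing that may be worth checking, as an assumption rather than a confirmed diagnosis: the head loop in the function above only aligns _p_s1 to 16 bytes, while `*p2` and `*adst` are dereferenced directly as __m128i, which the compiler is allowed to turn into aligned accesses. If the new pFrameData gives _p_s2 or _p_dest a different 16-byte alignment than _p_s1, such an access can fault, and on x64 Windows that kind of fault is typically reported as an access violation at 0xFFFFFFFFFFFFFFFF. A minimal sketch of a variant that uses unaligned loads and stores throughout (the name Merge8BitSSE2Unaligned and the parameter names are hypothetical):

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical variant of Merge8BitSSE2 that makes no alignment assumptions
// about any of the three pointers.
void Merge8BitSSE2Unaligned(uint8_t *dest, const uint8_t *s1, const uint8_t *s2,
                            std::size_t bytes)
{
    // Unaligned loads and stores accept any address, so no head loop is needed.
    for (; bytes >= 16; bytes -= 16)
    {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i *>(s1));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i *>(s2));
        // _mm_avg_epu8 computes (a + b + 1) >> 1 per byte, i.e. it rounds up.
        _mm_storeu_si128(reinterpret_cast<__m128i *>(dest), _mm_avg_epu8(a, b));
        dest += 16;
        s1 += 16;
        s2 += 16;
    }
    // Scalar tail, kept as in the original: (a + b) >> 1, which truncates.
    for (; bytes > 0; bytes--)
        *dest++ = static_cast<uint8_t>((*s1++ + *s2++) >> 1);
}
```

If this variant survives the pFrameData change while the original does not, the alignment of _p_s2 or _p_dest is a plausible culprit. If it still crashes, the pointers passed in are more likely stale, or i_bytes runs past the end of the new buffer.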