I have the following code in C++. Pointers _p_s1 and _p_s2 point to slices (every second video line) in a bigger memory area holding a video frame (let's call this *pFrameData). As long as only the data in the memory area pointed to by pFrameData changes, this code works as expected.

However, if I need to change pFrameData itself, the code crashes with an error:

Exception thrown: read access violation.

p2 was 0xFFFFFFFFFFFFFFFF.

I have a scalar version of the code, and it works fine across such changes. This makes me think that the SSE registers(?) somehow retain pointers to the memory area pointed to by the old pFrameData, and since that memory has been freed, the code crashes.

Is there a way I can solve this? I'm running this code in an x64 environment.

void Merge8BitSSE2(uint8_t *_p_dest, const uint8_t *_p_s1, const uint8_t *_p_s2,
                   size_t i_bytes)
{
    for (; i_bytes > 0 && ((uintptr_t)_p_s1 & 15); i_bytes--)
        *_p_dest++ = (*_p_s1++ + *_p_s2++) >> 1;

    for (; i_bytes >= 16; i_bytes -= 16)
    {
        __m128i xmm;
        __m128i *adst = (__m128i*)_p_dest;
        __m128i *p1 = (__m128i*)_p_s1;
        __m128i *p2 = (__m128i*)_p_s2;

        xmm = _mm_loadu_si128(p1);
        xmm = _mm_avg_epu8(xmm, *p2);
        *adst = _mm_loadu_si128(&xmm);

        _p_dest += 16;
        _p_s1 += 16;
        _p_s2 += 16;
    }

    for (; i_bytes > 0; i_bytes--)
        *_p_dest++ = (*_p_s1++ + *_p_s2++) >> 1;
}
  • 1
    Are you sure that `p2` aligned properly? Can you try to load it the same way you load `p1`? – Michael Nastenko Jul 24 '18 at 14:22
  • 3
    Use `__m128i tmp = _mm_loadu_si128( (const __m128i*)_p_s2)` if it's potentially unaligned. And don't use `loadu(&xmm)`, that's nonsense. Use `_mm_store_si128((__m128i*)_p_dest, avg)` like a normal person (or storeu if your destination isn't aligned). – Peter Cordes Jul 24 '18 at 14:53
  • 2
    @PeterCordes `__m128i xmm1 = _mm_loadu_si128((__m128i*)_p_s1); __m128i xmm2 = _mm_loadu_si128((__m128i*)_p_s2); __m128i adsr = _mm_avg_epu8(xmm1, xmm2); _mm_store_si128((__m128i*)_p_dest, adsr);` Worked perfectly. If you post this as an answer, I'll mark it as correct. – jpou Jul 25 '18 at 06:41
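
For reference, the fix discussed in the comments can be sketched as a complete function. The likely cause is not stale pointers in SSE registers: dereferencing `p2` directly (`*p2`) allows the compiler to emit an instruction that requires 16-byte alignment, and the first loop only aligns `_p_s1`. If the new frame buffer leaves `_p_s2` misaligned, the access faults, which is typically reported as a read access violation at 0xFFFFFFFFFFFFFFFF. The name `Merge8BitSSE2Fixed` is hypothetical. This sketch assumes none of the three pointers is aligned, so it uses unaligned loads and an unaligned store throughout (the comment above used `_mm_store_si128`, which requires an aligned destination). It also rounds up in the scalar tail to match `_mm_avg_epu8`, which differs from the truncating `>> 1` in the question.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical name; a sketch of the fix from the comments, not the asker's final code.
void Merge8BitSSE2Fixed(uint8_t *dest, const uint8_t *s1, const uint8_t *s2,
                        size_t bytes)
{
    // Process 16 bytes per iteration. loadu/storeu place no alignment
    // requirement on any of the three pointers.
    for (; bytes >= 16; bytes -= 16)
    {
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s1));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(s2));
        __m128i avg = _mm_avg_epu8(a, b);  // per byte: (a + b + 1) >> 1
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dest), avg);

        dest += 16;
        s1 += 16;
        s2 += 16;
    }

    // Scalar tail; the +1 is an assumption made here so that the tail
    // rounds the same way as _mm_avg_epu8.
    for (; bytes > 0; bytes--)
        *dest++ = static_cast<uint8_t>((*s1++ + *s2++ + 1) >> 1);
}
```

If the destination is known to be 16-byte aligned, `_mm_store_si128` can replace `_mm_storeu_si128`; on recent CPUs the unaligned variants usually cost little when the address happens to be aligned.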
