-1

I have written a simple video conferencing app which uses multiple threads for video and audio mixing. I use libavcodec (ffmpeg) codecs for mixing video. As far as I know, libavcodec uses SSE instructions to achieve high performance. For audio mixing, I'm using a simple mixing algorithm which just adds the samples. I have written the adding algorithm with a simple for loop in C++, but now I want to optimize it using SSE instructions like this:

__m128i* d = (__m128i*) pOutBuffer;
__m128i* s = (__m128i*) pInBuffer;
for (DWORD n = (DWORD)(nSizeToMix + 7) >> 3; n != 0; --n, ++d, ++s)
{
    //Load data in SSE registers
    __m128i xmm1 = _mm_load_si128(d);
    __m128i xmm2 = _mm_load_si128(s);
    //SSE2 sum
    _mm_store_si128(d, _mm_add_epi16(xmm1, xmm2));
}

Audio mixing is done in a separate thread, simultaneously with video mixing. When I use SSE instructions, the app crashes suddenly at a point unrelated to audio mixing, in the encoding/decoding of video.

It seems that because libavcodec uses SSE registers and instructions, my code conflicts with it. Is there any way to use SSE instructions without conflicting with libavcodec (ffmpeg)? Any suggestions appreciated.

M.Mahdipour
  • 1
    you will only get a crash if there is a bug in your code, ffmpeg or your compiler. Registers should be saved and restored on thread context switches so threads should be completely independent. Your crashes sound like memory corruption, try running with asan enabled or valgrind – Alan Birtles Jan 03 '19 at 07:34
  • The downvotes are likely due to lack of a [mcve] – Alan Birtles Jan 03 '19 at 07:35
  • @AlanBirtles Is saving the registers done automatically by the CPU on a context switch? My app does not crash when I use a simple for loop for audio mixing. – M.Mahdipour Jan 03 '19 at 07:39
  • 1
    context switches are implemented in the os: https://wiki.osdev.org/Context_Switching. Are you sure your sse code isn't accessing outside the bounds of your buffers? e.g. are both your input and output buffers at least `16 * (nSizeToMix + 7) / 8` bytes long? Again we need a [mcve] – Alan Birtles Jan 03 '19 at 07:47
  • @AlanBirtles Thanks for your comments. I didn't know saving SSE registers is done automatically on context switching. Let me check my code again and report the result here. – M.Mahdipour Jan 03 '19 at 07:56
  • 1
    To avoid wraparound, you might want to use an average like `_mm_avg_epu16`. But that's an *unsigned* average, and audio data is normally signed. So maybe just add with saturation to clip instead of wrap: [`_mm_adds_epi16`](https://www.felixcloutier.com/x86/paddsb:paddsw). That's signed saturation; unsigned saturation is also available with `epu16`. – Peter Cordes Jan 03 '19 at 10:10
  • Are you sure your arrays have padding out to a multiple of 16 bytes? You're rounding *up* the length, rather than rounding down and using scalar for the left-over elements. (Or an unaligned final vector). – Peter Cordes Jan 03 '19 at 10:12
  • 1
    @PeterCordes You're correct. I checked my buffers. They were not long enough to hold a multiple of 16 bytes. I resolved my problem by changing size of my arrays to a multiple of 16 bytes. – M.Mahdipour Jan 15 '19 at 06:34
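
To illustrate the wraparound point raised in the comments, here is a minimal sketch comparing `_mm_add_epi16` (wraps) with `_mm_adds_epi16` (clips). The helper name `add8` and its `saturate` flag are made up for this example and are not from the question's code. Unaligned loads are used so the sketch works on any buffer.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: adds 8 signed 16-bit samples from a and b into out,
// either wrapping (like the question's loop) or saturating.
inline void add8(const std::int16_t* a, const std::int16_t* b,
                 std::int16_t* out, bool saturate)
{
    assert(a != nullptr && b != nullptr && out != nullptr);
    const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
    const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
    const __m128i sum = saturate
        ? _mm_adds_epi16(va, vb)   // clips each lane to [-32768, 32767]
        : _mm_add_epi16(va, vb);   // wraps each lane modulo 2^16
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), sum);
}
```

With two loud samples of 30000 each, the wrapping add produces a negative value (audible as a click), while the saturating add clips to 32767.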

1 Answer

0

Context switches should be OK as long as you're using a modern compiler (less than 10 years old) and you aren't coding in assembly. Compilers know the ABIs for their target platforms so you don't have to.

If you've included the exact code that crashed your app, the most likely reason is alignment issues. Replace _mm_load_si128 with _mm_loadu_si128, _mm_store_si128 with _mm_storeu_si128 and see if it helps.
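
A minimal sketch of that substitution, assuming 16-bit samples as in the question. The names `is_aligned16` and `mix_vectors_unaligned` are hypothetical helpers for illustration, and the loop only handles whole groups of 8 samples.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical helper: true if p sits on a 16-byte boundary,
// which is what _mm_load_si128 / _mm_store_si128 require.
inline bool is_aligned16(const void* p)
{
    return (reinterpret_cast<std::uintptr_t>(p) & 15u) == 0;
}

// Adds `vectors` groups of 8 samples from in into out.
// The unaligned load/store variants accept any pointer.
inline void mix_vectors_unaligned(std::int16_t* out, const std::int16_t* in,
                                  std::size_t vectors)
{
    assert(out != nullptr && in != nullptr);
    for (std::size_t i = 0; i < vectors; ++i) {
        __m128i* d = reinterpret_cast<__m128i*>(out + 8 * i);
        const __m128i* s = reinterpret_cast<const __m128i*>(in + 8 * i);
        _mm_storeu_si128(d, _mm_add_epi16(_mm_loadu_si128(d), _mm_loadu_si128(s)));
    }
}
```

Checking `is_aligned16(pOutBuffer)` and `is_aligned16(pInBuffer)` in a debug build is one way to find out whether alignment is the issue at all before changing the loop.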

Update 1: another possible reason is that the SSE version completes too fast and this triggers a concurrency bug. Try adding e.g. a Sleep( 2 ) call after the loop; if video then works OK, it means you need to fix the code that pushes or pulls the data across threads.

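Update 2: As Alan pointed out, the buffers may be shorter than the 16 * (nSizeToMix + 7) / 8 bytes that the loop touches, i.e. their size may not be a multiple of 16 bytes. The loop then reads and writes past the end of the buffers, which can corrupt memory and crash the app somewhere unrelated.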
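
An alternative to padding the buffers is to round the vector count down and finish the leftover samples with scalar code, as suggested in the comments on the question. The sketch below assumes the length is a count of 16-bit samples (which is what the shift by 3 in the question suggests). The name `mix_samples` is hypothetical, and it uses a saturating add rather than the question's plain add; swap in `_mm_add_epi16` and a plain scalar add for wrapping behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical replacement for the question's loop: mixes exactly `count`
// samples and never touches memory past out[count - 1] / in[count - 1].
inline void mix_samples(std::int16_t* out, const std::int16_t* in, std::size_t count)
{
    assert(out != nullptr && in != nullptr);

    // Round DOWN to a multiple of 8 samples (one 128-bit vector).
    const std::size_t vec_end = count & ~static_cast<std::size_t>(7);

    std::size_t i = 0;
    for (; i < vec_end; i += 8) {
        __m128i* d = reinterpret_cast<__m128i*>(out + i);
        const __m128i* s = reinterpret_cast<const __m128i*>(in + i);
        _mm_storeu_si128(d, _mm_adds_epi16(_mm_loadu_si128(d), _mm_loadu_si128(s)));
    }

    // Scalar tail: at most 7 leftover samples, clamped like the vector path.
    for (; i < count; ++i) {
        int sum = static_cast<int>(out[i]) + static_cast<int>(in[i]);
        if (sum > INT16_MAX) sum = INT16_MAX;
        if (sum < INT16_MIN) sum = INT16_MIN;
        out[i] = static_cast<std::int16_t>(sum);
    }
}
```

Calling `mix_samples(pOutBuffer, pInBuffer, nSizeToMix)` with 11 samples would process one vector of 8 and then 3 samples in the scalar loop, leaving everything after the 11th sample untouched.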

M.Mahdipour
Soonts
  • 1
    I haven't done any sse for a while but as far as i can remember the instructions requiring aligned arguments crash immediately rather than causing corrupt memory? – Alan Birtles Jan 03 '19 at 08:11
  • Yeah, usually crashes. Unaligned access is the only issue immediately visible in the OP's code (I assume the buffer boundaries are good, as the OP already has non-SSE version running). – Soonts Jan 03 '19 at 08:25
  • Context switches are still transparent even if you are writing asm by hand, or using an old compiler. There's nothing you can do that will make the OS corrupt your XMM registers asynchronously. The OP says their crashes *aren't* in the mixing code. If alignment was the problem, the SIMD load or store would crash. – Peter Cordes Jan 03 '19 at 10:06