I have written a simple video conferencing app which uses multiple threads for video and audio mixing. I use libavcodec (ffmpeg) codecs for mixing video. As far as I know, libavcodec uses SSE instructions to achieve high performance. For audio mixing, I'm using a simple mixing algorithm which just adds the samples. I have written the adding algorithm with a simple for loop in C++, but now I want to optimize it using SSE instructions like this:
__m128i* d = (__m128i*) pOutBuffer;
__m128i* s = (__m128i*) pInBuffer;
for (DWORD n = (DWORD)(nSizeToMix + 7) >> 3; n != 0; --n, ++d, ++s)
{
    // Load data into SSE registers
    __m128i xmm1 = _mm_load_si128(d);
    __m128i xmm2 = _mm_load_si128(s);
    // SSE2 sum
    _mm_store_si128(d, _mm_add_epi16(xmm1, xmm2));
}
Audio mixing is done in a separate thread, concurrently with video mixing. When I use SSE instructions, the app crashes suddenly at a location unrelated to audio mixing, in the video encoding/decoding code.
It seems that because libavcodec uses SSE registers and instructions, my code conflicts with it. Is there any way to use SSE instructions without any conflicts with libavcodec (ffmpeg)? Any suggestions are appreciated.
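
Independent of any interaction with libavcodec, two properties of the loop above may be worth ruling out: `_mm_load_si128` and `_mm_store_si128` require 16-byte-aligned addresses, and rounding the iteration count up with `(nSizeToMix + 7) >> 3` processes a full 8-sample block even when fewer samples remain. The following is a minimal sketch of a more defensive variant, not a confirmed fix. It assumes 16-bit signed samples and a length given as a sample count (neither is stated above), the name `MixSamples` is hypothetical, and it swaps the wrapping `_mm_add_epi16` for the saturating `_mm_adds_epi16`.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstddef>
#include <cstdint>

// Hypothetical helper: adds `count` 16-bit samples from `in` into `out`.
// Assumes `count` is a number of samples, not bytes.
void MixSamples(int16_t* out, const int16_t* in, std::size_t count)
{
    std::size_t i = 0;

    // Process 8 samples per iteration, only while a full block remains,
    // so the vector loop never reads or writes past `count` samples.
    for (; i + 8 <= count; i += 8)
    {
        // Unaligned loads and stores work for any buffer address.
        __m128i a = _mm_loadu_si128(reinterpret_cast<const __m128i*>(out + i));
        __m128i b = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in + i));
        // Saturating add clamps to [-32768, 32767] instead of wrapping.
        _mm_storeu_si128(reinterpret_cast<__m128i*>(out + i),
                         _mm_adds_epi16(a, b));
    }

    // Scalar tail for the remaining 0..7 samples, with the same clamping.
    for (; i < count; ++i)
    {
        int sum = static_cast<int>(out[i]) + static_cast<int>(in[i]);
        if (sum > 32767) sum = 32767;
        if (sum < -32768) sum = -32768;
        out[i] = static_cast<int16_t>(sum);
    }
}
```

If this variant runs cleanly where the original crashes, that would point toward a misaligned buffer or a write past the end of a buffer rather than toward a register conflict between threads.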