
I am getting a segmentation fault when running an executable that was built with the -xSSE4.1 compiler flag. I am running it on a machine that supports SSE4.1, SSE4.2 and AVX.

The intrinsics that trigger the segmentation fault:

m_best_cost_0        = _mm_loadu_si128((__m128i *) &ps_mv_refine_ctxt->i2_tot_cost[0][0]);
m_2nd_best_cost_0    = _mm_loadu_si128((__m128i *) &ps_mv_refine_ctxt->i2_tot_cost[1][0]);

m_best_mv_cost_0     = _mm_loadu_si128((__m128i *) &ps_mv_refine_ctxt->i2_mv_cost[0][0]);
m_2nd_best_mv_cost_0 = _mm_loadu_si128((__m128i *) &ps_mv_refine_ctxt->i2_mv_cost[1][0]);

Structure definitions:

typedef struct
{
    __declspec(align(16)) WORD16 i2_tot_cost[2][TOT_NUM_PARTS];
    __declspec(align(16)) WORD16 i2_mv_cost[2][TOT_NUM_PARTS];
}mv_refine_ctxt;

TOT_NUM_PARTS is an enum constant with value 17, and WORD16 is an alias for short int.

The "layout asm" command in gdb shows that the intrinsics above are translated to:

1   0x5cbfec <hme_calc_sad_and_result_num_part_lt_9+108>    mov    0x98(%rdi),%r11
2   0x5cbff3 <hme_calc_sad_and_result_num_part_lt_9+115>    mov    0x20(%r12),%r14
3   0x5cbff8 <hme_calc_sad_and_result_num_part_lt_9+120>    movslq 0x14(%r12),%rdx
4   0x5cbffd <hme_calc_sad_and_result_num_part_lt_9+125>    movslq %eax,%rax
5   0x5cc000 <hme_calc_sad_and_result_num_part_lt_9+128>    movslq %r15d,%r15
6   0x5cc003 <hme_calc_sad_and_result_num_part_lt_9+131>    movdqa (%r10),%xmm2
7  >0x5cc008 <hme_calc_sad_and_result_num_part_lt_9+136>    movdqa 0x22(%r10),%xmm7
8   0x5cc00e <hme_calc_sad_and_result_num_part_lt_9+142>    movdqu 0x50(%r10),%xmm11
9   0x5cc014 <hme_calc_sad_and_result_num_part_lt_9+148>    movdqu 0x72(%r10),%xmm6

Line 7:

0x5cc008 <hme_calc_sad_and_result_num_part_lt_9+136> movdqa 0x22(%r10),%xmm7

is where the segmentation fault occurs.

The value of $r10 is 0x7fffd4f22fe0 just before line 6 executes. That address is 16-byte aligned, so line 6 executes without a problem. Line 7, however, loads from 0x7fffd4f22fe0 + 0x22, which is not 16-byte aligned: 0x22 is 34 bytes, i.e. one row of 17 two-byte WORD16 values. gdb also prints the value of &ps_mv_refine_ctxt->i2_tot_cost[0][0] as 0x7fffd4f22fe0, which matches $r10.
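The displacements in the disassembly (0x22, 0x50, 0x72) can be cross-checked against the structure layout. The following is only a minimal sketch: it assumes WORD16 is a 16-bit type, uses C11 _Alignas as a portable stand-in for __declspec(align(16)), and the helper names tot_cost_row_offset, mv_cost_row_offset and offset_is_16_byte_aligned are hypothetical, not part of the original code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOT_NUM_PARTS 17        /* value stated in the question */
typedef int16_t WORD16;         /* assumption: WORD16 is a 16-bit short */

/* Same layout as in the question; _Alignas(16) stands in for
 * __declspec(align(16)) so the sketch builds with any C11 compiler. */
typedef struct
{
    _Alignas(16) WORD16 i2_tot_cost[2][TOT_NUM_PARTS];
    _Alignas(16) WORD16 i2_mv_cost[2][TOT_NUM_PARTS];
} mv_refine_ctxt;

/* The struct itself is 16-byte aligned, but that says nothing about row 1. */
static_assert(_Alignof(mv_refine_ctxt) == 16, "struct should be 16-byte aligned");

/* Byte offset of i2_tot_cost[row][0] from the start of the struct. */
static size_t tot_cost_row_offset(size_t row)
{
    return offsetof(mv_refine_ctxt, i2_tot_cost)
         + row * sizeof(WORD16[TOT_NUM_PARTS]);
}

/* Byte offset of i2_mv_cost[row][0] from the start of the struct. */
static size_t mv_cost_row_offset(size_t row)
{
    return offsetof(mv_refine_ctxt, i2_mv_cost)
         + row * sizeof(WORD16[TOT_NUM_PARTS]);
}

/* Nonzero if a byte offset from a 16-byte aligned base is itself aligned. */
static int offset_is_16_byte_aligned(size_t offset)
{
    return offset % 16 == 0;
}
```

Under these assumptions the four loads are at offsets 0x00, 0x22, 0x50 and 0x72 from the start of the structure, matching the disassembly. Only row 0 of each array (0x00 and 0x50) is 16-byte aligned, so an aligned load is never valid for row 1 (0x22 and 0x72), whatever the alignment of the structure itself.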

My question is why the first two _mm_loadu_si128 calls are translated to movdqa when they should have been translated to movdqu.

When I build the same executable with the -xSSE4.2 or -xAVX compiler flag, all four _mm_loadu_si128 calls are translated to movdqu, so there is no problem. The issue also appears only in the release build (-O3 flag).

I am using the Intel C/C++ compiler. I suspect the __declspec(align(16)) in the structure definition is unnecessary, but it does not cause this behavior: I removed it and the behavior did not change.

Sorry for the long post; I thought these details were necessary.

PCoder
MediocreMyna
  • The problem goes away if I declare my arrays as __declspec(align(16)) WORD16 i2_tot_cost[2][TOT_NUM_PARTS + 7] – MediocreMyna Apr 25 '14 at 14:31
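
A likely reason the padded declaration hides the crash: with TOT_NUM_PARTS + 7 = 24 elements, each row is 48 bytes, a multiple of 16, so row 1 starts on a 16-byte boundary and even a movdqa would succeed. This does not explain why the compiler emits movdqa in the first place. The sketch below only checks the arithmetic, assuming a 16-bit WORD16 and using C11 _Alignas in place of __declspec(align(16)); the names mv_refine_ctxt_padded, PADDED_NUM_PARTS, padded_tot_cost_row_offset and padded_mv_cost_row_offset are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int16_t WORD16;         /* assumption: WORD16 is a 16-bit short */

enum
{
    TOT_NUM_PARTS = 17,                     /* value stated in the question */
    PADDED_NUM_PARTS = TOT_NUM_PARTS + 7    /* 24 elements, 48 bytes per row */
};

/* Hypothetical padded variant of the structure from the comment above. */
typedef struct
{
    _Alignas(16) WORD16 i2_tot_cost[2][PADDED_NUM_PARTS];
    _Alignas(16) WORD16 i2_mv_cost[2][PADDED_NUM_PARTS];
} mv_refine_ctxt_padded;

/* The row stride is now a whole number of 16-byte blocks. */
static_assert(sizeof(WORD16[PADDED_NUM_PARTS]) % 16 == 0,
              "padded row stride must be a multiple of 16 bytes");

/* Byte offset of i2_tot_cost[row][0] in the padded structure. */
static size_t padded_tot_cost_row_offset(size_t row)
{
    return offsetof(mv_refine_ctxt_padded, i2_tot_cost)
         + row * sizeof(WORD16[PADDED_NUM_PARTS]);
}

/* Byte offset of i2_mv_cost[row][0] in the padded structure. */
static size_t padded_mv_cost_row_offset(size_t row)
{
    return offsetof(mv_refine_ctxt_padded, i2_mv_cost)
         + row * sizeof(WORD16[PADDED_NUM_PARTS]);
}
```

With this layout all four row starts (offsets 0, 48, 96 and 144) are 16-byte aligned, so the padding masks the symptom at the cost of 14 extra bytes per row.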
