Does the AltiVec vec_ld() work only with 16-byte aligned variables?

Question

In gcc 4.1.2, vec_ld() does not work correctly on a board with an MPC74xx CPU:

float temp[4];
__vector float Src;
Src = (__vector float)vec_ld(0, temp);

However, if the float array is aligned to 16 bytes, it works correctly:

float temp[4] __attribute__((aligned(16)));

Is this by design?

score 7 · Answer 1 · answered Nov 29 '13 at 11:34


Yes, AltiVec loads and stores require 16-byte alignment. This is very well documented in the AltiVec manuals.
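Since vec_ld() needs a 16-byte aligned address, it can help to check alignment before loading and to allocate aligned storage up front. The following is a minimal plain-C sketch (GCC attribute syntax, as in the question). It assumes a POSIX system for posix_memalign(), and the names is_aligned16, aligned_buf and alloc_floats_aligned16 are hypothetical, chosen only for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* assumption: POSIX system, for posix_memalign() */
#endif
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p sits on a 16-byte boundary, i.e. vec_ld(0, p) reads exactly p[0..15]. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Arrays with static or automatic storage: ask the compiler for 16-byte alignment. */
static float aligned_buf[4] __attribute__((aligned(16)));

/* Heap storage: plain malloc() is not guaranteed to return 16-byte aligned
   memory on every platform, so request the alignment explicitly.
   Returns NULL on failure; release the block with free(). */
static float *alloc_floats_aligned16(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 16, n * sizeof(float)) != 0)
        return NULL;
    return (float *)p;
}
```

A debug-build assert(is_aligned16(ptr)) in front of each vec_ld() catches the silent misalignment problem early.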

Unlike some other SIMD architectures such as SSE, however, AltiVec silently truncates an unaligned address down to the next lower 16-byte boundary rather than generating an exception. Your code will therefore not crash, but it will load or store the wrong data if you use an unaligned address.
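To make the truncation concrete, here is a scalar C model of the address arithmetic. It is not AltiVec code, so it runs on any host. model_vec_ld is a hypothetical name, and the model simply assumes the behaviour described above: the low 4 bits of the effective address are cleared before the 16 bytes are read.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of vec_ld(offset, base): the effective address base + offset
   has its low 4 bits cleared, so the 16 bytes copied always start on the
   16-byte boundary at or below it. No error is reported for misalignment.
   The caller must ensure that whole aligned block is readable memory. */
static void model_vec_ld(int offset, const void *base, unsigned char out[16])
{
    uintptr_t ea = (uintptr_t)base + (uintptr_t)offset;
    const unsigned char *block = (const unsigned char *)(ea & ~(uintptr_t)15);
    memcpy(out, block, 16);
}
```

For a float array that starts 4 bytes past a 16-byte boundary, model_vec_ld(0, temp, out) yields the 4 bytes preceding temp[0] followed by temp[0] through temp[2], which is exactly the "silently wrong" result described above.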

In cases where you cannot avoid unaligned loads, you can load the two adjacent aligned vectors and then use vec_lvsl + vec_perm to create the required vector:

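#include <altivec.h>

float temp[4];                  /* may not be 16-byte aligned */
__vector float src1, src2, src;

src1 = vec_ld(0, temp);         /* aligned block containing temp[0] */
src2 = vec_ld(16, temp);        /* following aligned block */
src = vec_perm(src1, src2, vec_lvsl(0, temp)); /* shift wanted bytes into place */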
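The permute trick can also be modelled in portable C to see why it works. This is a sketch under stated assumptions: bytes are indexed in memory (big-endian) order as on the MPC74xx, and model_lvsl, model_vperm and model_unaligned_load are hypothetical stand-ins for vec_lvsl, vec_perm and the three-instruction sequence above.

```c
#include <stdint.h>
#include <string.h>

/* vec_lvsl model: control bytes {sh, sh+1, ..., sh+15}, sh = low 4 bits of the address. */
static void model_lvsl(int offset, const void *base, unsigned char ctl[16])
{
    unsigned sh = (unsigned)(((uintptr_t)base + (uintptr_t)offset) & 15u);
    for (int i = 0; i < 16; i++)
        ctl[i] = (unsigned char)(sh + i);
}

/* vec_perm model: treat a||b as a 32-byte table and pick byte ctl[i] & 31. */
static void model_vperm(const unsigned char a[16], const unsigned char b[16],
                        const unsigned char ctl[16], unsigned char out[16])
{
    for (int i = 0; i < 16; i++) {
        unsigned k = ctl[i] & 31u;
        out[i] = (k < 16) ? a[k] : b[k - 16];
    }
}

/* Unaligned 16-byte load built from two aligned loads plus a permute. */
static void model_unaligned_load(const void *p, unsigned char out[16])
{
    unsigned char lo[16], hi[16], ctl[16];
    const unsigned char *blk = (const unsigned char *)((uintptr_t)p & ~(uintptr_t)15);
    memcpy(lo, blk, 16);       /* models vec_ld(0, p)  */
    memcpy(hi, blk + 16, 16);  /* models vec_ld(16, p) */
    model_lvsl(0, p, ctl);
    model_vperm(lo, hi, ctl, out);
}
```

With sh = address & 15, the control vector selects bytes sh through sh+15 of the 32-byte concatenation of the two aligned blocks, and those are exactly the 16 bytes starting at p.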


Paul R



Is there any reason not to use `src2 = vec_ld(15, temp);`? It looks like it should do the same thing, while avoiding a potential page fault if `temp` *is* aligned (more of a problem when accessing the heap). – tc. Dec 26 '13 at 13:50
If temp is an array of 16 bytes and sits at the end of a page, won't this crash? – sunmoon Jan 23 '15 at 05:25

@sunmoon: yes, this is possible in some cases - mostly you can get away with it because there is usually valid memory after the data of interest, and so reading extra bytes doesn't cause a problem, but in general you should ensure that the second read will not fail. – Paul R Jan 23 '15 at 06:45
@PaulR - If you have some time, would you take a look at the POWER8 code in [SHA-Intrinsics](https://github.com/noloader/SHA-Intrinsics)? We are not seeing the SHA performance improvements we hoped for. Any suggestions to speed things up would be very welcome. – jww Mar 11 '18 at 22:24
@jww: I wish I could help but I know almost nothing about the SHA stuff - I’m more of an image processing guy. – Paul R Mar 11 '18 at 22:27
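The offset-15 variant raised in the comments can be checked with the same address arithmetic. block_read is a hypothetical helper that returns the address of the aligned 16-byte block that vec_ld(offset, p) would touch, assuming the truncation rule described in the answer.

```c
#include <stdint.h>

/* Address of the aligned 16-byte block that vec_ld(offset, p) reads,
   assuming the low 4 bits of p + offset are simply cleared. */
static uintptr_t block_read(int offset, const void *p)
{
    return ((uintptr_t)p + (uintptr_t)offset) & ~(uintptr_t)15;
}
```

With offset 15, the second load reads the block holding p + 15, the last byte actually wanted, so it never reaches past the data being loaded. When p is already aligned, both loads return the same block and vec_lvsl(0, p) produces the identity permute, so the result is still correct. With offset 16 and an aligned p, the second load touches the following block, which is where the page-boundary concern comes from.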

Alexander Pozdneev · Answer 2 · 2014-06-10T07:03:56.863


By the way, in Power8 they finally added support for unaligned load/store vector access. For details, see the lxvd2x / lxvw4x and stxvd2x / stxvw4x instructions in section "7.6 VSX Instruction Set" of the Power ISA 2.07 document.

Those who have access to the IBM XL C/C++ compiler can use the vec_xld2() / vec_xlw4() and vec_xstd2() / vec_xstw4() intrinsics.

As of "g++ (GCC) 4.10.0 20140419 (experimental)", I am not aware of any GCC equivalents, but I believe GCC users can access unaligned memory by dereferencing a cast pointer:

signed int *data;
// ...
vector signed int r = *(vector signed int *)&(data[i]);
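A possible alternative for GCC, offered as an assumption rather than a tested recipe for Power: casting a misaligned pointer to a vector pointer is technically undefined behaviour in C, whereas memcpy into a vector variable has no alignment requirement. The sketch uses GCC's generic vector_size extension (so it also compiles on non-Power hosts); v4si and load_v4si_unaligned are hypothetical names.

```c
#include <string.h>

/* GCC/Clang generic vector type: four 32-bit ints, 16 bytes in total. */
typedef int v4si __attribute__((vector_size(16)));

/* Unaligned 16-byte load without an ill-aligned pointer cast: memcpy has no
   alignment requirement, and the compiler may lower it to a single unaligned
   vector load when the target supports one. */
static v4si load_v4si_unaligned(const int *p)
{
    v4si v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Whether this becomes a single unaligned vector instruction (rather than several scalar loads) depends on the target flags and compiler version, so it is worth checking the generated assembly.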

edited Jun 10 '14 at 07:03

answered Jun 10 '14 at 06:42

Alexander Pozdneev


*"Power8 they finally added support for unaligned load/store..."* - I believe it was POWER7, not POWER8. Also see [`vec_xld2`](https://www.ibm.com/support/knowledgecenter/SSGH2K_12.1.0/com.ibm.xlc121.aix.doc/compiler_ref/vec_xld2.html), [`vec_xlw4`](https://www.ibm.com/support/knowledgecenter/SSGH2K_12.1.0/com.ibm.xlc121.aix.doc/compiler_ref/vec_xlw4.html) and friends. – jww Mar 11 '18 at 22:20

