Basic rule: data types should be naturally (natively) aligned, i.e. the alignment should equal the number of bytes needed to store the type, rounded up to a power of 2, e.g.:
type                         size (bytes)   align (bytes)
char                          1              1
short                         2              2
int                           4              4
float                         4              4
int64_t                       8              8
double                        8              8
long double (x87, 80 bit)    10             16
__float128                   16             16
__int128_t                   16             16
Some architectures, e.g. SPARC, trap on misaligned accesses (a 4-byte load must use an address that is a multiple of 4), and even on architectures that permit misaligned access, aligned data is often faster to read and write. For this reason, local variables on the stack and struct fields are often padded when differently sized types are mixed, though that behavior can be altered if desired (for example, with compiler-specific packing attributes or pragmas).
Caches work on whole lines, so data also benefits from alignment to the cache line size (e.g. 16, 32, or 64 bytes) rather than just the word size (32 or 64 bits): an object that starts on a line boundary does not straddle more lines than necessary.
Some wider instructions, such as SSE2 (128 bits wide) or double-precision floating point (64 bits wide), are faster with operands aligned to their native width, and some will fault otherwise: if you need to load 128 bits of data, align it to 128 bits (16 bytes).
DMA and memory paging need even larger alignment (e.g. to a page boundary), but that is usually obtained by pointer manipulation.
OpenCL (GPGPU) sometimes needs very large alignment because of very wide DDR buses and GPU core memory access limits: http://www.khronos.org/registry/cl/sdk/1.1/docs/man/xhtml/attributes-variables.html
/* a has alignment of 128 */
__attribute__((aligned(128))) struct A {int i;} a;