Both of these need to be understood in the context of a CUDA warp. All operations are issued warp-wide, and this includes instructions that access memory.
An individual CUDA thread can access 1,2,4,8,or 16 bytes in a single instruction or transaction. When considered warp-wide, that translates to 32 bytes all the way up to 512 bytes. The GPU memory controller can typically issue requests to memory in granularities of 32 bytes, up to 128 bytes. Larger requests (say, 512 bytes, considered warp wide) will get issued via multiple "transactions" of typically no more than 128 bytes.
Modern DRAM memory has the design characteristic that you don't typically ask for a single byte, you request a "segment" typically of 32 bytes at a time for typical GPU designs. The division of memory into segments is fixed at design time. As a result, you can request either the first 32 bytes (the first segment) or the second 32 bytes (the second segment). You cannot request bytes 16-47 for example. This is all a function of the DRAM design, but it manifests in terms of memory behavior.
The diagram(s) depicts the behavior of each thread in a warp. Individually, they are depicted by the gray/black arrows pointing upwards. Each arrow represents the request from a thread, and the arrow points to a relative location in memory that that thread would like to load or store.
The diagrams are presented in comparison to each other to show the effect of "alignment". When considered warp-wide, if all 32 threads are requesting bytes of data that belong to a single segment, this would require the memory controller to retrieve only one segment to satisfy the request. This would arguably be the most efficient possible behavior (and therefore data organization as well as access pattern, considered warp-wide) for a single request (i.e. a single load or store instruction).
However if the addresses emanating from each thread in the warp result in a pattern depicted in the 2nd figure, this would be "unaligned", and even though you are effectively asking for a similar data "footprint", the lack of alignment to a single segment means the memory controller will need to retrieve 2 segments from memory to satisfy the request.
That is the key point of understanding associated with the figure. But there is more to the story than that. Misaligned access is not necessarily as tragic (performance cut in half) as this might suggest. The GPU caches play a role here, when we consider these transactions not just in the context of a single warp, but across many warps.
To get a more complete and orderly treatment of these topics, I suggest referring to various training material. It's by no means the only one, but unit 4 of this training series will cover the topic in more detail.