The THUMB2 reference specifies that LDR PC, [PC, #imm] (type 2) is unpredictable if the target address is not 4-byte aligned.

From my experience, on some processors this works perfectly fine, and on others it fails miserably (which is why it took me quite a while to trace the fault to this alignment issue).

So I was wondering if there's some real explanation for this (beyond "just don't do it").

Dany Zatuchna
  • If it is not 4-byte aligned, the memory access might take multiple cycles **even if possible**, especially if the system has no cache (Cortex-M, which only supports Thumb-2). It is fairly easy to pad PIC data to 4-byte alignment, so some ARM/Thumb-2 systems may be designed without the ability to do unaligned memory accesses; the transistor count and pipeline are simpler that way. The Thumb-2 technical references are written to handle all cases and give the CPU/SoC designers some flexibility. – artless noise Nov 01 '16 at 13:24
  • Thanks for the quick response, but I don't quite get it yet. Why is it fine for the address to be unaligned if the destination register is not `pc`? Also, can you recommend some reading material about the inner workings of ARM? It might be just that I'm missing some information. Just to emphasize, I'm not looking for a solution (as that's fairly straightforward), I just want to understand the problem. – Dany Zatuchna Nov 01 '16 at 16:58
  • The `PC` is special because it determines where the next instruction will be fetched from. If the `ldr` is unaligned, multiple bus/memory cycles would/could be required, and you would need something to tell the CPU "I don't know where we are; stop everything", which costs logic/transistors/complexity. – artless noise Nov 02 '16 at 00:43

2 Answers


With ARM, language like that often means that at some point, past or present, they had a specific core where it doesn't work. So just don't do it. It may work perfectly well with your core. It may or may not have anything to do with the instruction set; they could always make that instruction work if they wanted to, aligned or not, as it is just a matter of putting the gates down. That is why it is most likely that one or more specific implementations have a problem and were already released before it was found.

In the old days with ARM (and this may still be true), they put this language in when they had specifically implemented something that is in fact predictable, and used it as a way to see whether you were using stolen code or the like, to cover the "what if you cloned an ARM" kind of thing. I think picoTurbo pretty much covered that and put it to bed. ARM's legal team makes short work of that now.

The program counter in particular is a bit messy, especially with pipelines; the "two instructions ahead" behavior is all synthesized now, and probably has been since the Acorn days. It is a bad idea in general to use the PC on the right side of the comma except in specific cases (PC-relative loads, jump tables, etc.), so you may see that kind of language with respect to the PC simply so they don't have to add the logic and clock cycles to make that instruction just work with the PC on the right. In this case (a PC-relative load), again, they probably have one or more implementations, cut and pasted from each other, that have a problem, or they made this rule for performance, gate count, or timing closure reasons.

On timing closure: your design can only run as fast as the longest pole in the tent, that is, the time the slowest combinational signal takes to settle, covering variations in manufacturing, temperature, and other environmental factors, plus margin. So before tape-in you compute these paths, examine them, and decide: do we want to split this across two or more clocks, is it tied to a specific feature, do we want to just remove that feature? You repeat synthesis and timing closure until the achievable maximum clock rate is at or above what you expected for the product.

It could also be that they didn't trap the unaligned access in this case (not 4-byte aligned is an unaligned access), and they may not have implemented it properly, assuming it would be trapped, or who knows why. You can maybe try to test that by planting specific bytes on either side of the unaligned address, and then planting code at each of the places the resulting value might land. Unless you are a chip vendor, you can't necessarily see this any other way (if it doesn't trap). As a chip vendor you would be able to simulate this and see exactly what is happening; of course you would have the source as well, and could see exactly why it doesn't work if you have a core where it doesn't.
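To plan the byte-planting test described above, it helps to precompute a few values an untrapped unaligned load might plausibly return. The following is a minimal sketch in C; the three candidate behaviors and all function names are illustrative assumptions, not documented behavior of any particular core, and a real core may do something else entirely.

```c
#include <stdint.h>

/* Hypothetical candidate 1: the four bytes starting at addr,
 * assembled little-endian (a "true" unaligned access). */
static uint32_t load_unaligned(const uint8_t *mem, uint32_t addr) {
    return ((uint32_t)mem[addr])
         | ((uint32_t)mem[addr + 1] << 8)
         | ((uint32_t)mem[addr + 2] << 16)
         | ((uint32_t)mem[addr + 3] << 24);
}

/* Hypothetical candidate 2: the low two address bits are ignored,
 * so the enclosing aligned word is returned. */
static uint32_t load_truncated(const uint8_t *mem, uint32_t addr) {
    return load_unaligned(mem, addr & ~3u);
}

/* Hypothetical candidate 3: the enclosing aligned word, rotated right
 * by 8 bits for each byte of misalignment. */
static uint32_t load_rotated(const uint8_t *mem, uint32_t addr) {
    uint32_t word = load_truncated(mem, addr);
    unsigned shift = 8u * (addr & 3u);
    /* Avoid shifting by 32, which is undefined in C. */
    return shift ? (word >> shift) | (word << (32u - shift)) : word;
}
```

With the bytes 00 11 22 33 44 55 66 77 planted at offsets 0 to 7 and a load from offset 2, the three candidates give 0x55443322, 0x33221100, and 0x11003322 respectively. Placing distinguishable code at each of those addresses (where the memory map allows it) would show which one, if any, the core branches to.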

Looking in the early ARM ARM (ARMv4T/ARMv5T and some ARMv6), the wording is even more generic for `LDR <Rd>, [<Rn>, <Rm>]`:

If the memory address is not word-aligned and no data abort occurs, the value written to the destination register is UNPREDICTABLE.

It doesn't even get into using the PC as one or more of the registers.
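For the PC-relative form from the question, the address that has to be word-aligned can be worked out ahead of time. This is a small sketch in C, assuming the usual Thumb rules that the PC reads as the instruction's address plus 4 and that a literal load uses that value rounded down to a word boundary as its base; the function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* In Thumb state, reading the PC yields the address of the
 * current instruction plus 4. */
static uint32_t thumb_pc_value(uint32_t insn_addr) {
    return insn_addr + 4u;
}

/* Address accessed by a PC-relative literal load with a positive offset:
 * the PC value rounded down to a multiple of 4, plus the immediate. */
static uint32_t literal_address(uint32_t insn_addr, uint32_t imm) {
    return (thumb_pc_value(insn_addr) & ~3u) + imm;
}

/* The condition the documentation asks for: low two address bits are zero. */
static bool is_word_aligned(uint32_t addr) {
    return (addr & 3u) == 0u;
}
```

For example, an instruction at 0x8002 with an immediate of 8 reads from 0x800C, which is word-aligned, while an instruction at 0x8000 with an immediate of 6 reads from 0x800A, which is not. Because the base is already rounded down, the alignment of the access depends only on the low two bits of the immediate.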

TL;DR: it is highly likely one of two things. 1) They have at least one core with a bug, fixed in later cores of the same family or in other designs. 2) They have a design reason (often timing/performance) that made it undesirable to implement the unaligned access, so they allow it to produce garbage. The result is likely not actually unpredictable, but it is not worth a lengthy explanation of what the result is, as that doesn't help you anyway.

Just because it worked on one core one time for you doesn't mean that it always works; you could be getting lucky with the code and core in question. If you have access to the errata, you may find your answer there, both the why and the fix. Thumb is supported on all ARM cores from ARMv4T to the present, and many of those cores are do-overs from scratch, so finding it in one erratum with a fix doesn't mean every design handles it: other designs may have relied on the documentation saying "don't do this" and not bothered to make it work.

Oded
old_timer

The main reason (I think) is that instructions that load PC or SP have side effects and are difficult to manage (efficiently) in the CPU. Since the original ARM instruction set, the newer instruction sets (including AArch64) have restricted which instructions can have these side effects.

Dric512