
My Android app is reporting SIGBUS errors on the 4th line in the following snippet (it's a function prologue):

MOV R12, SP
STMFD SP!, {R4-R12,LR,PC}
SUB R11, R12, #4
SUB SP, SP, #0xA4         <- SIGBUS here
STR R0, [R11,#var_30]
MOV R2, #0
MOV R0, #0
STR R0, [R11,#var_70]
STR R0, [R11,#var_68]
STR R0, [R11,#var_60]
STR R0, [R11,#var_5C]

Is that even possible?

Seva Alekseyev
  • Given that it's happened, it's entirely possible ;) There are a number of reasons you might get a fault _reported_ at a nonsensical address, and a couple for getting a fault at the address of a seemingly innocuous instruction, but without any context it's hard to say. What does the surrounding code look like? What are the exact addresses involved? What processor is it running on? – Notlikethat Dec 20 '14 at 10:36
  • Well, maybe it was misreported (how?) or my archive of builds is broken and I'm looking at the wrong SO file. – Seva Alekseyev Dec 20 '14 at 14:50

1 Answer


Assuming you're not poking mmap'ed hardware addresses directly (which can lead to the fun of an asynchronous external abort), I'm most suspicious of the stack push exactly two instructions earlier, i.e. at the uncorrected PC offset: in ARM state the PC reads 8 bytes ahead of the executing instruction, so a fault in the STMFD would be reported at the SUB SP line. If you've somehow got nonsense into the SP, e.g. by popping a corrupted stack frame earlier, circumstances could unfold thus:

  • The nonsense value in SP is not 4-byte aligned, so since the architecture doesn't allow unaligned load/store multiple, the STM results in an alignment fault (alignment faults are higher-priority than any other MMU fault).
  • Due to stupid legacy reasons* the kernel then tries to pretend there's no such thing as an alignment fault and goes off to emulate the access using multiple 'safe' accesses.
  • The kernel then takes an MMU fault in the alignment handler trying to access the nonsense address. At that point, everything goes down the "just give up entirely" path back to userspace - you get the SIGBUS of the original alignment exception, but without the proper reporting (since the fixup never finished), and possibly conflated with artifacts of the 'secret' kernel-side page fault. Net result: confusion.

To check for this course of events, first try echo 5 > /proc/cpu/alignment (or the programmatic equivalent) to disable the fixup and have alignment faults reported properly; 5 means "send SIGBUS and log a warning". That really should be the default on modern kernels for hardware that does handle most unaligned accesses, but sadly it seems there's still too much bad software out there depending on this brokenness.

* namely network layer programmers too attached to undefined behaviour with type punning and structure packing which "works" on x86, and certain ancient versions of ARM GCC which would apparently happily generate unaligned LDM/STM even for valid code

Notlikethat