A basic block is defined as a sequence of (non-jump) instructions ending with a jump (direct or indirect) instruction. The jump target address should be the start of another basic block. Consider I have the following assembly code :
106ac: ba00000f blt 106f0 <main+0xb8>
106b0: e3099410 movw r9, #37904 ; 0x9410
106b4: e3409001 movt r9, #1
106b8: e79f9009 ldr r9, [pc, r9]
106bc: e3a06000 mov r6, #0
106c0: e1a0a008 mov sl, r8
106c4: e30993fc movw r9, #37884 ; 0x93fc
106c8: e3409001 movt r9, #1
106cc: e79f9009 ldr r9, [pc, r9]
106d0: e5894000 str r4, [r9]
106d4: e7941105 ldr r1, [r4, r5, lsl #2]
106d8: e1a00007 mov r0, r7
106dc: e12fff31 blx r1
106e0: e0806006 add r6, r0, r6
106e4: e25aa001 subs sl, sl, #1
106e8: e287700d add r7, r7, #13
106ec: 1afffff4 bne 106c4 <main+0x8c>
106f0: e30993d0 movw r9, #37840 ; 0x93d0
106f4: e3409001 movt r9, #1
bb1
106a4: ...
106ac: ba00000f blt 106f0 <main+0xb8>
The first basic block bb1 has a target address which is the start of bb4.
bb2
106b0: e3099410 movw r9, #37904 ; 0x9410
.... All other instructions
106c4: e30993fc movw r9, #37884 ; 0x93fc
.... All other instructions
106d8: e1a00007 mov r0, r7
106dc: e12fff31 blx r1
The second basic block bb2 has an indirect branch so the target address is known only at runtime.
bb3
106e0: e0806006 add r6, r0, r6
106e4: e25aa001 subs sl, sl, #1
106e8: e287700d add r7, r7, #13
106ec: 1afffff4 bne 106c4 <main+0x8c>
The third basic block has a target address which is not the start of another basic block but it is in the middle of bb2. According to the definition of a basic block, it is not possible. But, in practice, I am seeing this behavior (jumps in the middle of basic blocks) in multiple places. How to explain this behavior ? Is it possible to force a compiler (LLVM) to generate code that does not jump anywhere else except at the beginning of a basic block ?
bb4
106f0: e30993d0 movw r9, #37840 ; 0x93d0
106f4: e3409001 movt r9, #1
....
Ends with a branch (direct or indirect)
I am generating basic blocks using a tool (SPEDI) and the compiler used is LLVM (CLANG front end) and the targeted architecture is ARM V7 Cortex-A9.