An 8-byte-aligned lookup table simplifies indexing and should be good for AVR chips that support `lpm` (Load from Program Memory). (Optimized from @AterLux's answer.) Aligning the table by 8 means all 8 entries have the same high byte of their address. It also means the low 3 bits of the table address are zero, so an index of 0..7 can't wrap, and we can use `ori` instead of having to negate the address for `subi`. (`adiw` only works for 0..63, so it might not be able to represent an address.)
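
To make the alignment argument concrete, here is a small host-side C sketch (a model of the address arithmetic, not AVR code). The function names and the example addresses in the note below are made up for illustration; the only assumption is that a table address can be treated as a plain 16-bit number.

```c
#include <stdint.h>

/* Host-side model, not AVR code: a table address is just a 16-bit number.
   Returns 1 if, for every index 0..7, OR-ing the index into the address
   gives the same result as adding it (which is what lets ori replace an add). */
int or_matches_add(uint16_t table_addr)
{
    for (uint16_t i = 0; i < 8; i++) {
        if ((uint16_t)(table_addr | i) != (uint16_t)(table_addr + i))
            return 0;
    }
    return 1;
}

/* Returns 1 if the first and last of the 8 entries share the same high
   address byte, i.e. a single ldi into r31 covers the whole table. */
int same_high_byte(uint16_t table_addr)
{
    return (table_addr >> 8) == ((uint16_t)(table_addr + 7) >> 8);
}
```

For an 8-aligned address such as 0x12F8, both checks succeed. For a misaligned one such as 0x12FC, the table would straddle a 256-byte boundary and both checks fail.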
I'm showing the best-case scenario where you can conveniently generate the input in `r30` (low half of Z) in the first place; otherwise you need a `mov`. Also, this becomes too short to be worth calling as a function, so I'm not showing a `ret`, just a code fragment.
Assumes input is valid (in 0..7); consider @ReAl's answer if you need to ignore high bits, or just `andi r30, 0x7`.
If you can easily reload Z after this, or didn't need it preserved anyway, this is great. If clobbering Z sucks, you could consider building the table in RAM during initial startup (with a loop) so you could use X or Y for the pointer with a data load instead of `lpm`. The same option applies if your AVR doesn't support `lpm`.
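
As a rough illustration of the "build the table in RAM at startup" idea, here is a minimal C sketch. The names `shl_table_ram` and `init_shl_table` are hypothetical, and where the table lives and when the init runs are assumptions that depend on your project.

```c
#include <stdint.h>

/* Hypothetical RAM-resident table; on AVR it would be read through X or Y
   with an ordinary data load instead of lpm. */
uint8_t shl_table_ram[8];

/* Fill the table with 1 << i by shifting one running value left a bit at a
   time, so no variable-count shift is ever needed. */
void init_shl_table(void)
{
    uint8_t bit = 1;
    for (uint8_t i = 0; i < 8; i++) {
        shl_table_ram[i] = bit;
        bit = (uint8_t)(bit << 1);
    }
}
```

Calling `init_shl_table()` once during startup is enough; after that, a lookup is a single indexed read of `shl_table_ram`.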

    ## gas / clang syntax
    ### Input:    r30 = 0..7 bit position
    ### Clobbers: r31.  (addr of a 256-byte chunk of program memory where you might have other tables)
    ### Result:   r17 = 1 << r30

      ldi   r31, hi8(shl_lookup_table)    // Same high byte for all table elements.  Could be hoisted out of a loop
      ori   r30, lo8(shl_lookup_table)    // Z = table | bitpos = &table[bitpos] because alignment
      lpm   r17, Z

    .section .rodata
    .p2align 3        // 8-byte alignment so low 3 bits of addresses match the input.
                      // ideally place it where it will be aligned by 256, and drop the ORI
                      // but .p2align 8 could waste up to 255 bytes of space!  Use carefully
    shl_lookup_table:
    .byte 0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80

If you can locate the table at a 256-byte alignment boundary, then `lo8(table)` = 0, so you can drop the `ori` and just use `r30` directly as the low byte of the address.
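
The 256-byte-aligned case can be checked with the same kind of host-side C model (again, not AVR code). `make_z` and `index_is_low_byte` are hypothetical helper names, and the addresses in the note below are arbitrary examples.

```c
#include <stdint.h>

/* Host-side model: combine a high byte (r31) and a low byte (r30) into Z. */
uint16_t make_z(uint8_t r31, uint8_t r30)
{
    return (uint16_t)(((uint16_t)r31 << 8) | r30);
}

/* Returns 1 if using the index directly as the low byte of Z (no ori)
   yields the address of table[index]. This only holds when the low byte
   of the table address is zero, i.e. the table is aligned by 256. */
int index_is_low_byte(uint16_t table_addr, uint8_t index)
{
    uint8_t hi = (uint8_t)(table_addr >> 8);
    return make_z(hi, index) == (uint16_t)(table_addr + index);
}
```

With a table at 0x1200, index 5 gives Z = 0x1205, which is exactly `&table[5]`. With a table at 0x1208 the shortcut points at the wrong byte, which is why the 8-aligned version still needs the `ori`.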
Costs for the version with `ori`, not including reloading Z with something after, or worse, saving/restoring Z. (If Z is precious at the point you need this, consider a different strategy.)
- size = 3 words code + 8 bytes (4 words) data = 7 words. (Plus up to 7 bytes of padding for alignment if you aren't careful about layout of program memory)
- cycles = 1(ldi) + 1(ori) + 3(lpm) = 5 cycles
In a loop, or if you need other data in the same 256B chunk of program memory, the `ldi r31, hi8` can be hoisted / done only once.
If you can align the table by 256, that saves a word of code and a cycle of time. If you also hoist the `ldi` out of the loop, that leaves just the 3-cycle `lpm`.
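
The cycle arithmetic for the variants can be summarized in a few lines of C. This is only bookkeeping over the per-instruction counts quoted in the cost list above (ldi = 1, ori = 1, lpm = 3); the function name and its two flags are hypothetical.

```c
/* Per-instruction cycle counts as quoted in the cost list above. */
enum { CYC_LDI = 1, CYC_ORI = 1, CYC_LPM = 3 };

/* Cycles spent per lookup.
   aligned256  != 0: table is aligned by 256, so the ori is dropped.
   ldi_hoisted != 0: ldi r31 is done once outside the loop, so it is not
                     counted per lookup. */
int cycles_per_lookup(int aligned256, int ldi_hoisted)
{
    int cycles = CYC_LPM;          /* the lpm is always needed */
    if (!aligned256)
        cycles += CYC_ORI;
    if (!ldi_hoisted)
        cycles += CYC_LDI;
    return cycles;
}
```

`cycles_per_lookup(0, 0)` reproduces the 5-cycle baseline, and `cycles_per_lookup(1, 1)` is the 3-cycle case where only the `lpm` remains.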
(Untested; I don't have an AVR toolchain other than `clang -target avr`. I think GAS / clang want just normal symbol references, and handle the `symbol * 2` internally. This does assemble successfully with `clang -c -target avr -mmcu=atmega128 shl.s`, but disassembling the `.o` crashes `llvm-objdump -d` 10.0.0.)