I'm building a small bytecode VM that will run on a variety of platforms including exotic embedded and microcontroller environments.
Each opcode in my VM can be variable length(no more than 4 bytes, no less than 1 byte). In interpreting the opcodes, I want to create a tiny "cache" for the current opcode. However, due to it being used on many different platforms, it's hard to do.
So, here is a few examples of expected behavior:
- On an 8-bit microcontroller with an 8-bit memory bus, I'd want it to only load 1 byte because it'd take multiple (slow) memory operations to load anymore, and in theory, it might only require 1 byte to execute the current opcode
- On an 8086(16-bit), I'd want to load 2 bytes because to only load 1 byte we would basically be throwing some useful data away to be read later, but I don't want to load more than 2 bytes because it'd take multiple operations
- On a 32-bit ARM processor, I'd want to load 4 bytes because otherwise we're either throwing data that'd might have to be read again away, or we're doing multiple operations
I would say this could be handled easily by just assuming that unsigned int
is good enough, but on 8-bit AVR microcontrollers, int is defined as 16-bit, but the memory data bus width is only 8 bit, so 2 memory load operations would be required.
Anyway, current ideas:
using uint_fast16_t
seems to work as expected on most platforms (32 bits on ARM, 16 bits on 8086, 64 bits on x86-64). However, it clearly still leaves out AVR and other 8-bit microcontrollers.
I thought using uint_fast8_t
might work, but it would appear on most platforms that it's defined as being unsigned char
, which definitely isn't optimal
Also, there is another problem that must be solved as well: unaligned memory access. On x86, this probably isn't going to be a problem(in theory it does 2 memory operations, but it's probably cached away in hardware), however on ARM I know that doing an unaligned 32-bit access could possibly cost 3 times as much as a single aligned 32-bit load. If the address is unaligned, I want to load the aligned option and get as much data as possible, but at all costs avoid another memory operation
Is there a way to somehow do this using magical preprocessor includes or some such, or does it just require manually defining the optimum cache size before compiling for the platform?