Setting aside the question of why someone would want to apply time-dithered stimulus to a zero-time model, there's another solution besides using a function call to 'filter' addresses.
The classic model of asynchronous RAM usage includes a read enable, which reduces power by requiring that the address be stable while read enable is true. EMI reduction goes hand in hand with the power savings. The same approach applies to a true asynchronous write port: a non-clocked write is signaled with a write enable that is only asserted while the address is stable. Clocked reads and writes are easy, because the address is presumed stable at the clock edge.
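A minimal sketch of that classic read-enable approach follows. The entity name `async_ram_ren`, the `ren` port, the generic values and the high-impedance output while reading is disabled are all assumptions for illustration, not the OP's design. The memory is only indexed while `ren` is asserted, so the logic driving the RAM is responsible for holding `addr_rd_i` stable (and in range) during that time:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical asynchronous RAM with a read enable; names, widths and
-- depth are illustrative assumptions only.
entity async_ram_ren is
    generic (
        ADDRWIDTH: positive := 4;
        DATAWIDTH: positive := 8;
        MEMDEPTH:  positive := 10     -- deliberately not a power of two
    );
    port (
        clk:       in  std_logic;
        wen:       in  std_logic;     -- active low, clocked write
        addr_wr_i: in  std_logic_vector(ADDRWIDTH - 1 downto 0);
        data_i:    in  std_logic_vector(DATAWIDTH - 1 downto 0);
        ren:       in  std_logic;     -- active high, only asserted while addr_rd_i is stable
        addr_rd_i: in  std_logic_vector(ADDRWIDTH - 1 downto 0);
        data_o:    out std_logic_vector(DATAWIDTH - 1 downto 0)
    );
end entity;

architecture sketch of async_ram_ren is
    type RAM is array (MEMDEPTH - 1 downto 0) of
        std_logic_vector(DATAWIDTH - 1 downto 0);
    signal memory: RAM := (others => (others => '0'));
begin
MEM_WRITE:
    process (clk)
    begin
        if rising_edge(clk) then
            if wen = '0' then
                memory(to_integer(unsigned(addr_wr_i))) <= data_i;
            end if;
        end if;
    end process;

MEM_READ:
    -- memory is only indexed while ren = '1'; address transients that occur
    -- while ren = '0' are never converted to an index
    process (ren, addr_rd_i, memory)
    begin
        if ren = '1' then
            data_o <= memory(to_integer(unsigned(addr_rd_i)));
        else
            data_o <= (others => 'Z');   -- assumed: output disabled when not reading
        end if;
    end process;
end architecture;
```

Glitches on `addr_rd_i` then only matter if `ren` is asserted while the address is still settling, which is exactly what the read-enable protocol forbids.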
The default delay mechanism in VHDL is inertial delay, which models switching delay. As the OP relates, the address bits don't all propagate at the same time, which can produce an out-of-range index for a zero-time memory model when the memory size isn't a power of two.
Inertial delay also has a reject time (the pulse rejection limit), which is used to eliminate pulses shorter than the reject time. The reject time must not be greater than the delay of the first waveform element of the signal assignment (which can be the only element), and when no reject clause is given it defaults to that delay.
(See IEEE Std 1076-2008 10.5.2 Simple signal assignments.)
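A minimal self-checking sketch of the three delay mechanisms (the entity and signal names are made up for illustration): a 1.5 ns pulse on `a` is dropped by plain inertial delay, whose reject limit defaults to the 2 ns delay, but passes both with an explicit 1 ns reject limit and with transport delay:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical demo entity, not part of the OP's design.
entity delay_mechanisms is
end entity;

architecture sim of delay_mechanisms is
    signal a:           std_logic := '0';
    signal y_default:   std_logic := '0';  -- inertial, reject limit defaults to 2 ns
    signal y_reject:    std_logic := '0';  -- inertial, explicit 1 ns reject limit
    signal y_transport: std_logic := '0';  -- transport delay, no pulse rejection
begin
    y_default   <= a after 2 ns;           -- same as: inertial a after 2 ns
    y_reject    <= reject 1 ns inertial a after 2 ns;
    y_transport <= transport a after 2 ns;

STIMULUS:
    process
    begin
        wait for 10 ns;
        a <= '1';                          -- start a 1.5 ns wide pulse
        wait for 1.5 ns;
        a <= '0';
        wait for 1 ns;                     -- now 12.5 ns, inside the delayed pulse window
        assert y_default = '0'
            report "1.5 ns pulse should be rejected (limit 2 ns)" severity error;
        assert y_reject = '1'
            report "1.5 ns pulse should pass (limit 1 ns)" severity error;
        assert y_transport = '1'
            report "transport delay should pass every pulse" severity error;
        wait;
    end process;
end architecture;
```

The assertions stay silent when the filtering behaves as described; the pulse on `y_default` is removed because the earlier '1' transaction falls within the reject window of the later '0' transaction and is deleted from the projected output waveform.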
In VHDL, signal updates are scheduled in a driver's projected output waveform, a queue of transactions, each holding a value and the time at which that value update is to take place.
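We can't simply schedule the data_o update with a reject time, because the value placed in the queue is computed when the assignment executes, and that requires reading the current memory contents with the address as it is at that moment, glitches included.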
Instead, we can filter out switching noise on the addr_rd_i input using inertial delay, counting on its pulse rejection limit.
Because the indexed memory read needs an integer (or natural) index, you can add another signal holding the integer value of the read address and perform the pulse rejection when assigning it (with an assignment delay equal to or greater than the reject time).
As an aside, IEEE Std 1076.6-2004 (RTL synthesis, now withdrawn) is the basis for the constructs supported by synthesis to produce hardware, and synthesis vendors still use it as a starting point. Section 8.8.4 (Signal assignment statement) shows that the delay mechanism is ignored, and the keyword after together with the time expression following it is also ignored (8.8.4.1).
So we can add timing to our zero-time model to support pulse rejection in simulation:
```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;   -- to_integer, unsigned

-- entity async_memory as declared in the question
architecture fum of async_memory is
    type RAM is array (MEMDEPTH - 1 downto 0) of
        std_logic_vector(DATAWIDTH - 1 downto 0);
    signal memory:  RAM;
    signal addr_rd: natural;   -- pulse-filtered integer read address
begin
MEM_WRITE:
    process (clk, rstn)
    begin
        if rstn = '0' then
            memory <= (others => (others => '0'));
        elsif rising_edge(clk) then
            if wen = '0' then
                memory(to_integer(unsigned(addr_wr_i))) <= data_i;
            end if;
        end if;
    end process;

MEM_READ:
    data_o <= memory(addr_rd);

PULSE_REJECT:
    -- address changes lasting less than 1.8 ns are rejected; 1.9 ns read delay
    addr_rd <= reject 1.8 ns inertial to_integer(unsigned(addr_rd_i)) after 1.9 ns;
end architecture;
```
The reject limit is chosen relative to the model's clock period (10 ns here), and the read address takes 1.9 ns to become valid, representing a read delay (keep in mind the assignment delay has to be equal to or larger than the reject limit). The reject limit represents the largest difference in switching times between any two address lines.
In a testbench we can model the OP's example:
```vhdl
STIMULI:
    process
    begin
        wait for 6 ns;
        rstn <= '0';                        -- asynchronous reset clears memory
        wait for 10 ns;
        rstn <= '1';
        wait for 20 ns;
        data_i <= x"ac";                    -- write x"ac" to address 8
        addr_wr_i <= x"8";
        wen <= '0';
        wait for 10 ns;
        wen <= '1';
        addr_rd_i <= "0111";                -- read address 7
        wait for 10 ns;
        addr_rd_i(addr_rd_i'LEFT) <= '1';   -- MSB switches first: "1111"
        wait for 1.7 ns;
        addr_rd_i(addr_rd_i'LEFT - 1 downto 0) <= (others => '0');  -- LSBs follow: "1000"
        wait for 8.3 ns;
        wait;
    end process;
```
This models a rollover of the read address from 0111 to 1000 in which the MSB switches 1.7 ns ahead of the LSBs.
And that gives us a waveform (image not preserved) in which the address value F lasted only 1.7 ns, below the 1.8 ns rejection limit, so it was rejected.
Note that an asynchronous reset was used to zero the memory contents.
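The pulse rejection result can also be checked without looking at a waveform. This minimal self-contained testbench is a sketch: the entity name `pulse_reject_tb` is made up, and it reproduces only the address path, not the memory. It repeats the 0111 to 1000 rollover with the MSB leading by 1.7 ns and asserts that the glitch value 15 never reaches `addr_rd`:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical testbench isolating the PULSE_REJECT assignment.
entity pulse_reject_tb is
end entity;

architecture sim of pulse_reject_tb is
    signal addr_rd_i: std_logic_vector(3 downto 0) := "0111";
    signal addr_rd:   natural := 7;
begin
    -- same assignment as above: 1.8 ns reject limit, 1.9 ns delay
    addr_rd <= reject 1.8 ns inertial to_integer(unsigned(addr_rd_i)) after 1.9 ns;

STIMULUS:
    process
    begin
        wait for 10 ns;
        addr_rd_i(addr_rd_i'LEFT) <= '1';                            -- "1111"
        wait for 1.7 ns;
        addr_rd_i(addr_rd_i'LEFT - 1 downto 0) <= (others => '0');   -- "1000"
        wait for 10 ns;
        assert addr_rd = 8
            report "addr_rd did not settle to 8" severity error;
        wait;
    end process;

MONITOR:
    -- runs on every addr_rd change; the 1.7 ns F value must never get through
    process (addr_rd)
    begin
        assert addr_rd /= 15
            report "glitch address 15 reached addr_rd" severity error;
    end process;
end architecture;
```

Here addr_rd goes directly from 7 to 8 at 13.6 ns: the transaction for 15 was scheduled at 11.9 ns, which is within 1.8 ns of the later transaction for 8, so it is deleted before it matures.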