I have a problem simulating my design. I have an asynchronous FIFO built around a dual-port memory. Writes are performed synchronously to the write clock; reads are performed by supplying the address of the location I want to read. Synchronization is handled by the read and write pointers.

Basically I have something like this :

architecture rtl of async_memory is 
     type RAM is array (MEM_DEPTH - 1 downto 0) of std_logic_vector(DATAWIDTH - 1 downto 0);
     signal memory : RAM;

begin
     MEM_WRITE:process(clk,rstn)
     begin
     .....
        memory(to_integer(unsigned(addr_wr_i))) <= data_i;
     .....
     end process;

     MEM_READ:data_o <= memory(to_integer(unsigned(addr_rd_i)));

When MEM_DEPTH is a power of 2, I don't have any problems. When MEM_DEPTH isn't a power of 2, I run into trouble when I simulate a random delay on each wire of addr_rd_i (the read address signal).

To be clear, if MEM_DEPTH is set to 10, addr_rd_i is 4 bits wide. The allowed values of addr_rd_i are:

  1. 0000
  2. 0001
  3. 0010
  4. 0011
  5. 0100
  6. 0101
  7. 0110
  8. 0111
  9. 1000
  10. 1001

Any other value causes a simulation error, of course (index constraint violation). The problem is that the delays can produce values greater than 1001. For example, if addr_rd_i is 0111 and I want to read 1000, it's possible that for a short time I have 1111:

  • 0111 -> 1111 -> 1000

Now the question: is there a way to avoid the simulation error? I thought of something like this:

MEM_READ:data_o <= memory(to_integer(unsigned(addr_rd_i)) mod MEM_DEPTH);

My only (big) concern is that I probably can't keep the same version of the file for synthesis, so I'd need two files, one for synthesis and one for simulation.

haster8558

2 Answers

Keep the same file for synth and sim!!!

Synthesise with and without "mod MEM_DEPTH". If they are the same size, then synthesis optimisation has removed the MOD operator ... then, no problem.

My preferred approach: write a "to_address" function performing all the type conversions, returning a valid address. Wrap a return statement involving the MOD operator between --pragma translate off and --pragma translate on (consult your synth tool for the actual accepted syntax). Follow it with a simple return statement...

Note that the read and write addresses should probably be declared as unsigned in the first place. Any time you're cascading type conversions, there's probably something wrong with the design...

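function to_address(addr : unsigned) return natural is
   variable temp : natural;
begin
   temp := to_integer(addr);
   --pragma translate off
   return temp mod MEM_DEPTH;  -- simulation only: wrap out-of-range addresses
   --pragma translate on
   return temp;                -- the only return that synthesis sees
end to_address;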

Then simulation will hit the first return while synthesis will fall through to the second. Comment it clearly and insist on manual inspection of this function come code review time...
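
To show how the pieces fit together, here is a minimal sketch of the memory with the function in place. It is an illustration under assumptions, not the OP's actual design: the generic defaults, the ADDRWIDTH generic and the active-low wen port are hypothetical, the reset is left out, the ports are declared as unsigned as suggested above, and the pragma spelling must be checked against your synthesis tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity async_memory is
    generic (
        MEM_DEPTH : positive := 10;  -- assumed values, for illustration only
        DATAWIDTH : positive := 8;
        ADDRWIDTH : positive := 4    -- hypothetical generic
    );
    port (
        clk       : in  std_logic;
        wen       : in  std_logic;   -- hypothetical active-low write enable
        addr_wr_i : in  unsigned(ADDRWIDTH - 1 downto 0);
        addr_rd_i : in  unsigned(ADDRWIDTH - 1 downto 0);
        data_i    : in  std_logic_vector(DATAWIDTH - 1 downto 0);
        data_o    : out std_logic_vector(DATAWIDTH - 1 downto 0)
    );
end entity async_memory;

architecture rtl of async_memory is
    type RAM is array (MEM_DEPTH - 1 downto 0) of
            std_logic_vector(DATAWIDTH - 1 downto 0);
    signal memory : RAM;

    -- Single place where an address becomes an array index.
    function to_address(addr : unsigned) return natural is
        variable temp : natural;
    begin
        temp := to_integer(addr);
        -- pragma translate_off
        return temp mod MEM_DEPTH;  -- simulation only: keep the index in range
        -- pragma translate_on
        return temp;                -- what synthesis is meant to see
    end function to_address;
begin
    MEM_WRITE : process (clk)
    begin
        if rising_edge(clk) then
            if wen = '0' then
                memory(to_address(addr_wr_i)) <= data_i;
            end if;
        end if;
    end process;

    -- Asynchronous read: a transient out-of-range address wraps instead of
    -- raising an index constraint error in simulation.
    MEM_READ : data_o <= memory(to_address(addr_rd_i));
end architecture rtl;
```

With the ports already unsigned, the read statement needs no cascaded conversions, and the simulation-only behaviour lives in one small function that is easy to review.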

jotik

Setting aside the question of why someone would want to apply time-dithered stimulus to a zero-time model, there's another solution besides using a function call to 'filter' addresses.

The classic model of asynchronous RAM usage would include a read enable which is used to reduce power by requiring the address is stable during the time read enable is true. EMI reduction goes hand in hand with power savings as well. The same can be applied to a true asynchronous write port requiring a non-clocked write to be signaled with a write enable that only occurs when address is stable. Clocked reads and writes are easy, there's a presumption the address is stable at the clock edge.
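
As a rough sketch of that classic model, the read below only indexes the memory while a read enable is asserted, so address glitches that occur while the enable is low never reach the array. Everything here is assumed for illustration: the entity name, the generic defaults, and the active-high ren and active-low wen ports are hypothetical, and the testbench is responsible for keeping ren low while the address lines settle.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity async_memory_ren is               -- hypothetical name
    generic (
        MEM_DEPTH : positive := 10;      -- assumed values, for illustration only
        DATAWIDTH : positive := 8;
        ADDRWIDTH : positive := 4
    );
    port (
        clk       : in  std_logic;
        wen       : in  std_logic;       -- hypothetical active-low write enable
        ren       : in  std_logic;       -- hypothetical active-high read enable
        addr_wr_i : in  std_logic_vector(ADDRWIDTH - 1 downto 0);
        addr_rd_i : in  std_logic_vector(ADDRWIDTH - 1 downto 0);
        data_i    : in  std_logic_vector(DATAWIDTH - 1 downto 0);
        data_o    : out std_logic_vector(DATAWIDTH - 1 downto 0)
    );
end entity async_memory_ren;

architecture rtl of async_memory_ren is
    type RAM is array (MEM_DEPTH - 1 downto 0) of
            std_logic_vector(DATAWIDTH - 1 downto 0);
    signal memory : RAM := (others => (others => '0'));
begin
    MEM_WRITE : process (clk)
    begin
        if rising_edge(clk) then
            if wen = '0' then
                memory(to_integer(unsigned(addr_wr_i))) <= data_i;
            end if;
        end if;
    end process;

    MEM_READ : process (ren, addr_rd_i, memory)
    begin
        if ren = '1' then
            -- The index expression is evaluated only in this branch,
            -- i.e. only while the address is required to be stable.
            data_o <= memory(to_integer(unsigned(addr_rd_i)));
        else
            data_o <= (others => '0');   -- avoids inferring a latch
        end if;
    end process;
end architecture rtl;
```

If the address changes to an out-of-range value while ren is high, the index error still occurs, which is arguably the right outcome because it flags a violation of the stability requirement.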

The default delay model in VHDL is inertial delay, which models switching delay. This is the situation the OP describes: the address 'bits' don't all propagate at the same time, resulting in an out-of-bounds index for a zero-time memory model when the memory size isn't a power of two.

Inertial delay also has a reject time, which is used to eliminate pulses shorter than the reject time. There's a requirement that the reject time be no greater than the associated delay, and the reject time defaults to the delay time specified in the first waveform element of a signal assignment (which can be the only element).

(See IEEE Std 1076-2008 10.5.2 Simple signal assignments.)
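
The three delay mechanisms can be compared side by side in a small simulation-only sketch. The entity, the signal names and the timing values are made up for illustration. With a 3 ns input pulse and a 5 ns delay, the default inertial assignment should swallow the pulse (its reject limit equals the 5 ns delay), while the assignment with a 2 ns reject limit and the transport assignment should both pass it.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity delay_demo is                     -- hypothetical demo, no ports
end entity delay_demo;

architecture sim of delay_demo is
    signal a           : std_logic := '0';
    signal y_inertial  : std_logic;
    signal y_reject    : std_logic;
    signal y_transport : std_logic;
begin
    -- Default inertial delay: the reject limit equals the 5 ns delay.
    y_inertial  <= a after 5 ns;

    -- Explicit reject limit: only pulses shorter than 2 ns are removed.
    y_reject    <= reject 2 ns inertial a after 5 ns;

    -- Transport delay: every input change is passed on, whatever its width.
    y_transport <= transport a after 5 ns;

    STIMULI : process
    begin
        wait for 10 ns;
        a <= '1';                        -- start of a 3 ns pulse
        wait for 3 ns;
        a <= '0';                        -- end of the pulse
        wait for 20 ns;
        wait;
    end process;
end architecture sim;
```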

Signal updates are scheduled in VHDL in a projected output waveform queue, which contains values and times for a value update to take place.

We can't simply schedule a data_o update with reject time, because the queued value requires a read of the current memory contents.

We can filter out switching noise on the addr_rd_i input using an inertial delay model, counting on its rejection limit.

Because you need an integer (or natural) index for the indexed memory read you could add another signal, holding the integer value of the read address and perform pulse rejection (and a delay on assignment equal to or greater than the reject time expression).

An aside here: IEEE Std 1076.6-2004 (RTL synthesis, now withdrawn) is the basis for supported synthesis constructs producing hardware. Synthesis vendors will still use this as a starting point. In 8.8.4 Signal assignment statement we can see the delay mechanism is ignored. The reserved word after and the time expression following it are also ignored (8.8.4.1).

So we can add timing to our zero time model to support pulse rejection in simulation:

architecture fum of async_memory is
    type RAM is array (MEMDEPTH - 1 downto 0) of 
            std_logic_vector(DATAWIDTH - 1 downto 0);
    signal memory:  RAM;
    signal addr_rd: natural;
begin

MEM_WRITE:
    process( clk, rstn)
    begin
        if rstn = '0' then
            memory <= (others => (others => '0'));
        elsif rising_edge(clk) then
            if wen = '0' then
                memory(to_integer(unsigned(addr_wr_i))) <= data_i;
            end if;
        end if;
    end process;
MEM_READ:
    data_o <= memory(addr_rd);
PULSE_REJECT:
    addr_rd <= reject 1.8 ns inertial to_integer(unsigned(addr_rd_i)) after 1.9 ns;
end architecture;

The reject limit is picked based on the model clock period (10 ns here) and we see the address takes 1.9 ns to become valid, representing a read delay (keeping in mind the assignment delay has to be equal to or larger than the reject limit). The reject limit represents the difference in switching times between any two address lines.

In a testbench we can model the OP's example:

STIMULI:
    process
    begin
        wait for 6 ns;
        rstn <= '0';
        wait for 10 ns;
        rstn <= '1';
        wait for 20 ns;
        data_i <= x"ac";
        addr_wr_i <= x"8";
        wen <= '0';
        wait for 10 ns;
        wen <= '1';
        addr_rd_i <= "0111";
        wait for 10 ns;
        addr_rd_i(addr_rd_i'LEFT)  <= '1';
        wait for 1.7 ns;
        addr_rd_i(addr_rd_i'LEFT -1 downto 0) <= (others => '0');
        wait for 8.3 ns;
        wait;
    end process; 

This models a rollover of the address from 0111 to 1000, with the MSB switching faster than the LSBs.

And that gives us:

[Waveform image: async_memory_tb_fst.png]

The address value F lasts only 1.7 ns, which is below the 1.8 ns reject limit, so it never reaches addr_rd.

Note an asynchronous reset was used to zero the contents of memory.