I've been having this debate for years: what is the correct way to infer a single-port RAM with synchronous read?

Suppose the interface for my inferred memory in VHDL is:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sram1 is
    generic(
        aw             :integer := 8; --address width of memory
        dw             :integer := 8  --data width of memory
    );
    port(
        --arm clock
        aclk   :in    std_logic;
        aclear :in    std_logic;

        waddr  :in    std_logic_vector(aw-1 downto 0);
        wdata  :in    std_logic_vector(dw-1 downto 0);
        wen    :in    std_logic;

        raddr  :in    std_logic_vector(aw-1 downto 0);
        rdata  :out   std_logic_vector(dw-1 downto 0)        
    );
end entity;

Is it this way? Door #1

-- I LIKE THIS ONE
architecture rtl of sram1 is    
    constant mem_len :integer := 2**aw;

    type mem_type is array (0 to mem_len-1) of std_logic_vector(dw-1 downto 0);

    signal block_ram : mem_type := (others => (others => '0'));

begin

process(aclk)
begin
    if (rising_edge(aclk)) then
        if (wen = '1') then
            block_ram(to_integer(unsigned(waddr))) <= wdata(dw-1 downto 0);
        end if;     

        -- QUESTION: REGISTERING THE READ DATA (ALL OUTPUT REGISTERED)?
        rdata <= block_ram(to_integer(unsigned(raddr)));        

    end if;
end process;


end architecture;

Or this way: Door #2

-- TEXTBOOKS LIKE THIS ONE
architecture rtl of sram1 is    
    constant mem_len :integer := 2**aw;

    type mem_type is array (0 to mem_len-1) of std_logic_vector(dw-1 downto 0);

    signal block_ram : mem_type := (others => (others => '0'));
    signal raddr_dff : std_logic_vector(aw-1 downto 0);        

begin

process(aclk)
begin
    if (rising_edge(aclk)) then
        if (wen = '1') then
            block_ram(to_integer(unsigned(waddr))) <= wdata(dw-1 downto 0);
        end if;     

        -- QUESTION: REGISTERING THE READ ADDRESS?
        raddr_dff <= raddr;        

    end if;
end process;

-- QUESTION: HOT ADDRESS SELECTION OF DATA
rdata <= block_ram(to_integer(unsigned(raddr_dff)));        

end architecture;

I'm a fan of the first version because I think it's good practice to register all of the outputs of a VHDL module. However, many textbooks list the latter version as the correct way to infer a single-port RAM with synchronous read.

Does it really matter from a Xilinx or Altera synthesis point of view, as long as you have already taken into account the difference between delaying the data versus the address (and determined that it doesn't matter for your application)?

I mean, they both still give you block RAMs in the FPGA, right?

Or does one give you LUTs and the other block RAMs?

Which would infer better timing and better capacity in an FPGA, Door #1 or Door #2?

pico

3 Answers

The differences can matter, and it really depends on the specific family you are targeting. Most modern FPGAs have block RAM options that allow them to function either way, but I wouldn't count on that in practice.

If I am inferring RAM, I typically start with the example design provided with the tools (there's almost always a "how to infer RAM" section in the user guide). If targeting multiple platforms (e.g. Altera + Xilinx), I'd stick with a "minimal common supported" set of features, merging the two example designs.

All that said, I typically register BOTH the address and the data. It's one more clock, but it helps close timing, and I'm usually more concerned with throughput than with overall latency. I also typically use wrapper modules (e.g. My_Simple_Dual_Port_RAM) that directly instantiate the low-level block RAM primitives, which makes it easy to switch between FPGA vendors (or swap out the inferred logic if/when needed). I just drop the modules in a directory per vendor (e.g. Altera, Lattice, Xilinx) and include the appropriate directory in the project file. I do the same thing with dual-clock FIFOs, where you're typically a LOT better off using the library parts than trying to build your own.
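
A minimal sketch of the "register both the address and the data" style described above, using the ports from the question. The entity name sram1_2reg is hypothetical and the unused aclear port is dropped. Whether a particular tool absorbs both registers into the block RAM is an assumption you should check in the synthesis report. Read latency is two clocks.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name; the ports mirror sram1 from the question.
entity sram1_2reg is
    generic(
        aw : integer := 8; -- address width of memory
        dw : integer := 8  -- data width of memory
    );
    port(
        aclk  : in  std_logic;
        waddr : in  std_logic_vector(aw-1 downto 0);
        wdata : in  std_logic_vector(dw-1 downto 0);
        wen   : in  std_logic;
        raddr : in  std_logic_vector(aw-1 downto 0);
        rdata : out std_logic_vector(dw-1 downto 0)
    );
end entity;

architecture rtl of sram1_2reg is
    type mem_type is array (0 to 2**aw-1) of std_logic_vector(dw-1 downto 0);
    signal block_ram : mem_type := (others => (others => '0'));
    signal raddr_dff : std_logic_vector(aw-1 downto 0) := (others => '0');
begin
    process(aclk)
    begin
        if rising_edge(aclk) then
            if wen = '1' then
                block_ram(to_integer(unsigned(waddr))) <= wdata;
            end if;
            -- Stage 1: register the read address.
            raddr_dff <= raddr;
            -- Stage 2: register the read data, addressed by the
            -- already-registered address from the previous clock.
            rdata <= block_ram(to_integer(unsigned(raddr_dff)));
        end if;
    end process;
end architecture;
```

The extra stage costs one clock of latency but leaves a full register-to-register path on both sides of the memory array, which is the timing benefit mentioned above.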

Charles Steinkuehler

Unfortunately, the synthesis tool vendors have built their RAM inference so that it typically recognizes both styles, regardless of the physical implementation of the RAM in the FPGA in question. So even if you specify a registered output, the synthesis tool may silently ignore that and infer a RAM with registered inputs instead. This is not functionally equivalent, so it may actually lead to undesired behaviour, particularly in the case of dual-port RAMs.
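
To make "not functionally equivalent" concrete, here is a small simulation-only sketch that models Door #1 and Door #2 side by side and writes to the same address that is being read on the same clock edge. The testbench name rdw_compare_tb and the signal names mem1, mem2, rdata1 and rdata2 are hypothetical. In simulation, the registered-data style returns the old contents and the registered-address style returns the newly written value (often called read-first versus write-first behaviour).

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity rdw_compare_tb is  -- hypothetical testbench name
end entity;

architecture sim of rdw_compare_tb is
    constant aw : integer := 2;
    constant dw : integer := 8;
    type mem_type is array (0 to 2**aw-1) of std_logic_vector(dw-1 downto 0);

    signal clk   : std_logic := '0';
    signal wen   : std_logic := '0';
    signal waddr : std_logic_vector(aw-1 downto 0) := (others => '0');
    signal raddr : std_logic_vector(aw-1 downto 0) := (others => '0');
    signal wdata : std_logic_vector(dw-1 downto 0) := (others => '0');

    signal mem1, mem2     : mem_type := (others => (others => '0'));
    signal raddr_dff      : std_logic_vector(aw-1 downto 0) := (others => '0');
    signal rdata1, rdata2 : std_logic_vector(dw-1 downto 0);
begin
    -- Door #1: read data is registered.
    door1 : process(clk)
    begin
        if rising_edge(clk) then
            if wen = '1' then
                mem1(to_integer(unsigned(waddr))) <= wdata;
            end if;
            rdata1 <= mem1(to_integer(unsigned(raddr)));
        end if;
    end process;

    -- Door #2: read address is registered, data is read combinationally.
    door2 : process(clk)
    begin
        if rising_edge(clk) then
            if wen = '1' then
                mem2(to_integer(unsigned(waddr))) <= wdata;
            end if;
            raddr_dff <= raddr;
        end if;
    end process;
    rdata2 <= mem2(to_integer(unsigned(raddr_dff)));

    stimulus : process
    begin
        -- Write 0xAB to address 1 while reading address 1 on the same edge.
        waddr <= "01"; raddr <= "01"; wdata <= x"AB"; wen <= '1';
        wait for 5 ns;
        clk <= '1';
        wait for 5 ns;
        clk <= '0';
        wen <= '0';
        -- Door #1 sampled the memory before the write took effect.
        assert rdata1 = x"00"
            report "Door #1 did not return the old data" severity error;
        -- Door #2 reads the memory after the write took effect.
        assert rdata2 = x"AB"
            report "Door #2 did not return the new data" severity error;
        report "read-during-write check finished" severity note;
        wait;
    end process;
end architecture;
```

If the synthesis tool maps both styles onto the same physical RAM configuration, at most one of these two simulated behaviours can match the hardware for this access pattern.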

To avoid this pitfall, you can add vendor-specific attributes telling the synthesis tool exactly which kind of RAM you need.
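
As a hedged illustration of such attributes, the sketch below attaches the Xilinx ram_style attribute and the Intel/Altera ramstyle attribute to the memory signal. The entity name sram1_attr is hypothetical, the "M9K" value is only an example and depends on the device family, and a tool will generally ignore the other vendor's attribute. Note that these attributes select the memory resource; as far as I know they do not by themselves pin down where the registers end up, so the synthesis report still needs checking.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity name; the ports mirror sram1 from the question.
entity sram1_attr is
    generic(
        aw : integer := 8; -- address width of memory
        dw : integer := 8  -- data width of memory
    );
    port(
        aclk  : in  std_logic;
        waddr : in  std_logic_vector(aw-1 downto 0);
        wdata : in  std_logic_vector(dw-1 downto 0);
        wen   : in  std_logic;
        raddr : in  std_logic_vector(aw-1 downto 0);
        rdata : out std_logic_vector(dw-1 downto 0)
    );
end entity;

architecture rtl of sram1_attr is
    type mem_type is array (0 to 2**aw-1) of std_logic_vector(dw-1 downto 0);
    signal block_ram : mem_type := (others => (others => '0'));
    signal raddr_dff : std_logic_vector(aw-1 downto 0) := (others => '0');

    -- Xilinx: request block RAM rather than distributed (LUT) RAM.
    attribute ram_style : string;
    attribute ram_style of block_ram : signal is "block";

    -- Intel/Altera: request a specific block type.
    -- The value is family-dependent; "M9K" is only an example.
    attribute ramstyle : string;
    attribute ramstyle of block_ram : signal is "M9K";
begin
    process(aclk)
    begin
        if rising_edge(aclk) then
            if wen = '1' then
                block_ram(to_integer(unsigned(waddr))) <= wdata;
            end if;
            -- Registered read address (Door #2 style).
            raddr_dff <= raddr;
        end if;
    end process;

    rdata <= block_ram(to_integer(unsigned(raddr_dff)));
end architecture;
```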

In general, most FPGAs have mandatory registered inputs on the physical RAM, and can add an additional optional register on the output. So using the coding style with registered inputs will probably make simulation match reality, which is typically a good thing.

Timmy Brolin
  • If synthesis tools universally require "registering inputs" to an inferred RAM, then that would explain why the book examples prefer to register the read address rather than the read data... – pico Aug 12 '19 at 15:16
  • Part of my confusion is the Vivado warning message: "SYNTH #1 Warning The timing for the instance xxx/xx/x , implemented as a block RAM, might be sub-optimal as no output register was merged into the block". It turns out this warning message is wrong: the warning only goes away if you register the "read address", which is an input to the inferred SRAM... – pico Aug 12 '19 at 15:28
  • You asked for a registered-output RAM, and the synthesis tool silently inferred a registered-input RAM instead, which may cause sub-optimal timing on the outputs. That is the warning you got, so the warning is not really wrong. – Timmy Brolin Aug 12 '19 at 15:43
  • In general, if you want a RAM with registered outputs, then you have to infer a RAM with both registered inputs and outputs. Or construct the RAM from distributed logic. Or use one of the few FPGAs which has RAMs with that capability. – Timmy Brolin Aug 12 '19 at 15:59
  • Yeah, it figures. Vivado is notorious for indirect warning messages that leave you scratching your head. In contrast, the previous Xilinx compiler, ISE, didn't warn you at all if you used block RAMs with asynchronous reads in your FPGA design. Since Vivado they started calling it a "synthesis methodology warning", for what it's worth. – pico Aug 12 '19 at 16:44
You can take a look at the results of the synthesis. My Vivado gives me the following reports after synthesizing your solutions (default settings).

First solution:

  • BRAM: 0.5 (from 60 Blocks)
  • IO: 34
  • BUFG: 1

And the schematic looks like this:

[Image: post-synthesis schematic of the first solution]

Second solution:

  • BRAM: 0.5 (from 60 Blocks)
  • IO: 34
  • BUFG: 1

With the following result:

[Image: post-synthesis schematic of the second solution]

So you see that the synthesis will generate the same output for both variants. It is up to you which one you want to use. I prefer the first variant because the second is slightly more code.

Kampi
  • It worked in this case, but for future readers: you can't always assume that because two schematics are the same, the function is the same. The memory block in this example might have the internal read data registers enabled, and this would not be apparent in the post-synthesis schematic without looking at the properties of the memory block. – scary_jeff Aug 12 '19 at 09:32
  • That's an interesting example. It's kind of scary that Vivado will infer the same logic for a RAM when the two descriptions are functionally different. Maybe it also sets some parameters depending on which type... – pico Aug 12 '19 at 15:18
  • pico: No, they don't. The two are functionally identical. And one of them deviates from what you specified in VHDL. Which is scary indeed. – Timmy Brolin Jan 21 '21 at 08:33