I've been having this debate for years... What's the correct way to infer a single-port RAM with synchronous read?
Let's suppose the interface for my inferred memory in VHDL is:
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity sram1 is
    generic(
        aw : integer := 8; -- address width of memory
        dw : integer := 8  -- data width of memory
    );
    port(
        -- arm clock
        aclk   : in  std_logic;
        aclear : in  std_logic;
        waddr  : in  std_logic_vector(aw-1 downto 0);
        wdata  : in  std_logic_vector(dw-1 downto 0);
        wen    : in  std_logic;
        raddr  : in  std_logic_vector(aw-1 downto 0);
        rdata  : out std_logic_vector(dw-1 downto 0)
    );
end entity;
Is it this way: Door #1
-- I LIKE THIS ONE
architecture rtl of sram1 is
    constant mem_len : integer := 2**aw;
    type mem_type is array (0 to mem_len-1) of std_logic_vector(dw-1 downto 0);
    signal block_ram : mem_type := (others => (others => '0'));
begin
    process(aclk)
    begin
        if (rising_edge(aclk)) then
            if (wen = '1') then
                block_ram(to_integer(unsigned(waddr))) <= wdata(dw-1 downto 0);
            end if;
            -- QUESTION: REGISTERING THE READ DATA (ALL OUTPUT REGISTERED)?
            rdata <= block_ram(to_integer(unsigned(raddr)));
        end if;
    end process;
end architecture;
Or this way: Door #2
-- TEXTBOOKS LIKE THIS ONE
architecture rtl of sram1 is
    constant mem_len : integer := 2**aw;
    type mem_type is array (0 to mem_len-1) of std_logic_vector(dw-1 downto 0);
    signal block_ram : mem_type := (others => (others => '0'));
    signal raddr_dff : std_logic_vector(aw-1 downto 0);
begin
    process(aclk)
    begin
        if (rising_edge(aclk)) then
            if (wen = '1') then
                block_ram(to_integer(unsigned(waddr))) <= wdata(dw-1 downto 0);
            end if;
            -- QUESTION: REGISTERING THE READ ADDRESS?
            raddr_dff <= raddr;
        end if;
    end process;
    -- QUESTION: HOT ADDRESS SELECTION OF DATA
    rdata <= block_ram(to_integer(unsigned(raddr_dff)));
end architecture;
I'm a fan of the first version because I think it's good practice to register all of the outputs of your VHDL module. However, many textbooks list the latter version as the correct way to infer a single-port RAM with synchronous read.
Does it really matter from a Xilinx or Altera synthesis point of view, as long as you have already taken into account the difference between delaying the data versus the address (and determined that it doesn't matter for your application)?
I mean, they both still give you block RAMs in the FPGA, right?
Or does one give you LUTs and the other block RAMs?
Which would infer better timing and better capacity in an FPGA, Door #1 or Door #2?
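To make the "delaying the data versus the address" difference concrete, here is a minimal simulation-only sketch. As far as I can tell, the two doors differ only when a read and a write hit the same address in the same clock cycle: Door #1 returns the old contents, Door #2 returns the newly written data. The testbench name rdw_tb, the signal names ram1/ram2/rdata1/rdata2, the 4-bit address width, and the stimulus values are all assumptions for illustration; both doors are copied into one architecture so the file stands on its own. It says nothing about what a given synthesis tool will infer.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical self-checking testbench; the name rdw_tb is an assumption.
entity rdw_tb is
end entity;

architecture sim of rdw_tb is
    constant aw : integer := 4; -- small address width, chosen for the sketch
    constant dw : integer := 8;
    type mem_type is array (0 to 2**aw-1) of std_logic_vector(dw-1 downto 0);

    signal aclk  : std_logic := '0';
    signal wen   : std_logic := '0';
    signal waddr : std_logic_vector(aw-1 downto 0) := (others => '0');
    signal raddr : std_logic_vector(aw-1 downto 0) := (others => '0');
    signal wdata : std_logic_vector(dw-1 downto 0) := (others => '0');

    -- Door #1 state: registered read data
    signal ram1   : mem_type := (others => (others => '0'));
    signal rdata1 : std_logic_vector(dw-1 downto 0);

    -- Door #2 state: registered read address
    signal ram2      : mem_type := (others => (others => '0'));
    signal raddr_dff : std_logic_vector(aw-1 downto 0) := (others => '0');
    signal rdata2    : std_logic_vector(dw-1 downto 0);
begin
    -- Door #1: same logic as the first architecture above
    door1 : process(aclk)
    begin
        if rising_edge(aclk) then
            if wen = '1' then
                ram1(to_integer(unsigned(waddr))) <= wdata;
            end if;
            rdata1 <= ram1(to_integer(unsigned(raddr)));
        end if;
    end process;

    -- Door #2: same logic as the second architecture above
    door2 : process(aclk)
    begin
        if rising_edge(aclk) then
            if wen = '1' then
                ram2(to_integer(unsigned(waddr))) <= wdata;
            end if;
            raddr_dff <= raddr;
        end if;
    end process;
    rdata2 <= ram2(to_integer(unsigned(raddr_dff)));

    stimulus : process
    begin
        -- Write and read address 3 in the same clock cycle.
        waddr <= x"3"; raddr <= x"3"; wdata <= x"AB"; wen <= '1';
        wait for 5 ns;
        aclk <= '1';
        wait for 5 ns;
        aclk <= '0'; wen <= '0';

        -- Door #1 sampled the memory before the write took effect.
        assert rdata1 = x"00"
            report "Door #1: expected old data" severity failure;
        -- Door #2 reads through the registered address and sees the new word.
        assert rdata2 = x"AB"
            report "Door #2: expected new data" severity failure;

        -- One more clock edge with no write: Door #1 now sees the new word too.
        wait for 5 ns;
        aclk <= '1';
        wait for 5 ns;
        assert rdata1 = x"AB"
            report "Door #1: expected new data after second edge" severity failure;

        report "read-during-write check finished" severity note;
        wait;
    end process;
end architecture;
```

If neither assertion fires in your simulator, the only observable difference between the two doors in this scenario is the read-during-write result; for reads and writes to different addresses the outputs should match cycle for cycle.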