Suppose I have a VHDL adder that adds two signed 32-bit numbers:
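```vhdl
library ieee;
use ieee.std_logic_1164.all;  -- std_logic
use ieee.numeric_std.all;     -- signed and its "+" operator

entity adder is
  port(
    clk         : in  std_logic;
    sync_rst    : in  std_logic;
    signal_A_in : in  signed(31 downto 0);
    signal_B_in : in  signed(31 downto 0);
    result_out  : out signed(31 downto 0)
  );
end adder;
```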
I have two options. Option A is to sum signal_A_in and signal_B_in with a concurrent assignment:
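```vhdl
architecture rtl of adder is
begin
  -- Purely combinational: result_out follows the inputs after the adder's propagation delay.
  result_out <= signal_A_in + signal_B_in;
end rtl;
```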
Option B is to perform the addition in a clocked process:
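```vhdl
architecture rtl of adder is
begin
  -- Synchronous reset, so only clk is needed in the sensitivity list.
  myproc1 : process(clk)
  begin
    if rising_edge(clk) then
      if sync_rst = '1' then
        result_out <= (others => '0');
      else
        -- The sum is captured in a register on each rising edge of clk.
        result_out <= signal_A_in + signal_B_in;
      end if;
    end if;
  end process;
end rtl;
```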
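Roughly speaking, timing analysis checks paths between clocked elements, so in option B the adder sits on a path that starts at whatever flip-flops drive signal_A_in and signal_B_in (possibly through other logic upstream) and ends at the result_out register. A minimal sketch, assuming the upstream drivers might not be registered, captures the operands locally as well, so the adder is the only logic between two flip-flop stages. The entity name adder_regio and the signals a_reg/b_reg are hypothetical; the latency becomes two clock cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical variant of the question's adder: same ports, but the
-- operands are registered before the addition. Latency: 2 clock cycles.
entity adder_regio is
  port(
    clk         : in  std_logic;
    sync_rst    : in  std_logic;
    signal_A_in : in  signed(31 downto 0);
    signal_B_in : in  signed(31 downto 0);
    result_out  : out signed(31 downto 0)
  );
end entity adder_regio;

architecture rtl of adder_regio is
  signal a_reg, b_reg : signed(31 downto 0);  -- hypothetical operand registers
begin
  process(clk)
  begin
    if rising_edge(clk) then
      if sync_rst = '1' then
        a_reg      <= (others => '0');
        b_reg      <= (others => '0');
        result_out <= (others => '0');
      else
        -- Stage 1: register the operands.
        a_reg <= signal_A_in;
        b_reg <= signal_B_in;
        -- Stage 2: add the operands captured on the previous edge, so the
        -- only logic on this path is the adder itself.
        result_out <= a_reg + b_reg;
      end if;
    end if;
  end process;
end architecture rtl;
```

This does not make the adder itself faster; it only removes any upstream logic from the path that ends at result_out.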
Option B adds one clock cycle of latency compared with option A. However, does it guarantee that the result will be ready within one clock cycle (i.e. that the design will meet timing)?

I am asking because my design, which uses option A (concurrent summation), is failing timing. My understanding is that this approach is fine for narrow operands, because the combinational logic delay is small, but as the operands get wider the delay increases and the design eventually fails timing. How does the synthesis tool cope with this, and does putting the expression in a clocked process solve the issue?
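The width dependence described above typically comes from the carry chain: the top bit of the sum cannot settle until the carry has propagated up from bit 0. One common way to shorten that path, shown here as an illustration rather than a description of what any particular synthesis tool does, is to pipeline the addition: add the low 16 bits in one cycle, register the carry, and add the high 16 bits plus that carry in the next. The entity adder_pipe2 and its internal signals are hypothetical; it keeps the question's port list and has two cycles of latency.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical two-stage pipelined 32-bit adder (split at bit 16).
entity adder_pipe2 is
  port(
    clk         : in  std_logic;
    sync_rst    : in  std_logic;
    signal_A_in : in  signed(31 downto 0);
    signal_B_in : in  signed(31 downto 0);
    result_out  : out signed(31 downto 0)
  );
end entity adder_pipe2;

architecture rtl of adder_pipe2 is
  signal lo_sum_r : unsigned(15 downto 0);  -- low 16 bits of the sum
  signal carry_r  : unsigned(0 downto 0);   -- carry out of bit 15
  signal a_hi_r   : unsigned(15 downto 0);  -- upper operand halves, delayed one cycle
  signal b_hi_r   : unsigned(15 downto 0);
begin
  process(clk)
    variable lo_sum : unsigned(16 downto 0);  -- 17 bits so the carry is kept
  begin
    if rising_edge(clk) then
      if sync_rst = '1' then
        lo_sum_r   <= (others => '0');
        carry_r    <= (others => '0');
        a_hi_r     <= (others => '0');
        b_hi_r     <= (others => '0');
        result_out <= (others => '0');
      else
        -- Stage 1: add the low halves, zero-extended to 17 bits.
        lo_sum := resize(unsigned(signal_A_in(15 downto 0)), 17)
                + resize(unsigned(signal_B_in(15 downto 0)), 17);
        lo_sum_r <= lo_sum(15 downto 0);
        carry_r  <= lo_sum(16 downto 16);
        a_hi_r   <= unsigned(signal_A_in(31 downto 16));
        b_hi_r   <= unsigned(signal_B_in(31 downto 16));
        -- Stage 2: high halves plus the registered carry. Two's-complement
        -- addition is bitwise identical to unsigned addition, so the
        -- concatenation equals the wrapped 32-bit signed sum.
        result_out <= signed((a_hi_r + b_hi_r + carry_r) & lo_sum_r);
      end if;
    end if;
  end process;
end architecture rtl;
```

Splitting at bit 16 is arbitrary; the same pattern extends to more stages, at the cost of one extra cycle of latency per stage.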