Adding large numbers in FPGA in one clock cycle

Question

If I have a VHDL adder which adds two numbers together:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity adder is
    port(
        clk : in std_logic;
        sync_rst : in std_logic;
        signal_A_in : in signed(31 downto 0);
        signal_B_in : in signed(31 downto 0);
        result_out : out signed(31 downto 0)
    );
end adder;

I have two options, one is to concurrently sum signal_A_in and signal_B_in together as so:

architecture rtl of adder is

begin

result_out <= signal_A_in + signal_B_in;

end rtl;

The other is to perform the addition in a clocked process as so:

architecture rtl of adder is

begin

myproc1 : process(clk, sync_rst)
begin
    if clk = '1' and clk'event then
        if sync_rst='1' then
            result_out <= (others=>'0');
        else
            result_out <= signal_A_in + signal_B_in;
        end if;
    end if;
end process;

end rtl;

So option B will have a single clock cycle of delay compared to option A. However, does it guarantee that the result will be ready within one clock cycle (i.e. that the design meets timing)? I am asking because I am getting a timing failure on my design, which uses option A, the concurrent summation. I believe this approach is OK for smaller numbers because the combinatorial logic delay is lower, but as the numbers get wider the delay increases and the design fails timing. How does the synthesis tool cope with this, and does putting the expression in a clocked process solve the issue?

score 2 · Accepted Answer · answered Aug 05 '22 at 14:57

When you write something like signal_A_in + signal_B_in; you are describing the combinatorial logic for an adder. Each FPGA takes a different amount of time for signals to propagate through the wires to and from the adder, and through the adder itself.

When you do something like

if clk = '1' and clk'event then
    result_out <= signal_A_in + signal_B_in;
end if;

then, as you noted, you are creating a 1 cycle delay by inferring a register. So now, no matter what, your path ends right after your adder, with the result going into a register called result_out. That is why timing improves. As shown, the path likely covers just your adder, which leaves plenty of time, so you pass timing. (But be careful: adding a register does not guarantee that you meet timing.)

Timing is worse in your first example, and fails, because it does not infer a register. Now your signal not only has to get through the signal_A_in + signal_B_in adder logic within the clock period, but ALSO through whatever result_out is driving (maybe more adders, other logic somewhere else, etc.). Your timing path is AT LEAST as long as your adder, and probably longer, since you didn't break up the path with a register.
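To make the point about breaking up the path concrete, here is a minimal sketch that registers the inputs as well as the output, so the only logic between two registers inside the module is the adder itself. The entity name adder_registered and the internal signals a_r and b_r are made up for illustration, and the extra input stage adds a second cycle of latency. It still does not guarantee that timing is met; it only bounds what the path contains.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical variant of the adder with registered inputs and output.
entity adder_registered is
    port(
        clk         : in  std_logic;
        sync_rst    : in  std_logic;
        signal_A_in : in  signed(31 downto 0);
        signal_B_in : in  signed(31 downto 0);
        result_out  : out signed(31 downto 0)
    );
end adder_registered;

architecture rtl of adder_registered is
    -- Input registers (illustrative names, not from the original design).
    signal a_r : signed(31 downto 0);
    signal b_r : signed(31 downto 0);
begin

    -- Only clk is in the sensitivity list because the reset is synchronous.
    regs : process(clk)
    begin
        if rising_edge(clk) then
            if sync_rst = '1' then
                a_r        <= (others => '0');
                b_r        <= (others => '0');
                result_out <= (others => '0');
            else
                -- Cycle 1: capture the operands.
                a_r <= signal_A_in;
                b_r <= signal_B_in;
                -- Cycle 2: the register-to-register path is just the adder.
                result_out <= a_r + b_r;
            end if;
        end if;
    end process;

end rtl;
```

With this structure, whatever logic drives signal_A_in and signal_B_in, or consumes result_out, sits in its own timing path and no longer adds to the adder's delay.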

Oftentimes, even larger adders are done neither in 0 cycles (combinatorial logic) nor in 1 cycle (with a registered output), but over N cycles as a pipelined operation.
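As a rough sketch of what pipelining an adder can look like, the version below splits the 32-bit addition into two 16-bit halves: the first stage adds the lower halves and keeps the carry, and the second stage adds the upper halves plus that carry. The entity name adder_pipelined, the 16/16 split, and the internal signal names are assumptions chosen for illustration. For 32 bits a split like this is usually unnecessary on FPGAs with dedicated carry chains; the same pattern matters more for much wider operands.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical two-stage pipelined version of the 32-bit adder.
entity adder_pipelined is
    port(
        clk         : in  std_logic;
        sync_rst    : in  std_logic;
        signal_A_in : in  signed(31 downto 0);
        signal_B_in : in  signed(31 downto 0);
        result_out  : out signed(31 downto 0)
    );
end adder_pipelined;

architecture rtl of adder_pipelined is
    -- Stage 1 registers: 17-bit low sum (bit 16 is the carry out)
    -- plus the delayed upper halves of both operands.
    signal low_sum_r : unsigned(16 downto 0);
    signal a_high_r  : unsigned(15 downto 0);
    signal b_high_r  : unsigned(15 downto 0);
begin

    pipeline : process(clk)
        variable high_sum : unsigned(15 downto 0);
    begin
        if rising_edge(clk) then
            if sync_rst = '1' then
                low_sum_r  <= (others => '0');
                a_high_r   <= (others => '0');
                b_high_r   <= (others => '0');
                result_out <= (others => '0');
            else
                -- Stage 1: add the lower 16 bits, zero-extended to keep the carry.
                low_sum_r <= resize(unsigned(signal_A_in(15 downto 0)), 17)
                           + resize(unsigned(signal_B_in(15 downto 0)), 17);
                a_high_r  <= unsigned(signal_A_in(31 downto 16));
                b_high_r  <= unsigned(signal_B_in(31 downto 16));

                -- Stage 2: add the upper 16 bits plus the stage 1 carry.
                -- The result wraps modulo 2**32, like the original signed add.
                high_sum   := a_high_r + b_high_r + low_sum_r(16 downto 16);
                result_out <= signed(high_sum & low_sum_r(15 downto 0));
            end if;
        end if;
    end process;

end rtl;
```

The result appears two clock cycles after the operands are presented, but a new pair of operands can still be accepted every cycle, and each stage only has to propagate a 16-bit carry chain.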

This is mostly for you, the human, to fix, but some synthesis tools can do a small amount of register retiming to help.
