I am using an Intel Stratix 10 FPGA and Quartus Prime Pro 21.4 to develop a power test project.
I cannot figure out how to keep Quartus from optimizing away my DSP blocks.
I want to use all 3000 DSP blocks in our FPGA so that I can see the max current draw of the DSP block. Of course, we can use the power estimator, but we require a real-world physical test.
I actually don't need the output from the DSP block. I only care that they are running and using FPGA resources.
I have instantiated the Intel fixed DSP core IP as a multiplier.
I am using a generate for loop to generate 3000 of these DSP IP blocks. My problem is that the DSP blocks are synthesized away unless I connect the output of each DSP block directly to a top-level output. I only have ~1000 outputs available, so this is not possible.
I thought I could just connect each output with a register array to catch the output. But it seems that if I don't actually use the output values or connect it outright to a top level output pin, then Quartus thinks we don't need it and optimizes it away.
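For reference, the register-capture idea can be sketched as below. The `noprune` attribute is a documented Quartus synthesis attribute that asks synthesis to keep a register with no fan-out; whether it is enough to also preserve the DSP blocks driving those registers in this design is an assumption I have not verified. The module name `capture_sketch`, its ports, and the small `NUM_BLOCKS` value are hypothetical and only for illustration.

```verilog
`default_nettype none
// Sketch only: capture_sketch and its ports are hypothetical names,
// not part of the Intel DSP IP or of the design in this post.
module capture_sketch #(
    parameter NUM_BLOCKS = 8,   // small value for illustration; 3000 in the real design
    parameter W          = 37   // width of one DSP result
) (
    input  wire                    clk_i,
    input  wire [NUM_BLOCKS*W-1:0] results_i  // all block outputs, flattened
);
    genvar i;
    generate
        for (i = 0; i < NUM_BLOCKS; i = i + 1) begin : GEN_CAPTURE
            // noprune asks Quartus synthesis not to remove this register
            // even though nothing reads its value.
            (* noprune *) reg [W-1:0] capture_r;
            always @(posedge clk_i)
                capture_r <= results_i[i*W +: W];
        end
    endgenerate
endmodule
`default_nettype wire
```

The point of the flattened input is only to keep the sketch self-contained; in the real design each capture register would sit next to its DSP instance inside the generate loop.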
The 2nd solution I tried is to use combinational logic:
top_output = DSP_out[0] || DSP_out[1] || DSP_out[2] || DSP_out[3]
This solution generates only 4 DSP blocks even though the generate loop runs 3000 times. I tried doing this in a loop, but it did not work. Is there a way to trick the system into synthesizing all the DSP blocks even if I don't connect each block to a top-level output?
I seem to be able to access the output of the DSP block with no issues. For instance, I was able to turn on or off an LED based on the numbers I fed into a single multiplier.
Here is the full code:
`timescale 1ps/1ps
`default_nettype none
module power_test_design (
    input wire clk_i,
    output reg [0:0] outputa,
    output reg [0:0] outputb
);
    localparam NUM_DSP_BLOCKS = 3000;
    genvar i;
    wire reset;
    integer k;
    //input stimulus signals for the DSP
    reg [17:0] ay_r;
    reg [17:0] by_r;
    reg [17:0] ax_r;
    reg [17:0] bx_r;
    //create wires and registers to hold outputs from multiplier
    (* keep = "true" *) wire [36:0] resulta [NUM_DSP_BLOCKS-1:0];
    (* keep = "true" *) reg [36:0] resulta_r [NUM_DSP_BLOCKS-1:0];
    (* keep = "true" *) wire [36:0] resultb [NUM_DSP_BLOCKS-1:0];
    (* keep = "true" *) reg [36:0] resultb_r [NUM_DSP_BLOCKS-1:0];
    reg [2:0] ena_r;
    // Stratix10 system reset
    reset_release U_RESET (
        .ninit_done (reset ) // output, width = 1, ninit_done.ninit_done
    );
    // DSP stimulus
    always @(posedge clk_i) begin : DSP_SET_FF
        if (reset)
        begin
            ay_r <= {18{1'b0}};
            by_r <= {18{1'b0}};
            ax_r <= {18{1'b0}};
            bx_r <= {18{1'b0}};
            ena_r <= {3{1'b0}};
        end else
        begin
            ena_r <= 3'b001;
            ay_r <= $unsigned(ay_r) + 1;
            by_r <= $unsigned(by_r) + 1;
            ax_r <= $unsigned(ax_r) + 2;
            bx_r <= $unsigned(bx_r) + 3;
        end
    end
    generate
        for (i=0; i<NUM_DSP_BLOCKS; i=i+1) begin : GEN_DSPS
            dsp_fixed U_DSP (
                .ay (ay_r), // input, width = 18, ay.ay
                .by (by_r), // input, width = 18, by.by
                .ax (ax_r), // input, width = 18, ax.ax
                .bx (bx_r), // input, width = 18, bx.bx
                .resulta (resulta[i]), // output, width = 37, resulta.resulta
                .resultb (resultb[i]), // output, width = 37, resultb.resultb
                .clk0 (clk_i), // input, width = 1, clk0.clk
                .clk1 (), // input, width = 1, clk1.clk
                .clk2 (), // input, width = 1, clk2.clk
                .ena (ena_r) // input, width = 3, ena.ena
            );
            //bring result to a register to assign output logic
            assign resulta_r[i] = resulta[i];
            assign resultb_r[i] = resultb[i];
        end
    endgenerate
    //output logic - this code generates 6 DSP blocks; I need to generate all 3000
    always @(posedge clk_i) begin : outputLogic
        for (k=1; k<50; k=k+1)
        begin
            outputa = resulta_r[k] || resulta_r[k+1] || resulta_r[k+2];
            outputb = resultb_r[k+3] || resultb_r[k+4] || resultb_r[k+5];
        end
    end
endmodule
`resetall
So far, I have tried several ways to assign this output. First:
always @(resulta_r[0], resulta_r[1], resulta_r[2], resulta_r[3]) begin
    if (resulta_r[0] == 4)
    begin
        outputa = 1;
    end
    else if (resulta_r[1] == 6)
    begin
        outputa = 1;
    end
    else if (resulta_r[2] == 6)
    begin
        outputa = 1;
    end
    else if (resulta_r[3] == 6)
    begin
        outputa = 1;
    end
    else
    begin
        outputa = 0;
    end
end
With this code, a DSP block is generated for each if statement. So, the next idea was:
always @(posedge clk_i) begin : outputLogic
    for (k=1; k<50; k=k+1)
    begin
        outputa = resulta_r[k] || resulta_r[k+1] || resulta_r[k+2];
        outputb = resultb_r[k+3] || resultb_r[k+4] || resultb_r[k+5];
    end
end
This works in a similar way: a DSP block is generated for each result[k] that appears in the combinational statement. But this generates only 6 DSP blocks in total when synthesizing. The number of blocks kept depends only on how many DSP block outputs appear in this statement.
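One thing I noticed while writing this up: in the loop above, every iteration overwrites outputa and outputb with a blocking assignment, so only the last iteration (k = 49) actually reaches the outputs. That covers indices 49 to 54, which matches the 6 DSP blocks I see. A direction that might work instead is to fold every result bit from every block into a single registered bit, so that each block has a path to a pin. Below is a minimal sketch of that idea, not verified on hardware. The module name `keep_all_sketch`, its ports, and the inferred multiplier standing in for `dsp_fixed` are all hypothetical. The per-instance offset is also an assumption: since all my blocks share the same inputs, I suspect synthesis could otherwise treat them as duplicates and merge them.

```verilog
`default_nettype none
// Sketch only: keep_all_sketch is a hypothetical module, and the inferred
// multiplier below stands in for the dsp_fixed IP instance.
module keep_all_sketch #(
    parameter NUM_BLOCKS = 8,   // small value for illustration; 3000 in the real design
    parameter W          = 37   // width of one result
) (
    input  wire        clk_i,
    input  wire [17:0] a_i,
    input  wire [17:0] b_i,
    output reg         parity_o  // single pin that depends on every block
);
    // One flat vector holding the result of every block.
    wire [NUM_BLOCKS*W-1:0] flat;

    genvar i;
    generate
        for (i = 0; i < NUM_BLOCKS; i = i + 1) begin : GEN_BLK
            // Assumption: a per-instance offset makes each block's operands
            // distinct, so identical blocks are less likely to be merged.
            wire [17:0] a_off = a_i + i;
            reg  [W-1:0] prod_r;
            always @(posedge clk_i)
                prod_r <= a_off * b_i;
            // Place this block's result in its own slice of the flat vector.
            assign flat[i*W +: W] = prod_r;
        end
    endgenerate

    // XOR reduction: every bit of every result influences the output,
    // so no block is trivially unused.
    always @(posedge clk_i)
        parity_o <= ^flat;
endmodule
`default_nettype wire
```

I chose XOR rather than OR here because with OR a single nonzero result decides the output, whereas with XOR every bit matters. With 3000 blocks the reduction is a very wide tree, so it would probably need to be pipelined in stages to meet timing.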