indexing memory for UART transmission using > 100% SLICEs Tang Nano

Question

I am trying to build a simple UART command-line parser based on an example from the Tang Nano 9K repo here; my modified version is below. It uses a memory to hold the received values, and that part works. Once I have received 5 characters I would like to send them back to the host, looping through the memory entries and sending one each time tx_data_ready is set.

module uart_test(
    input                        clk,
    input                        rst_n,
    input                        uart_rx,
    output                       uart_tx,
    output reg [5:0]             led
);

parameter                        CLK_FRE  = 27;         //MHz
parameter                        UART_FRE = 115200;     //bps
localparam                       IDLE =  0;
localparam                       SEND =  1;   
localparam                       WAIT =  2;   
localparam                       PROCESS_RX =  3;


parameter                        MSG_SIZE = 256;    

reg[7:0]                         tx_data;
reg[7:0]                         tx_str;
reg                              tx_data_valid;
wire                             tx_data_ready;
reg[7:0]                         tx_cnt;
wire[7:0]                        rx_data;
wire                             rx_data_valid;
wire                             rx_data_ready;
reg[31:0]                        wait_cnt;
reg[3:0]                         state;

reg [7:0]                        message [MSG_SIZE - 1:0];
reg [7:0]                        rx_index;   
integer                          i;        
localparam                      DATA_NUM2   = 3;    // just for testing

assign rx_data_ready = 1'b1;//always can receive data,

always@(posedge clk or negedge rst_n)
begin
    if(rst_n == 1'b0)
    begin
        led <= ~6'b100000;
        wait_cnt <= 32'd0;
        tx_data <= 8'd0;
        state <= IDLE;
        tx_cnt <= 8'd0;
        tx_data_valid <= 1'b0;
        rx_index <= 8'd0;
        for (i = 0; i < 256; i = i+1) begin
            message[i] <= 0;
        end
    end
    else
    case(state)
        IDLE:
        begin
            if(rx_data_valid == 1'b1)
            begin
                message[rx_index] <= rx_data;   // send uart received data
                if(rx_index >= 8'd4)
                begin
                    rx_index <= 8'd0;
                    tx_cnt <= 8'd0;
                    state <= PROCESS_RX;
                end
                else begin
                    led <= ~rx_data[5:0];
                    rx_index <= rx_index + 8'd1;
                end
            end
        end
        PROCESS_RX:
        begin
            tx_data <= tx_str;
            tx_data_valid <= 1'b1;
            state <= WAIT;
        end
        WAIT:
        begin
            if(tx_data_valid == 1'b1 && tx_data_ready == 1'b1) begin
                tx_data_valid <= 1'b0;
                tx_cnt <= tx_cnt + 8'd1; //Send data counter
                if(tx_cnt <= (DATA_NUM2 - 1)) begin
                    state <= PROCESS_RX;
                    led <= ~6'b000110;
                end
                else begin
                    led <= ~6'b000100;
                    state <= IDLE;
                end
            end
        end
        default:
            state <= IDLE;
    endcase
end


always@(tx_cnt)
    tx_str <= message[tx_cnt];

uart_rx#
(
    .CLK_FRE(CLK_FRE),
    .BAUD_RATE(UART_FRE)
) uart_rx_inst
(
    .clk                        (clk                      ),
    .rst_n                      (rst_n                    ),
    .rx_data                    (rx_data                  ),
    .rx_data_valid              (rx_data_valid            ),
    .rx_data_ready              (rx_data_ready            ),
    .rx_pin                     (uart_rx                  )
);

uart_tx#
(
    .CLK_FRE(CLK_FRE),
    .BAUD_RATE(UART_FRE)
) uart_tx_inst
(
    .clk                        (clk                      ),
    .rst_n                      (rst_n                    ),
    .tx_data                    (tx_data                  ),
    .tx_data_valid              (tx_data_valid            ),
    .tx_data_ready              (tx_data_ready            ),
    .tx_pin                     (uart_tx                  )
);
endmodule

The design fails to synthesize because it runs out of SLICEs.

If I just comment out the line that increments the index used to address the memory, i.e. tx_cnt <= tx_cnt + 8'd1;, then it builds fine.

I have tried different things without success. A Verilog expert can probably spot the problem quickly, but I am not seeing it.

What am I doing wrong?

Mikef · Accepted Answer · 2023-07-10T13:08:28.363

The problem is that the synthesis tool is trying to build a 256-location, 8-bit-wide memory out of logic fabric (slices/registers). The physical FPGA does not have enough of those resources to implement the design.

Here are two solutions:

1. Reduce the memory size. 256 elements is a lot when only 5 are needed. Changing the size to 16 reduces the utilization burden and still leaves margin over the 5 that are needed.

Change the parameter to parameter MSG_SIZE = 16;

and change the for loop bound to for (i = 0; i < 16; i = i+1) begin

2. Code the memory in a style that will infer a BRAM primitive (rather than logic fabric/slices/registers). FPGA BRAMs do not have a reset input that clears the stored array, so it is not possible to reset or initialize the contents with a reset signal in RTL. Most applications don't need such a reset, and there does not seem to be a reason this application needs one. Make sure the design writes at least one value to the memory before reading one.

If the tools detect a coding style that attempts a reset (like the posted code), they will build the memory out of logic fabric/slices/registers. Removing the code that attempts to reset the memory allows the synthesis tool to infer a BRAM and use 0 slices for the memory. For this solution, keep the memory size at 256: if the memory is smaller than some threshold, the tool may decide not to spend a BRAM on what it considers a very small memory.

Another coding-style issue is that the read side of the memory was modeled as a transparent latch (the level-sensitive always@(tx_cnt) block), and BRAMs do not have them, so that also prevents BRAM inference. Latches are generally not recommended for RTL design. Here is a small rewrite that puts the memory in a separate synchronous process to help the tools infer a BRAM.

Please run the simulations to make sure it behaves as desired. It may need a read enable if the data does not come out of the memory at the right time.


module uart_test(
    input                        clk,
    input                        rst_n,
    input                        uart_rx,
    output                       uart_tx,
    output reg [5:0]             led
);

parameter                        CLK_FRE  = 27;         //MHz
parameter                        UART_FRE = 115200;     //bps
localparam                       IDLE =  0;
localparam                       SEND =  1;   
localparam                       WAIT =  2;   
localparam                       PROCESS_RX =  3;


parameter                        MSG_SIZE = 256;    

reg[7:0]                         tx_data;
reg[7:0]                         tx_str;
reg                              tx_data_valid;
wire                             tx_data_ready;
reg[7:0]                         tx_cnt;
wire[7:0]                        rx_data;
wire                             rx_data_valid;
wire                             rx_data_ready;
reg[31:0]                        wait_cnt;
reg[3:0]                         state;

reg                              message_wr_en;
reg [7:0]                        message [MSG_SIZE - 1:0];
reg [7:0]                        rx_index;   
integer                          i;

localparam                      DATA_NUM2   = 3;    // just for testing

assign rx_data_ready = 1'b1;//always can receive data,

always@(posedge clk or negedge rst_n)
begin
    if(rst_n == 1'b0)
    begin
        led <= ~6'b100000;
        wait_cnt <= 32'd0;
        tx_data <= 8'd0;
        state <= IDLE;
        tx_cnt <= 8'd0;
        tx_data_valid <= 1'b0;
        rx_index <= 8'd0;
        message_wr_en <= 1'b0;
        //for (i = 0; i < 256; i = i+1) begin
        //    message[i] <= 0;
        //end
    end
    else
    case(state)
        IDLE:
        begin
            if(rx_data_valid == 1'b1)
            begin
                message_wr_en <= 1'b1;
                // message[rx_index] <= rx_data;   // send uart received data
                if(rx_index >= 8'd4)
                begin
                    rx_index <= 8'd0;
                    tx_cnt <= 8'd0;
                    state <= PROCESS_RX;
                end
                else begin
                    led <= ~rx_data[5:0];
                    rx_index <= rx_index + 8'd1;
                end
            end
            else
              message_wr_en <= 1'b0;
          
        end
        PROCESS_RX:
        begin
            tx_data <= tx_str;
            tx_data_valid <= 1'b1;
            state <= WAIT;
        end
        WAIT:
        begin
            if(tx_data_valid == 1'b1 && tx_data_ready == 1'b1) begin
                tx_data_valid <= 1'b0;
                tx_cnt <= tx_cnt + 8'd1; //Send data counter
                if(tx_cnt <= (DATA_NUM2 - 1)) begin
                    state <= PROCESS_RX;
                    led <= ~6'b000110;
                end
                else begin
                    led <= ~6'b000100;
                    state <= IDLE;
                end
            end
        end
        default:
            state <= IDLE;
    endcase
end

  // **************************************************************
  // Model BRAM
  // **************************************************************
  // Changed this process from a transparent latch to clocked registers
  // latches are not good 99.9999% of the time.
  always@(posedge clk) begin
    if(message_wr_en)
      message[rx_index] <= rx_data;   // store uart received data
    
    // This memory output might need an enable from the state machine
    // if the data comes out at the wrong time.
    tx_str <= message[tx_cnt];
  end
  
uart_rx#
(
    .CLK_FRE(CLK_FRE),
    .BAUD_RATE(UART_FRE)
) uart_rx_inst
(
    .clk                        (clk                      ),
    .rst_n                      (rst_n                    ),
    .rx_data                    (rx_data                  ),
    .rx_data_valid              (rx_data_valid            ),
    .rx_data_ready              (rx_data_ready            ),
    .rx_pin                     (uart_rx                  )
);

uart_tx#
(
    .CLK_FRE(CLK_FRE),
    .BAUD_RATE(UART_FRE)
) uart_tx_inst
(
    .clk                        (clk                      ),
    .rst_n                      (rst_n                    ),
    .tx_data                    (tx_data                  ),
    .tx_data_valid              (tx_data_valid            ),
    .tx_data_ready              (tx_data_ready            ),
    .tx_pin                     (uart_tx                  )
);
endmodule
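
For reference, the memory process in the rewrite above can also be isolated as a small standalone module, which makes the BRAM-friendly pattern (clocked write, clocked read, no reset on the array) easier to see and to simulate on its own. This is only a sketch: the names msg_ram, msg_ram_tb and all port names are hypothetical and not part of the original design, and whether a given tool version actually maps it to a BRAM should be checked in the synthesis resource report.

```verilog
// Hypothetical helper module (not from the original post).
// Simple dual-port memory: one synchronous write port, one synchronous
// read port, and no reset on the array itself.
module msg_ram #(
    parameter DATA_WIDTH = 8,
    parameter ADDR_WIDTH = 8
)(
    input                       clk,
    input                       wr_en,
    input      [ADDR_WIDTH-1:0] wr_addr,
    input      [DATA_WIDTH-1:0] wr_data,
    input      [ADDR_WIDTH-1:0] rd_addr,
    output reg [DATA_WIDTH-1:0] rd_data
);
    reg [DATA_WIDTH-1:0] mem [0:(1<<ADDR_WIDTH)-1];

    always @(posedge clk) begin
        if (wr_en)
            mem[wr_addr] <= wr_data;
        // Registered read: rd_data is valid one clock after rd_addr is applied.
        rd_data <= mem[rd_addr];
    end
endmodule

// Minimal self-checking testbench (also hypothetical): writes one byte,
// reads it back, and checks that it appears after one clock of latency.
module msg_ram_tb;
    reg        clk     = 1'b0;
    reg        wr_en   = 1'b0;
    reg  [7:0] wr_addr = 8'd0;
    reg  [7:0] wr_data = 8'd0;
    reg  [7:0] rd_addr = 8'd0;
    wire [7:0] rd_data;

    msg_ram dut (
        .clk     (clk),
        .wr_en   (wr_en),
        .wr_addr (wr_addr),
        .wr_data (wr_data),
        .rd_addr (rd_addr),
        .rd_data (rd_data)
    );

    always #5 clk = ~clk;

    initial begin
        @(negedge clk); wr_en = 1'b1; wr_addr = 8'd3; wr_data = 8'h41;
        @(negedge clk); wr_en = 1'b0; rd_addr = 8'd3;
        @(negedge clk); // one rising edge has passed since rd_addr was set
        if (rd_data !== 8'h41) $display("FAIL: rd_data = %h", rd_data);
        else                   $display("PASS");
        $finish;
    end
endmodule
```

Two timing points follow from this pattern and apply to the rewrite above as well. Because message_wr_en is itself a register, the write happens one clock after rx_data_valid, by which time rx_index has already been updated; driving the write enable combinationally (for example from rx_data_valid while in IDLE, which is an assumption about the surrounding state machine) would make the write use the same rx_index the state machine sees. Likewise, the read data lags tx_cnt by one clock, so the state machine may need an extra wait state between changing tx_cnt and sampling tx_str.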

Thanks, reducing the size works for me, and you explained the error very well. The only thing is that removing the init loop on the memory and leaving the memory size at 256 still uses fabric slices/registers, so no BRAM is used. — mhanuel, Jul 09 '23 at 05:51
@mhanuel Updated with a more complete answer for the BRAM inference case. — Mikef, Jul 10 '23 at 13:09
Your modifications work very well for the BRAM issue. I just noticed that, as modified, the implementation does not correctly increment the memory index. I know this is not the topic of the original question, but I am wondering whether you have a solution for that in this basic example. Thanks a lot. — mhanuel, Jul 12 '23 at 16:03
To debug the behavior of individual variables, one would need the uart_rx and uart_tx models and a small testbench with the uart_test module as the DUT. If you have these, post them in a new question so that others can run your simulation, and link back to this question so that others can see the history. It's difficult to debug RTL code without a testbench. — Mikef, Jul 13 '23 at 00:09
