Verilog strange simulation results post synthesis

Question

I am facing a strange problem. The code is for a simple ALU. Only the code of interest is pasted here:

   always @(posedge clk or posedge rst)
   begin
        if (rst == 1) begin
           mul_valid_shr = 3'b000; 
        end else begin
            if (op_mul_i == 1) begin
                mul_valid_shr = 3'b111;
            end else begin
                mul_valid_shr <= mul_valid_shr << 1;
            end
        end
   end

And outside the always block:

assign mul_valid = mul_valid_shr[2];

The POST SYNTHESIS FUNCTIONAL SIMULATION with my test bench has the following results:

The reset is already low, so why does the sim not work the first time but work fine the second and third times? If I trigger op_mul_i before the 100 ns mark, even with rst low, even mul_result stops working the first time.

Any guesses are welcome.

UPDATE: FULL CODE HERE: https://www.edaplayground.com/x/28Hx

I suspect this is not the cause of your problem, but never mix up blocking and non-blocking assignments _to the same variable_ in a clocked always block. Usually, you want to use non-blocking assignments to any variable that implies a flip-flop. I am surprised your synthesiser allowed you to do this. So, the two blocking assignments to `mul_valid_shr` (`mul_valid_shr = 3'b...`) should be non-blocking (`mul_valid_shr <= 3'b...`). — Matthew Taylor, Jun 30 '17 at 13:03
I already tried this and used non-blocking for everything. It still has the exact same results. — Qazi, Jun 30 '17 at 13:10
I didn't think it would help. But nevertheless don't mix them up. — Matthew Taylor, Jun 30 '17 at 13:14
does it work in the `pre`-synthesis simulation? I also think that you missed some important rtl parts in your example. — Serge, Jun 30 '17 at 13:17
@Serge Yes, I did not put in all the code so that the focus is not shifted. I can post more code if needed. It works perfectly in pre-synthesis simulation. — Qazi, Jun 30 '17 at 13:28
Could you post all your RTL on EDA Playground so we can see the bigger picture? — Krouitch, Jun 30 '17 at 14:03
@Krouitch Here it is: https://www.edaplayground.com/x/28Hx thanks! — Qazi, Jun 30 '17 at 14:22
I cannot see anything obviously wrong with your `mul_valid` calculation. During synthesis, don't you get warnings that could indicate what has gone wrong? Apart from that, you still have blocking assignments in your multiplication process (lines 32 to 47); you should make them non-blocking. Even if it does not solve your problem, it cannot hurt. I would also split the `mul_results` and `mul_ops` regs into two separate processes. — Krouitch, Jun 30 '17 at 14:48
@Krouitch Yes the non-blocking/blocking thing is me just playing around with it to see changes. You are right in saying that it makes no difference to results. Is there any specific reason you recommend breaking the multiplication process in two separate always blocks? — Qazi, Jun 30 '17 at 14:54
@Krouitch I forgot to address your comments on synthesis warnings. I used to get warnings but since I have been stuck on this for a full day now, I changed the code to a point where there are absolutely no warnings. — Qazi, Jun 30 '17 at 14:56
Mainly a coding style consideration. I guess the synthesis tool understands it correctly. However, thinking of the hardware, it is kind of odd to have one process manage flip-flops whose data is not evaluated at the same time. Nevertheless, I do not see how it could modify the behavior of mul_valid. Do you use any delay modelling (an .sdf file, for example)? — Krouitch, Jun 30 '17 at 15:02
I am unable to reproduce the issue you mentioned. I created the post-synthesis design file using `Yosys` on EDA Playground and see that the simulation results match exactly. You can see the code [here](https://www.edaplayground.com/x/db_) — Rahul Behl, Jun 30 '17 at 15:46
I'm guessing it is a race condition in your testbench between the clock and the input stimulus. Try changing the first `#40` in your initial block to `#39` or `#41`. If that works, I will give a better explanation of why in a proper answer. — Greg, Jun 30 '17 at 20:13
@RahulBehl thank you for that info. Now I am thinking this is related more to simulation or tb rather than the actual design. — Qazi, Jul 03 '17 at 07:02

patstew · Answer 1 · 2018-03-27T15:14:16.413

score 4

The Xilinx simulator simulates the FPGA global reset for the first 100 ns of any post-synthesis simulation, so you basically have to hold your logic in reset, with the clock running, for at least 100 ns to get sensible results. This is mentioned in UG900 on pg 13.
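A minimal testbench sketch of that advice, assuming the signal names from the question. The 20 ns clock period, the 125 ns reset length and the module name `tb_reset_sketch` are illustrative assumptions, not values taken from the original design:

```verilog
`timescale 1ns / 1ps

// Testbench sketch only. PERIOD, the 125 ns reset length and the module
// name are assumptions, not values taken from the original design.
module tb_reset_sketch;
    localparam PERIOD = 20;

    reg clk      = 1'b0;
    reg rst      = 1'b1;   // start with reset asserted
    reg op_mul_i = 1'b0;

    always #(PERIOD/2) clk = ~clk;   // clock keeps running during reset

    // The design under test would be instantiated here.

    initial begin
        #125;                              // stay in reset past the 100 ns mark
        @(negedge clk) rst      = 1'b0;    // release away from the rising edge
        @(negedge clk) op_mul_i = 1'b1;    // first operation only after reset
        @(negedge clk) op_mul_i = 1'b0;
        repeat (5) @(posedge clk);
        $finish;
    end
endmodule
```

Changing the stimulus on the falling edge is just one way to keep it clear of the rising edge that the design samples on.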

edited Mar 27 '18 at 15:14

answered Mar 04 '18 at 19:11


Greg · Answer 2 · 2017-12-18T19:20:50.887

Verilog has the concepts of nondeterminism and race conditions. Below are excerpts from various versions of the Verilog and SystemVerilog standards explaining the concepts:

IEEE Std 1364-1995 § 5.4.2 Nondeterminism
IEEE Std 1364-2001 § 5.4.2 Nondeterminism
IEEE Std 1800-2012 § 4.7 Nondeterminism

One source of nondeterminism is the fact that active events can be taken off the queue and processed in any order. Another source of nondeterminism is that statements without time-control constructs in behavioral blocks do not have to be executed as one event. Time control statements are the # expression and @ expression constructs (see 9.7 [9.4 for IEEE1800]). At any time while evaluating a behavioral statement, the simulator may suspend execution and place the partially completed event as a pending active event on the event queue. The effect of this is to allow the interleaving of process execution. Note that the order of interleaved execution is nondeterministic and not under control of the user.

IEEE Std 1364-1995 § 5.5 Race conditions
IEEE Std 1364-2001 § 5.5 Race conditions
IEEE Std 1800-2012 § 4.8 Race conditions

Because the execution of expression evaluation and net update events may be intermingled, race conditions are possible:

    assign p = q;
    initial begin
      q = 1;
      #1 q = 0;
      $display(p);
    end

The simulator is correct in displaying either a 1 or a 0. The assignment of 0 to q enables an update event for p. The simulator may either continue and execute the $display task or execute the update for p, followed by the $display task.

In short, this means an always block that triggers on clk can be evaluated before or after op_mul_i is updated, even though clk and op_mul_i change in the same time-step. The nondeterminism and race condition behaviors are intentional; they give the language a way to mimic what can happen with critical paths on FPGA and silicon.

Regardless, the solution and best practice is to have an offset (in time or in scheduler region) between the clock and the input stimulus. You can use a time offset, such as the ± 1 on the first # delay that I suggested in my comment. Or you can assign the input stimulus with non-blocking assignments (<=), which will always be updated after the clock and anything dependent on the clock. (This is why flops should be assigned with non-blocking assignments.) Which route you take is up to you or your team lead to decide.
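Both options can be sketched in a small self-contained testbench. The flop is the one from the question, rewritten with non-blocking assignments only. The clock period, reset timing and the module name `tb_race_sketch` are assumptions made for illustration:

```verilog
`timescale 1ns / 1ps

// Sketch only: timing values and the module name are assumptions.
module tb_race_sketch;
    reg       clk      = 1'b0;
    reg       rst      = 1'b1;
    reg       op_mul_i = 1'b0;
    reg [2:0] mul_valid_shr;

    wire mul_valid = mul_valid_shr[2];

    always #10 clk = ~clk;

    // Flop from the question, using non-blocking assignments throughout.
    always @(posedge clk or posedge rst) begin
        if (rst)           mul_valid_shr <= 3'b000;
        else if (op_mul_i) mul_valid_shr <= 3'b111;
        else               mul_valid_shr <= mul_valid_shr << 1;
    end

    initial begin
        #25 rst = 1'b0;

        // Option 1: non-blocking stimulus on the clock edge. The update is
        // applied after the processes triggered by this edge have sampled
        // the old value, so the flop sees the change on the next edge.
        @(posedge clk) op_mul_i <= 1'b1;
        @(posedge clk) op_mul_i <= 1'b0;

        // Option 2: blocking stimulus, offset from the edge by 1 ns.
        @(posedge clk) #1 op_mul_i = 1'b1;
        @(posedge clk) #1 op_mul_i = 1'b0;

        repeat (4) @(posedge clk);
        $finish;
    end
endmodule
```

With either option the stimulus no longer changes in the same scheduler region as the clock edge, so the order in which the simulator evaluates the processes stops mattering.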

Thanks for the detailed answer. Unfortunately it does not solve the complete problem. As I wrote in the comments, the problem with reset still persists. If I give a long enough active reset (~100 ns), all functionality is there, but with a shorter reset anything happening within the first 100 ns is ignored. The design is running at 250 MHz and setup/hold times are less than one tenth of a ns. Do you have any ideas on this? — Qazi, Jul 05 '17 at 16:32

score 0 · Answer 3 · answered Jun 30 '17 at 17:58

How is op_mul_i generated? Is it synchronous to clk? I ask because in the second part of your simulation, I see mul_valid being driven to logic-1 when op_mul_i is logic-1. If it was synchronous, I would expect mul_valid to be logic-1 at the clock edge next to the 200ns edge. As this is post synthesis, I suspect metastability causing this issue. At 100ns, op_mul_i is changing within the failure window, and the clock edge does not detect op_mul_i as logic-1, and hence you don't see anything.

Synchronize op_mul_i to clk, and use the synchronized signal to drive mul_valid_shr. Also, don't use blocking statements in a sequential block.
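A common way to do this is a two-flop synchronizer. The sketch below assumes `op_mul_i` may arrive asynchronously to `clk`. The module name `op_mul_sync` and the output name `op_mul_sync_o` are hypothetical and do not appear in the original design:

```verilog
// Two-flop synchronizer sketch. Module and output names are hypothetical.
module op_mul_sync (
    input  wire clk,
    input  wire rst,
    input  wire op_mul_i,       // possibly asynchronous to clk
    output wire op_mul_sync_o   // version to use inside the clk domain
);
    reg [1:0] sync_ff;

    always @(posedge clk or posedge rst) begin
        if (rst) sync_ff <= 2'b00;
        else     sync_ff <= {sync_ff[0], op_mul_i};  // shift the input through two flops
    end

    assign op_mul_sync_o = sync_ff[1];
endmodule
```

The synchronized signal arrives two clock cycles later than the raw input, so `mul_valid` would shift by the same amount.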

Hope that helps. VK

Thanks for the answer, op_mul_i is generated synchronous to clock at n*period times. mul_valid is supposed to go high on the third cycle after op_mul_i turns high. I have stopped mixing blocking and nonblocking statements. — Qazi, Jul 03 '17 at 07:07

score 0 · Answer 4 · answered Jun 30 '17 at 18:57

You created an asynchronous flop with op_mul_i as an asynchronous signal. It is modified in your initial block, and this modification is not synchronized with clk. So it looks like a race to me, and the hardware is correct in ignoring some steps.

So your simulation results were probably correct only because of a simulation artifact. I guess the right RTL approach would be to sync the signal with the clock by providing yet another flop for this signal.

Other than that you can try to play with nonblocking assignments or #0 delays in your initial block in simulation for this signal.

Thank you for the answer. What do you mean by "asynchronous flop with op_mul_i"? op_mul_i is an input signal that is triggered only at n*period, i.e. 20, 40, 60 etc. Doesn't this make it synchronous, i.e. changing with the clock? In the initial tb block I am now using "#0", but it makes no difference to the results. — Qazi, Jul 03 '17 at 07:04
