
In a Lattice FPGA design written in Verilog, I have two PLL-generated clocks at the same frequency, 125 MHz (8 ns period), with the second clock phase-shifted by 90° relative to the first:

wire clk;
wire clk90; //clk90 is clk with phase at 90°
pllm pllm_inst(.CLKI(oscInternal), .CLKOP(clk), .CLKOS(clk90));

reg [63:0] wbuf;
always @(posedge clk) begin
    wbuf <= wbuf + 1;//Fake logic
end

wire [31:0] sdram_dq_tx;
ODDRXE ODDRXE00_inst(.D0(wbuf[0]), .D1(wbuf[16]), .SCLK(clk90), .RST(1'b0), .Q(sdram_dq_tx[0]));
...

The design is very crowded, and I get the following hold errors for every bit of wbuf:

Error: The following path exceeds requirements by 1.585ns
 
 Logical Details:  Cell type  Pin type       Cell/ASIC name  (clock net +/-)

   Source:         FF         Q              sdram_inst/wbuf[0]  (from clkop +)
   Destination:    FF         Data in        sdram_inst/ODDRXE00_inst  (to clkop2 +)

   Delay:               0.380ns  (34.5% logic, 65.5% route), 1 logic levels.

 Constraint Details:

      0.380ns physical path delay sdram_inst/SLICE_1029 to ddr_Dq[0]_MGIOL exceeds
     -0.011ns DO_HLD and
      0.000ns delay constraint less
     -1.976ns skew less
      0.000ns feedback compensation requirement (totaling 1.965ns) by 1.585ns

How can I constrain this path between the two clocks, which are 90° apart, in order to close timing on my design? Would it make sense to force a hold of 2 ns (90° of 8 ns) on wbuf, and how can I achieve that with a timing constraint?

gregoiregentil

1 Answer


Some ideas, though I will not claim them as answers; posting as an answer gives better structure than posting as a comment ;-)

Having only 2 ns between the two rising edges is probably too short to allow for timing closure, especially since the wbuf flip-flops (FFs) are in the main logic, whereas the DDR ODDRXE is at the edge of the device for I/O.

Depending on what you can do in the actual design, there are several possibilities:

  • Recapture the wbuf data on the falling edge of clk, which gives 4 ns for a direct flip-flop (FF) transfer, and then use the recaptured value for clk90, which gives 6 ns (270°).
  • Recapture the wbuf data in FFs clocked by clk90 before feeding the recaptured value to the ODDRXE at clk90. This still gives only 2 ns for the recapture, but the recapture is done within the main logic rather than between the main logic and the edge DDR cell.
  • Change the clock clk90 to a clock clk270, and realign the data for the DDR ODDRXE to match, which gives 6 ns until the DDR input data is captured at the rising edge of the ODDRXE clock.
  • Move the wbuf logic to clk90, so that the clock domain crossing between clk and clk90 happens inside the main logic, while wbuf is used directly for the DDR ODDRXE.
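
As an illustration of the first option, the falling-edge recapture could be sketched roughly as below. This is only a minimal sketch: the module name wbuf_negedge_stage and the signal wbuf_neg are hypothetical names introduced here, and only clk and wbuf come from the question.

```verilog
// Hypothetical helper module; the name and ports are illustrative and
// not part of the original design.
// Re-registers wbuf on the falling edge of clk, so the data reaching the
// ODDRXE (clocked by clk90) is launched at 180 degrees instead of 0 degrees.
module wbuf_negedge_stage (
    input  wire        clk,       // 0-degree PLL clock (CLKOP in the question)
    input  wire [63:0] wbuf,      // data launched on the rising edge of clk
    output reg  [63:0] wbuf_neg   // same data, delayed by half a clk period
);
    always @(negedge clk) begin
        wbuf_neg <= wbuf;
    end
endmodule
```

The ODDRXE instance from the question would then take .D0(wbuf_neg[0]) and .D1(wbuf_neg[16]) in place of the wbuf bits, with .SCLK(clk90) unchanged. The extra stage delays the data by half a clk period, so the output alignment should be rechecked, and whether timing actually closes still depends on the place-and-route result for the particular design.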
Morten Zilmer
  • Thanks for helping. What is "strange" is that the design was working when it was not very crowded. More precisely, with a half-filled design, I see "2.045ns physical path delay" instead of 0.38 ns, so the timing is met. A more complex design "removes" logic space, and the Lattice tools no longer seem able to produce large physical path delays. I'm confused, but it's what I see. My question is: can I tell the software to route this part first, with as much space as needed to get 2 ns, and then optimize the rest of the design with the remaining space? Otherwise, I need to modify the wbuf logic. – gregoiregentil Feb 26 '22 at 09:49
  • Fixing timing problems in an FPGA by manual placement of logic elements is generally cumbersome and hard to maintain, so it should be a last resort. Since timing could close when the design was smaller, the 2 ns phase difference is probably enough in a best-case design, though not in the congested design. Maybe moving the wbuf logic to clk90, so that the clock domain crossing between clk and clk90 happens inside the main logic, is an option. I have added that to the list above. – Morten Zilmer Feb 26 '22 at 10:06
  • I have tried several of the suggested ideas, and the problem just moves elsewhere. If wbuf is in clk90, then the PAR software still needs to hold some registers for 2 ns. Unless I use a cross-domain FIFO, the only solution is to have a routing delay of up to 2 ns, which requires space. There is a floorplanning option with which you can reserve physical space for a group of registers. This is not simple: for starters, how do you calculate how much routing area is needed for 2 ns? – gregoiregentil Feb 28 '22 at 05:13
  • If the issue can be solved by a clock domain crossing (CDC) FIFO, it will probably save you time in future maintenance and give predictable synthesis time, instead of implementing some placement that is likely to violate timing in some synthesis runs and will thus need time-consuming tweaking. – Morten Zilmer Feb 28 '22 at 14:59