FPGA Synchronous Bus Timing

Question

I am writing Verilog code to interface an FT600 USB 3.0 FIFO to a Lattice iCE40 FPGA. The question is not specific to these parts, though. It applies whenever you have to design a state machine that reads and writes data on a synchronous parallel bus.

I'm sure it is very basic stuff, but I just cannot find a satisfying answer anywhere on the Internet, and I cannot think of another way to formulate the problem. Here it goes.

Here is the timing diagram of the bus in question (taken from the FT600 data sheet, omitting redundant parts):

[Figure: FT600 bus timing diagram, image not included]

From the diagram, the data and control signals provided by the FT600 are stable around the rising clock edges. Therefore, the FSM must sample those signals and change state on the rising edges of the clock (always @(posedge clk)). Is this reasoning correct?
I am implementing a Moore FSM, where the outputs depend only on the current state. Let's say the initial state is RX_WAIT. As soon as the FSM samples RXF_N=0 at rising clock edge (A), the state changes to RX_PRE. A combinational block then translates the state RX_PRE into the FPGA outputs OE_N=0 and RD_N=0.

The problem is this: if the combinational block is very fast, the outputs change at the red line just after (A). They do not change at the black line between rising edges, as the diagram shows. This could violate the hold requirement of the chip. I can think of two solutions:
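Below is a minimal sketch of the Moore FSM described above. It adds two things that are not in the question: a synchronous active-high `rst` and a hypothetical third state `RX_DATA`. The state register changes on `posedge clk`, while `oe_n` and `rd_n` are decoded combinationally from the state. This is the arrangement whose output timing the question is worried about.

```verilog
// Sketch of the Moore FSM from the question.
// Assumptions: rst is a synchronous active-high reset, and RX_DATA is a
// hypothetical state added only to give RX_PRE somewhere to go.
module ft600_rx_fsm (
    input  wire clk,    // bus clock supplied by the FT600
    input  wire rst,    // synchronous reset (assumed)
    input  wire rxf_n,  // FT600: low when receive data is available
    output reg  oe_n,   // to FT600: output enable, active low
    output reg  rd_n    // to FT600: read strobe, active low
);
    localparam RX_WAIT = 2'd0,
               RX_PRE  = 2'd1,
               RX_DATA = 2'd2;  // hypothetical

    reg [1:0] state;

    // State register: samples inputs and changes state on the rising edge.
    always @(posedge clk) begin
        if (rst)
            state <= RX_WAIT;
        else case (state)
            RX_WAIT: if (!rxf_n) state <= RX_PRE;
            RX_PRE:  state <= RX_DATA;
            RX_DATA: if (rxf_n)  state <= RX_WAIT;
            default: state <= RX_WAIT;
        endcase
    end

    // Combinational output decode (Moore: depends only on state).
    // The outputs follow the state register after the decode delay,
    // i.e. shortly after the rising edge, not mid-cycle.
    always @(*) begin
        oe_n = 1'b1;
        rd_n = 1'b1;
        case (state)
            RX_PRE:  begin oe_n = 1'b0; rd_n = 1'b0; end
            RX_DATA: begin oe_n = 1'b0; rd_n = 1'b0; end
            default: ;
        endcase
    end
endmodule
```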

A) Put a register that samples the output of the combinational block on the falling edge of the clock. Then we will have problems if the combinational block is slower than half a clock cycle. Also, I've been told that it is not good practice to mix rising-edge and falling-edge flip-flops unless you are doing DDR.
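Option A might look like the following sketch. The names `oe_n_comb` and `rd_n_comb` are hypothetical stand-ins for the outputs of the combinational state decode. Nothing in the code ensures that the decode settles within half a period; the timing tools would have to check that.

```verilog
// Sketch of option A: re-time the decoded outputs on the falling edge.
// Assumption: oe_n_comb/rd_n_comb (hypothetical names) come from a
// combinational decode of a posedge state register.
module negedge_out_reg (
    input  wire clk,        // bus clock from the FT600
    input  wire oe_n_comb,  // decoded from state, changes after posedge
    input  wire rd_n_comb,
    output reg  oe_n,       // to FT600: updates half a period after posedge
    output reg  rd_n
);
    // Capture on the falling edge so the pins change roughly mid-cycle
    // instead of just after the rising edge.
    always @(negedge clk) begin
        oe_n <= oe_n_comb;
        rd_n <= rd_n_comb;
    end
endmodule
```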

B) Somehow ensure that the delay of the combinational block is exactly half a clock cycle, adding delay if necessary. (Is this what we want? Making the system slower?) In that case, how can I instruct the tools to do that? I'm using iCEcube2, which supports timing constraints similar to Altera's. I've never used them, though, and I am not familiar with the terms (Output Delay, Input Delay, Max Delay, Multicycle, Clock Latency...) or with how to use them.
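To frame the constraint question, here is a rough budget for a signal that the FPGA launches on one rising edge of the FT600-supplied clock. The symbol names are my own, and clock jitter and the FT600's internal clock distribution are ignored. The terms are:

- $t_{\mathrm{clk}}$: delay from the FT600 clock pin to the FPGA output registers (board trace plus FPGA clock input and routing).
- $t_{\mathrm{co}}$: the FPGA clock-to-output delay.
- $t_{\mathrm{pcb}}$: the board delay of the trace back to the FT600.
- $T$: the clock period.
- $t_{\mathrm{su}}$ and $t_{\mathrm{h}}$: the FT600 setup and hold requirements.

```latex
\begin{aligned}
t_{\mathrm{clk,min}} + t_{\mathrm{co,min}} + t_{\mathrm{pcb,min}} &\ge t_{\mathrm{h}} && \text{(hold)}\\
t_{\mathrm{clk,max}} + t_{\mathrm{co,max}} + t_{\mathrm{pcb,max}} &\le T - t_{\mathrm{su}} && \text{(setup)}
\end{aligned}
```

Output-delay constraints are the usual way to describe the board and FT600 terms to the timing tools. With them, the tools check both inequalities; they do not insert an exact half-cycle delay.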

I'm fairly sure (B) is the way to go. If any experienced user could give me some advice, I would be really thankful.

score 0 · Answer 1 · answered May 25 '16 at 05:54

If you are trying to provide hold time relative to a clock for a bus interface, there are a number of ways to do it. I can't speak to the Lattice part or tools directly; I haven't worked with their devices.

1. Constraint-driven fixed delay

Provide the design tool with timing constraints from which it will infer the proper hold times. Internally, it will likely use a programmable delay element, usually located in or near the I/O block structure. I personally don't like this one because the result varies from compile to compile. Tools can't give you a precise delay; they only guarantee "no less than" or "no more than". So you might fix a bug in an unrelated area and then find that your bus has become unstable with the new bitstream.

2. Fixed manual delay

Embed constants in the RTL instances of the I/O blocks, connecting a fixed value to their delay port. Calculating this value precisely is tricky, because you need to cover the uncertainty of the PCB design and part variations. Not all input pins have equal capacitance, nor are all PCB traces the same length.

3. Fixed manual skew

Generate an I/O clock for your bus, and use a PLL setting to provide a delay. Conceptually, you might set the delay to the desired hold time and then propagate that to your outbound signals. Be wary: the static timing of this arrangement is not obvious to constrain or to meet. For this reason, I would avoid this method in programmable logic.

4. Programmable delay

Similar to #2, except that now you connect the delay to a programmable register within the design. This lets you bring the board up in the lab and then "dial in" the ideal setting. You can do this either by observing the signals directly with an oscilloscope or indirectly through the system's behavior. This is the most versatile solution. Note, though, that in production you will see some variation between parts (regulator voltage uncertainty, process variation) and with the environment (temperature).
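Real delay primitives are vendor-specific, so the following is only a simulation-level behavioural model of the idea and does not assume any particular device. The parameters `TAPS` and `TAP_PS` and the write interface `we`/`wdata` are hypothetical. For the fixed manual delay of #2, the tap selection would simply be a constant instead of a register. Note that `assign #` delays are inertial, so this model swallows pulses shorter than one tap.

```verilog
// Simulation-only model of a programmable delay element (hypothetical;
// real devices provide vendor-specific delay primitives instead).
`timescale 1ps/1ps
module prog_delay_model #(
    parameter TAPS   = 16,  // number of delay taps (assumed)
    parameter TAP_PS = 50   // delay per tap in ps (assumed)
) (
    input  wire                    clk,
    input  wire                    we,     // write strobe for the tap register
    input  wire [$clog2(TAPS)-1:0] wdata,  // new tap setting
    input  wire                    din,
    output wire                    dout
);
    // Run-time writable tap register: the setting "dialed in" in the lab.
    reg [$clog2(TAPS)-1:0] tap_sel = 0;
    always @(posedge clk)
        if (we) tap_sel <= wdata;

    // Behavioural delay chain: chain[k] is din delayed by k * TAP_PS.
    wire [TAPS-1:0] chain;
    assign chain[0] = din;
    genvar i;
    generate
        for (i = 1; i < TAPS; i = i + 1) begin : g_tap
            assign #TAP_PS chain[i] = chain[i-1];
        end
    endgenerate

    // The register selects which tap drives the output.
    assign dout = chain[tap_sel];
endmodule
```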

5. Self-calibrating delay

Similar to #4, but now the part itself finds the best delay. Typically this is done with a "write-confirm" action on the bus: you write a register in the slave part, then read it back. Using this feedback loop, you scan the window by iterating over delay values. You then pick the midpoint between the edges of the window in which transmission succeeds. This can be performed at startup, periodically, or in response to environmental changes. Usually it doesn't come to all that unless you are dealing with high-speed serialized I/O, which yours is not.
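The last step of that scan, picking the midpoint of the passing window, can be sketched as follows. The sketch makes two assumptions:

- `pass_mask[k]` has already been filled in by a write/read-back test at delay tap `k` (that loop is not shown).
- The passing taps form a single contiguous window.

The module and its port names are hypothetical.

```verilog
// Sketch of the "pick the midpoint" step of a self-calibrating delay.
// Assumptions: pass_mask[k] = 1 if write/read-back succeeded at tap k,
// and the passing taps form one contiguous window.
module cal_midpoint #(
    parameter TAPS = 16
) (
    input  wire [TAPS-1:0]         pass_mask,
    output reg  [$clog2(TAPS)-1:0] best_tap,
    output reg                     found      // 0 if no tap passed
);
    integer k;
    integer lo_edge, hi_edge;

    always @(*) begin
        lo_edge = -1;
        hi_edge = -1;
        // Locate the first and last passing taps.
        for (k = 0; k < TAPS; k = k + 1) begin
            if (pass_mask[k]) begin
                if (lo_edge < 0) lo_edge = k;
                hi_edge = k;
            end
        end
        found    = (lo_edge >= 0);
        // Midpoint between the window edges (rounded down).
        best_tap = found ? (lo_edge + hi_edge) / 2 : 0;
    end
endmodule
```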

6. Faster clock and capture pulse

The idea is to multiply your clock, then have a state machine that triggers an enable. This is more precise than "negedge" but coarser than most I/O delay blocks, so you can move the signals 1/8 of a cycle at a time, or 1/16. I only mention it so that if you ever see it somewhere, you know to avoid it.
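For recognition purposes only, here is a sketch of the technique. A 3-bit counter runs on an 8x clock, `clk_fast`, and produces an enable in the eighth of the bus cycle selected by the hypothetical parameter `PHASE`. The sketch assumes that `clk_fast` comes from a PLL and is phase-aligned with the bus clock, so that count 0 falls on a bus rising edge once `rst` is released. The code itself does nothing to guarantee that alignment.

```verilog
// Sketch of the "faster clock and capture pulse" technique.
// Assumptions: clk_fast is 8x the bus clock and phase-aligned with it,
// and rst is released so that phase_cnt == 0 lines up with a bus
// rising edge. PHASE is a hypothetical parameter (0..7).
module fast_clk_strobe #(
    parameter [2:0] PHASE = 3'd4  // 4/8: update half a bus cycle later
) (
    input  wire clk_fast,  // 8x bus clock
    input  wire rst,
    input  wire oe_n_in,   // values to drive, from the bus-clock FSM
    input  wire rd_n_in,
    output reg  oe_n,
    output reg  rd_n
);
    reg [2:0] phase_cnt;  // which eighth of the bus cycle we are in

    always @(posedge clk_fast) begin
        if (rst) begin
            phase_cnt <= 3'd0;
            oe_n      <= 1'b1;
            rd_n      <= 1'b1;
        end else begin
            phase_cnt <= phase_cnt + 3'd1;  // wraps 7 -> 0 every bus cycle
            // Enable pulse: update the outputs only in the chosen eighth.
            if (phase_cnt == PHASE) begin
                oe_n <= oe_n_in;
                rd_n <= rd_n_in;
            end
        end
    end
endmodule
```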
