I am writing Verilog code to interface an FT600 USB 3.0 FIFO to a Lattice iCE40 FPGA. The question is not specific to these parts, though: it applies whenever you have to design a state machine that reads and writes data on a synchronous parallel bus.
I'm sure it is very basic stuff, but I cannot find a satisfying answer anywhere on the Internet, and I cannot think of another way to formulate the problem. Here it goes.
Here is the timing diagram of the bus in question (taken from the FT600 datasheet, omitting redundant parts):
Observing the diagram, we see that the data and control signals provided by the FT600 are stable during the rising clock edges. Therefore, the FSM must sample those signals and change state accordingly on the rising edges of the clock (`always @(posedge clk)`). Is this reasoning correct?

I am implementing a Moore FSM, where the outputs depend only on the current state. Let's say the initial state is
`RX_WAIT`. As soon as the FSM samples RXF_N=0 at rising clock edge (A), the state changes to `RX_PRE`. Then, a combinational block translates the state `RX_PRE` into the FPGA outputs OE_N=0, RD_N=0. The problem is: if this combinational block is very fast, the outputs will change at the red line just after (A), not at the black line between rising edges as they should. This could violate the hold requirement of the chip. I can think of two solutions for this:

A) Putting a register after the combinational block that samples its output on the falling edge of the clock. Then we will have problems if the combinational block is slower than half a clock cycle. Also, I've been told that it is not good to mix rising-edge and falling-edge flip-flops unless you are doing DDR.
B) Somehow ensuring that the delay of the combinational block is exactly half a clock cycle, adding delay if necessary (is this what we want, to make the system slower?). In this case, how can I instruct the compiler to do that? I'm using iCEcube2, which supports timing constraints similar to Altera's, but I've never used them and I am not familiar with the terms (Output delay, Input delay, Max delay, Multicycle, Clock Latency...) nor with how to use them.
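
To make the setup concrete, here is a minimal sketch of the FSM as I described it, together with option (A). Only `RX_WAIT`, `RX_PRE`, RXF_N, OE_N and RD_N come from the description above; the module name, the `rst` input, the `_neg` outputs and the exit condition from `RX_PRE` are placeholders I made up for illustration, not something taken from the datasheet.

```verilog
// Sketch only: names other than RX_WAIT, RX_PRE, RXF_N, OE_N, RD_N are hypothetical.
module ft600_rx_fsm (
    input  wire clk,       // bus clock
    input  wire rst,       // hypothetical synchronous reset, active high
    input  wire rxf_n,     // RXF_N from the FT600, sampled on the rising edge
    output reg  oe_n,      // OE_N, driven directly by the combinational decode
    output reg  rd_n,      // RD_N, driven directly by the combinational decode
    output reg  oe_n_neg,  // option (A): OE_N re-registered on the falling edge
    output reg  rd_n_neg   // option (A): RD_N re-registered on the falling edge
);
    localparam RX_WAIT = 1'b0;
    localparam RX_PRE  = 1'b1;

    reg state;

    // State register: inputs are sampled and the state changes on the rising edge.
    always @(posedge clk) begin
        if (rst)
            state <= RX_WAIT;
        else begin
            case (state)
                RX_WAIT: if (!rxf_n) state <= RX_PRE;
                // Placeholder exit condition, only so the sketch is complete.
                RX_PRE:  if (rxf_n)  state <= RX_WAIT;
                default: state <= RX_WAIT;
            endcase
        end
    end

    // Moore output decode: outputs depend only on the current state.
    // These change a short combinational delay after the rising edge,
    // which is the "red line" behaviour I am worried about.
    always @(*) begin
        case (state)
            RX_PRE:  begin oe_n = 1'b0; rd_n = 1'b0; end
            default: begin oe_n = 1'b1; rd_n = 1'b1; end
        endcase
    end

    // Option (A): sample the decoded outputs on the falling edge, so they
    // change roughly half a cycle after the state does.
    always @(negedge clk) begin
        oe_n_neg <= oe_n;
        rd_n_neg <= rd_n;
    end
endmodule
```

In a real design only one of the two output pairs would drive the pins; both are shown here so the combinational version and the falling-edge version can be compared side by side in simulation.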
I'm pretty sure (B) is the way to go. If any experienced user could give me some advice, I would be really thankful.