I have a 16-bit single-cycle, very sparse MIPS implementation that I've been working on in Verilog. Everything works except for the fact that branching is delayed by one entire clock cycle.
```verilog
always @(posedge clock) begin
    // Necessary to add this in order to ensure PC => PC_next
    iaddr <= pc_next;
end
```
The code above updates the program counter (instruction address) with `pc_next`, which is computed by a module called PCLogic:
```verilog
module PCLogic(
    pc_next,   // next value of the pc
    pc,        // current pc value
    signext,   // from sign extend circuit
    branch,    // beq instruction
    alu_zero,  // zero from ALU, used in cond branch
    reset      // reset input
);
    output [15:0] pc_next;
    input  [15:0] pc;
    input  [15:0] signext; // From sign extend circuit
    input         branch;
    input         alu_zero;
    input         reset;

    reg [15:0] pc_next;

    always @(pc or reset) begin
        if (reset == 1)
            pc_next = 0;
        else if (branch == 1 && alu_zero == 1)
            pc_next = pc+2+(signext << 1);
        else
            pc_next = pc+2;
    end
endmodule
```
`iaddr` is a simple 16-bit register that stores the program counter.
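To make the register's behavior concrete, here is a minimal standalone sketch of it. The module name `PCRegister` is hypothetical; the port names `iaddr`, `pc_next`, and `clock` come from the code above, and reset is assumed to be handled upstream by PCLogic, which drives `pc_next` to 0 while `reset` is high.

```verilog
// Hypothetical standalone version of the iaddr register (module name is illustrative).
// Assumption: reset is handled by PCLogic, which drives pc_next to 0 during reset.
module PCRegister(
    output reg [15:0] iaddr,   // current PC / instruction address
    input      [15:0] pc_next, // next PC computed by PCLogic
    input             clock
);
    // Edge-triggered: iaddr changes only on a rising clock edge and then holds
    // its value for the whole cycle, whatever pc_next does in between.
    always @(posedge clock)
        iaddr <= pc_next;
endmodule
```

Because the register samples `pc_next` only at the rising edge, whatever value PCLogic happens to be driving at that instant is what becomes the next PC.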
I don't understand why this circuit would be a problem, but for some reason every branch takes effect one clock cycle late. For example, if I have a BEQ instruction at 0x16 that is always taken, the processor first executes the next instruction at 0x18 and only then jumps, applying the relative offset from 0x20.
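For reference, this is the arithmetic PCLogic performs for a taken branch, writing the sign-extended offset as $s$ (so `signext << 1` is $2s$), evaluated once with `pc` at the BEQ (0x16) and once with `pc` already at the following instruction (0x18):

$$
\begin{aligned}
\texttt{pc\_next} &= \texttt{pc} + 2 + 2s \\
\texttt{pc} = \texttt{0x16}: \quad \texttt{pc\_next} &= \texttt{0x18} + 2s \\
\texttt{pc} = \texttt{0x18}: \quad \texttt{pc\_next} &= \texttt{0x1A} + 2s
\end{aligned}
$$

If the branch condition were applied while `pc` already holds 0x18, the target would land exactly 2 bytes past the intended one, which would be consistent with removing the +2 correcting the offset while leaving the one-cycle delay in place.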
I feel like the solution is right in front of me, but I don't know what I'm missing about the semantics. The offset problem goes away if I remove the +2 (the increment that is always applied unless there is a true "bubble" or hardware-induced no-op), but the delay is still there.
Can someone explain to me what causes the delay and why it happens?