I am implementing Cholesky decomposition in Verilog, following the Python code below:
from math import sqrt

def cholesky(A):
    n = len(A)
    L = [[0.0] * n for i in range(n)]
    for i in range(n):
        for j in range(i+1):
            tmp_sum = sum(L[i][k] * L[j][k] for k in range(j))
            if (i == j): # Diagonal element
                L[i][j] = sqrt(A[i][i] - tmp_sum)
            else:
                L[i][j] = (1.0/L[j][j] * (A[i][j] - tmp_sum))
    return L
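The Verilog datapath works on unsigned integers rather than floats, so an integer-only version of the same loop can be handy as a reference model when checking simulation output. This sketch is an assumption, not part of the original design. It swaps `sqrt` for `math.isqrt` (Python 3.8+) and true division for floor division. It also divides by `L[j][j]` directly, because an integer reciprocal `1 // L[j][j]` is 0 whenever `L[j][j] > 1`. The name `cholesky_int` is hypothetical.

```python
from math import isqrt

def cholesky_int(A):
    """Hypothetical integer-only Cholesky sketch with the same loop structure
    as cholesky(): sqrt -> math.isqrt, true division -> floor division.
    Assumes A is symmetric positive definite and that every A[i][i] - tmp_sum
    stays non-negative (math.isqrt raises ValueError otherwise)."""
    n = len(A)
    L = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            tmp_sum = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:  # diagonal element
                L[i][j] = isqrt(A[i][i] - tmp_sum)
            else:
                # Divide by L[j][j] directly; the integer reciprocal
                # 1 // L[j][j] would be 0 whenever L[j][j] > 1.
                L[i][j] = (A[i][j] - tmp_sum) // L[j][j]
    return L

# A = M * M^T with M = [[2,0,0],[1,3,0],[2,1,4]], so the exact factor is integer
A = [[4, 2, 4], [2, 10, 5], [4, 5, 21]]
print(cholesky_int(A))  # [[2, 0, 0], [1, 3, 0], [2, 1, 4]]
```

For matrices whose factor is not exactly integer, the floor operations truncate, so results will differ from the floating-point version.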
I started with a simple version for a 3x3 input. Since the algorithm requires division and square root, I also wrote a divider using the standard shift-and-subtract (long division) method (copied from the internet with some modifications) and a square root using the Babylonian method (a special case of Newton's method). Here they are:
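For reference, the Babylonian update is Newton's method applied to f(x) = x^2 - a. Each Div instance followed by the `>> 1` in the Sqrt module below computes one step of it:

```latex
x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}
        = x_k - \frac{x_k^2 - a}{2 x_k}
        = \frac{1}{2}\left(x_k + \frac{a}{x_k}\right)
```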
Division
module Div(in1, in2, out);
    input [23:0] in1, in2;
    output reg [23:0] out;
    // reg [23:0] remainder;
    reg [47:0] scaled_divider, temp_remainder, temp_result;
    integer i;
    always @ (in1 or in2) begin
        scaled_divider = {1'b0, in2, 23'h0};
        temp_remainder = {24'h0, in1};
        for (i=0; i<24; i=i+1) begin
            temp_result = temp_remainder - scaled_divider;
            if (temp_result[47-i]) begin // Negative result, quotient set to '0'
                out[23-i] = 1'b0;
            end else begin
                out[23-i] = 1'b1;
                temp_remainder = temp_result;
            end
            scaled_divider = scaled_divider >> 1;
        end
        // remainder = temp_remainder[23:0];
    end
endmodule
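A bit-level Python model of the same shift-and-subtract loop can serve as a reference when checking the Div module in simulation. This is a sketch: `div_model` is a hypothetical name, and the 48-bit masking is my reading of the register widths above.

```python
def div_model(in1, in2):
    """Hypothetical bit-level model of the Div module: 48-bit scaled_divider /
    temp_remainder registers and a 24-iteration shift-and-subtract loop."""
    MASK48 = (1 << 48) - 1
    scaled_divider = (in2 & 0xFFFFFF) << 23   # {1'b0, in2, 23'h0}
    temp_remainder = in1 & 0xFFFFFF           # {24'h0, in1}
    out = 0
    for i in range(24):
        temp_result = (temp_remainder - scaled_divider) & MASK48  # 48-bit wraparound
        if not (temp_result >> (47 - i)) & 1:  # non-negative: quotient bit is 1
            out |= 1 << (23 - i)
            temp_remainder = temp_result
        # otherwise the result was negative: quotient bit stays 0
        scaled_divider >>= 1
    return out

print(div_model(100, 7))  # 14
print(div_model(1, 5))    # 0
```

In this model the result equals `in1 // in2` for any nonzero 24-bit divisor, and a zero divisor yields 0xFFFFFF. A dividend of 1 gives 0 for any divisor above 1, which is the value an instance like `Div(1'b1, L[0][0], ...)` produces in integer arithmetic.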
Sqrt
module Sqrt_newton(in, out);
    // 3 iterations
    input [23:0] in;
    output reg [23:0] out;
    // Intermediate values, declared 24 bits wide (undeclared nets default to 1-bit wires)
    wire [23:0] tmp_inout1, tmp_inout3, tmp_inout5; // quotients from the Div instances
    reg  [23:0] tmp_inout2, tmp_inout4;             // intermediate guesses
    Div div1(in, out, tmp_inout1);
    Div div2(in, tmp_inout2, tmp_inout3);
    Div div3(in, tmp_inout4, tmp_inout5);
    always @ (in)
    begin
        out[7:0] = 8'hFF; // initial guess: bits 0..7 set to 1
        tmp_inout2 = (out + tmp_inout1) >> 1;
        tmp_inout4 = (tmp_inout2 + tmp_inout3) >> 1;
        out = (tmp_inout4 + tmp_inout5) >> 1;
    end
endmodule
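A software model of the Sqrt_newton datapath shows how far three fixed iterations get. The sketch below assumes the initial guess is 0xFF: only bits 0 to 7 of `out` are set in the always block, so the upper bits are assumed to be zero. It also uses floor division in place of Div. `sqrt_babylonian` and `isqrt_newton` are hypothetical names. The second function iterates until the estimate stops decreasing, which is the usual integer form of Newton's method and returns floor(sqrt(a)).

```python
from math import isqrt

def sqrt_babylonian(a, guess=0xFF, iterations=3):
    """Fixed number of Babylonian steps x <- (x + a // x) >> 1, mirroring the
    three Div stages. guess=0xFF is an assumption about the initial 'out'.
    guess must be nonzero."""
    x = guess
    for _ in range(iterations):
        x = (x + a // x) >> 1
    return x

def isqrt_newton(a):
    """Integer Newton iteration run to convergence; returns floor(sqrt(a))."""
    if a == 0:
        return 0
    x = a  # any starting value >= floor(sqrt(a)) works
    while True:
        y = (x + a // x) >> 1
        if y >= x:  # estimate stopped decreasing: x is floor(sqrt(a))
            return x
        x = y

print(sqrt_babylonian(0xFFFFFF), isqrt_newton(0xFFFFFF), isqrt(0xFFFFFF))  # 8883 4095 4095
```

For the largest 24-bit input, three iterations from 0xFF stop at 8883, while the exact integer square root is 4095. A fixed iteration count this small may therefore not be enough across the full input range.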
And here is my 3x3 Cholesky decomposition code:
module cholesky_template(clk, rst, g_input, e_input, o);
    input clk, rst;
    input [143:0] g_input;
    input e_input;
    output [215:0] o;
    reg [23:0] L [0:2][0:2];
    reg [23:0] A [0:2][0:2];
    assign o = {
        L[0][0], L[0][1], L[0][2],
        L[1][0], L[1][1], L[1][2],
        L[2][0], L[2][1], L[2][2]
    };
    reg [23:0] tmp_sum; // running sum used in the always block
    reg [23:0] tmp_A00_minus_sum;
    reg [23:0] tmp_A11_minus_sum;
    reg [23:0] tmp_A22_minus_sum;
    reg [23:0] tmp_A10_minus_sum;
    reg [23:0] tmp_A20_minus_sum;
    reg [23:0] tmp_A21_minus_sum;
    reg [23:0] div_1_L00;
    reg [23:0] div_1_L11;
    Sqrt sqrt0(tmp_A00_minus_sum, L[0][0]);
    Div div0(1'b1, L[0][0], div_1_L00);
    Sqrt sqrt1(tmp_A11_minus_sum, L[1][1]);
    Div div1(1'b1, L[1][1], div_1_L11);
    Sqrt sqrt2(tmp_A22_minus_sum, L[2][2]);
    always @ (posedge clk or posedge rst) begin
        if (rst) begin
            L[0][0] = 1'b0;
            L[0][1] = 1'b0;
            L[0][2] = 1'b0;
            L[1][0] = 1'b0;
            L[1][1] = 1'b0;
            L[1][2] = 1'b0;
            L[2][0] = 1'b0;
            L[2][1] = 1'b0;
            L[2][2] = 1'b0;
            tmp_sum = 1'b0;
            A[0][0] = {8'b00000000, g_input[15:0]};
            A[0][1] = 24'b0; // will not be used
            A[0][2] = 24'b0; // will not be used
            A[1][0] = {8'b00000000, g_input[63:48]};
            A[1][1] = {8'b00000000, g_input[79:64]};
            A[1][2] = 24'b0; // will not be used
            A[2][0] = {8'b00000000, g_input[111:96]};
            A[2][1] = {8'b00000000, g_input[127:112]};
            A[2][2] = {8'b00000000, g_input[143:128]};
        end else begin
            tmp_A00_minus_sum = A[0][0] - tmp_sum;
            tmp_A10_minus_sum = A[1][0] - tmp_sum;
            L[1][0] = div_1_L00 * tmp_A10_minus_sum;
            tmp_sum = tmp_sum + L[1][0] * L[1][0];
            tmp_A11_minus_sum = A[1][1] - tmp_sum;
            tmp_A20_minus_sum = A[2][0] - tmp_sum;
            L[2][0] = div_1_L00 * tmp_A20_minus_sum;
            tmp_sum = tmp_sum + L[2][0] * L[1][0];
            tmp_A21_minus_sum = A[2][1] - tmp_sum;
            L[2][1] = div_1_L11 * tmp_A21_minus_sum;
            tmp_sum = tmp_sum + L[2][0] * L[2][0];
            tmp_sum = tmp_sum + L[2][1] * L[2][1];
            tmp_A22_minus_sum = A[2][2] - tmp_sum;
        end
    end
endmodule
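For a testbench, it helps to pack the input matrix into `g_input` and unpack `o` using exactly the bit slices of the module above. Here is a minimal sketch, assuming 16-bit unsigned matrix elements. `pack_g_input` and `unpack_o` are hypothetical helper names. The upper-triangle slices are packed too, but the reset branch ignores them.

```python
def pack_g_input(A):
    """Pack a 3x3 matrix of 16-bit unsigned values into the 144-bit g_input
    vector: A[i][j] occupies g_input[16*(3*i+j)+15 : 16*(3*i+j)]."""
    word = 0
    for i in range(3):
        for j in range(3):
            v = A[i][j]
            if not 0 <= v < (1 << 16):
                raise ValueError("element must fit in 16 unsigned bits")
            word |= v << (16 * (3 * i + j))
    return word

def unpack_o(o):
    """Split the 216-bit output o into a 3x3 list. The concatenation
    {L[0][0], ..., L[2][2]} puts L[0][0] in o[215:192] and L[2][2] in o[23:0]."""
    L = [[0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            shift = 24 * (8 - (3 * i + j))
            L[i][j] = (o >> shift) & 0xFFFFFF
    return L

g = pack_g_input([[4, 2, 4], [2, 10, 5], [4, 5, 21]])
print(hex(g))
```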
Some explanation of the code: I could not get for-loops to work, so I unrolled them into statements like tmp_A10_minus_sum = A[1][0] - tmp_sum;. It should be fairly easy to map these back to the Python code. I pad each element of A with 8 leading zeros because I plan to "upgrade" the design to use all 24 bits later, for better accuracy. This padding is not the problem.
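For comparison, here is the Python loop unrolled by hand for n = 3, as a floating-point sketch (`cholesky3_unrolled` is a hypothetical name). Each element uses only its own partial sum over k < j, and that sum is empty whenever j = 0. Compare this with the single running tmp_sum in the always block above, which keeps accumulating across elements.

```python
from math import sqrt

def cholesky3_unrolled(A):
    """Hand-unrolled 3x3 Cholesky, one statement per (i, j) of the loop.
    Each element uses only the partial sum over k < j for that element."""
    L00 = sqrt(A[0][0])                             # i=0, j=0: empty sum
    L10 = A[1][0] / L00                             # i=1, j=0: empty sum
    L11 = sqrt(A[1][1] - L10 * L10)                 # i=1, j=1: k = 0
    L20 = A[2][0] / L00                             # i=2, j=0: empty sum
    L21 = (A[2][1] - L20 * L10) / L11               # i=2, j=1: k = 0
    L22 = sqrt(A[2][2] - (L20 * L20 + L21 * L21))   # i=2, j=2: k = 0, 1
    return [[L00, 0.0, 0.0],
            [L10, L11, 0.0],
            [L20, L21, L22]]

print(cholesky3_unrolled([[4, 2, 4], [2, 10, 5], [4, 5, 21]]))
# [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [2.0, 1.0, 4.0]]
```

Writing out one explicit sum per element this way makes it easier to see which products each hardware stage needs.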
Three-state bus warnings
The problem is that when I compile it with Synopsys Design Compiler (DC), it outputs warnings like this:
"Warning: In design 'cholesky_template', three-state bus 'tmp_A00_minus_sum[23]' has non three-state driver 'tmp_A00_minus_sum_reg[23]/Q'. (LINT-34)"
This is DC's description of LINT-34:
NAME LINT-34 (warning) In design '%s', three-state bus '%s' has non three-state driver '%s'.
DESCRIPTION Synopsys libraries contain descriptions of three-state driving pins on components. Synopsys tools classify a net as a three-state net if it is driven by at least one pin that has this three-state attribute. Normally, if there are multiple drivers on such nets, it is assumed that all driving pins should be three-state drivers, for correct operation of the three-state bus. This warning message indicates a situation where at least one non-three-state driver appears on a three-state net.
WHAT_NEXT Verify that this is what you have intended for the given net. If the non-three-state driver pin specified in the message is really on a three-state driver in your ASIC technology, verify that the technology library description is correct.
Why are there three-state attributes in the design, and how do I correct them?
Target library contains no replacement for register
This is another warning I get, for example:
Warning: Target library contains no replacement for register 'A_reg[1][0][7]' (FFGEN). (TRANS-4)
Here is my library code. Does it have anything to do with the three-state bus warning? If so, is there any reference on how to design the appropriate cells?
library(HML){
  cell(AND) {
    area: 6;
    pin(A) {
      direction: input;
      capacitance: 1;
    }
    pin(B) {
      direction: input;
      capacitance: 1;
    }
    pin(Z) {
      direction: output;
      function: "A B";
      timing() {
        intrinsic_rise: 0.48;
        intrinsic_fall: 0.77;
        rise_resistance: 0.1443;
        fall_resistance: 0.0523;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";
      }
      timing() {
        intrinsic_rise: 0.48;
        intrinsic_fall: 0.77;
        rise_resistance: 0.1443;
        fall_resistance: 0.0523;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";
      }
    }
  }
  cell(OR) {
    area: 6;
    pin(A) {
      direction: input;
      capacitance: 1;
    }
    pin(B) {
      direction: input;
      capacitance: 1;
    }
    pin(Z) {
      direction: output;
      function: "A+B";
      timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";
      }
      timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";
      }
    }
  }
  cell(XOR) {
    area: 0;
    pin(A) {
      direction: input;
      capacitance: 1;
    }
    pin(B) {
      direction: input;
      capacitance: 1;
    }
    pin(Z) {
      direction: output;
      function: "A^B";
      timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";
      }
      timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";
      }
    }
  }
  cell(NAND) {
    area: 6;
    pin(A) {
      direction: input;
      capacitance: 1;
    }
    pin(B) {
      direction: input;
      capacitance: 1;
    }
    pin(Z) {
      direction: output;
      function: "(A B)'";
      timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";
      }
      timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";
      }
    }
  }
  cell(NOR) {
    area: 6;
    pin(A) {
      direction: input;
      capacitance: 1;
    }
    pin(B) {
      direction: input;
      capacitance: 1;
    }
    pin(Z) {
      direction: output;
      function: "(A+B)'";
      timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";
      }
      timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";
      }
    }
  }
  cell(XNOR) {
    area: 6;
    pin(A) {
      direction: input;
      capacitance: 1;
    }
    pin(B) {
      direction: input;
      capacitance: 1;
    }
    pin(Z) {
      direction: output;
      function: "(A^B)'";
      timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";
      }
      timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";
      }
    }
  }
  cell(DFF) {
    area : 9;
    pin(D) {
      direction : input;
      capacitance : 1;
      timing() {
        timing_type : setup_rising;
        intrinsic_rise : 0.85;
        intrinsic_fall : 0.85;
        related_pin : "CLK";
      }
      timing() {
        timing_type : hold_rising;
        intrinsic_rise : 0.4;
        intrinsic_fall : 0.4;
        related_pin : "CLK";
      }
    }
    pin(I) {
      direction : input;
      capacitance : 1;
      timing() {
        timing_type : setup_rising;
        intrinsic_rise : 0.85;
        intrinsic_fall : 0.85;
        related_pin : "CLK";
      }
      timing() {
        timing_type : hold_rising;
        intrinsic_rise : 0.4;
        intrinsic_fall : 0.4;
        related_pin : "CLK";
      }
    }
    pin(CLK) {
      direction : input;
      capacitance : 1;
    }
    pin(RST) {
      direction : input;
      capacitance : 2;
    }
    ff("IQ", "IQN") {
      next_state : "D";
      clocked_on : "CLK";
      clear : "RST (I')";
      preset: "RST I";
      clear_preset_var1: L;
      clear_preset_var2: H;
    }
    pin(Q) {
      direction : output;
      function : "IQ";
      internal_node : "Q";
      timing() {
        timing_type : rising_edge;
        intrinsic_rise : 1.19;
        intrinsic_fall : 1.37;
        rise_resistance : 0.1458;
        fall_resistance : 0.0523;
        related_pin : "CLK";
      }
      timing() {
        timing_type : clear;
        timing_sense : positive_unate;
        intrinsic_fall : 1.29;
        fall_resistance : 0.0516;
        related_pin : "RST";
      }
      timing() {
        timing_type : preset;
        timing_sense : positive_unate;
        intrinsic_fall : 1.29;
        fall_resistance : 0.0516;
        related_pin : "I";
      }
    }
  }
  cell(IV){
    area:0;
    cell_footprint : "iv";
    pin(A) {
      direction: input;
      capacitance: 1;
    }
    pin(Z) {
      direction: output;
      function : "A'";
      timing() {
        intrinsic_rise : 0.38;
        intrinsic_fall : 0.15;
        rise_resistance : 0.1443;
        fall_resistance : 0.0589;
        slope_rise : 0.0;
        slope_fall : 0.0;
        related_pin : "A";
      }
    }
  }
}
Sorry for the long post. I hope I asked my questions clearly.