Yosys / abc uses many gates instead of a better monolithic cell

Question

For a simple design and a custom cell library, I am getting synthesis results in which Yosys / abc chooses an implementation that is obviously (to a human reader) worse, and ignores an obvious alternative. The result appears to be worse in terms of both area and speed. I am not sure about speed, because my .lib file lacks all delay/drive/load information and I do not yet know what defaults are used. My goal is to optimize for area, not speed.

Design (SevenSegmentDecoder.v):

module SevenSegmentDecoder(
    input[3:0] encoded,
    output reg decoded,
    output[3:0] dummy
);
    assign dummy = ~encoded;
    always @(*) begin
        case (encoded)
            4'd0: decoded = 1;
            4'd1: decoded = 0;
            4'd2: decoded = 1;
            4'd3: decoded = 1;
            4'd4: decoded = 0;
            4'd5: decoded = 1;
            4'd6: decoded = 1;
            4'd7: decoded = 1;
            4'd8: decoded = 1;
            4'd9: decoded = 1;
            4'd10: decoded = 1;
            4'd11: decoded = 0;
            4'd12: decoded = 1;
            4'd13: decoded = 0;
            4'd14: decoded = 1;
            4'd15: decoded = 1;
            default: decoded = 0;
        endcase
    end
endmodule

This is the logic for a single segment of a seven-segment decoder. The "dummy" output forces an inverter cell to be present for each input. It is meant to push Yosys / abc towards the intended implementation, but this does not work.
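To make the target function explicit, the case statement can be modeled as a 16-entry truth table. The Python sketch below is a hypothetical reference model (it is not part of the build) that lists the input codes for which `decoded` is 0 and prints them as minterms over eN = encoded[N].

```python
# Hypothetical reference model of the case statement in SevenSegmentDecoder.v.
# Index = value of encoded[3:0], entry = value of 'decoded'.
DECODED = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1]


def decoded(encoded: int) -> int:
    """Return the 'decoded' output for a 4-bit input value."""
    return DECODED[encoded & 0xF]


def minterm(code: int) -> str:
    """Write a 4-bit input code as a product term over e3..e0 (eN = encoded[N])."""
    return " & ".join(
        "e%d" % n if (code >> n) & 1 else "!e%d" % n for n in (3, 2, 1, 0)
    )


# Input codes for which the segment is off (decoded == 0).
zero_codes = [code for code in range(16) if decoded(code) == 0]

if __name__ == "__main__":
    print("decoded == 0 for codes", zero_codes)
    for code in zero_codes:
        print(code, minterm(code))
```

So `decoded` is the complement of a sum of four minterms, which is exactly the shape of the seg0 function in the cell library.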

Shell script for synthesis (build.sh):

# -f: do not complain if an output from a previous run does not exist
rm -f show.dot
rm -f show.svg
rm -f out.v
rm -rf _tmp_yosys-abc-*
yosys -s build.yosys

Synthesis script (build.yosys):

#
# input
#
read_verilog SevenSegmentDecoder.v

#
# synthesis
#
synth -top SevenSegmentDecoder

#
# tech mapping
#
dfflibmap -liberty own.lib
abc -liberty own.lib -nocleanup
clean
stat -liberty own.lib

#
# output
#
write_verilog out.v
read_liberty -lib own.lib
show -format svg -prefix show

Cell library (own.lib):

/*
 delay model :       typ
 check model :       typ
 power model :       typ
 capacitance model : typ
 other model :       typ
*/
library(my_cells) {
    cell(Inverter) {
        area: 1;
        pin(a) {
            direction: input;
        }
        pin(out) {
            direction: output;
            function: "(!a)";
        }
    }
    cell(Buffer) {
        area: 2450;
        pin(a) {
            direction: input;
        }
        pin(out) {
            direction: output;
            function: "a";
        }
    }
    cell(Nand3) {
        area: 2940;
        pin(a) {
            direction: input;
        }
        pin(b) {
            direction: input;
        }
        pin(c) {
            direction: input;
        }
        pin(out) {
            direction: output;
            function: "(!(a b c))";
        }
    }
    cell(Nor) {
        area: 2450;
        pin(a) {
            direction: input;
        }
        pin(b) {
            direction: input;
        }
        pin(out) {
            direction: output;
            function: "(!(a+b))";
        }
    }
    cell(OrAndInvert) {
        area: 2940;
        pin(a) {
            direction: input;
        }
        pin(b) {
            direction: input;
        }
        pin(c) {
            direction: input;
        }
        pin(out) {
            direction: output;
            function: "(!((a+b) c))";
        }
    }
    cell(seg0) {
        area: 1;
        pin(a) {
            direction: input;
        }
        pin(b) {
            direction: input;
        }
        pin(c) {
            direction: input;
        }
        pin(d) {
            direction: input;
        }
        pin(na) {
            direction: input;
        }
        pin(out) {
            direction: output;
            function: "(!((na (!b) (!c) d)+(na b (!c) (!d))+(a b (!c) d)+(a (!b) c d)))";
        }
    }
}

The last cell, seg0, could easily implement the 'decoded' output when combined with an inverter that produces na = !a, yet Yosys / abc chooses a network of 1x Nand3, 3x Nor and 3x OrAndInvert instead. To guide the implementation, the presence of the inverter is already enforced through the "dummy" output, and the seg0 cell has an area of 1, yet it is ignored. Also, the generated network has several gates in series, while seg0 would be a single stage of logic. Absent specific delay/load values, I assume that the seg0 cell would therefore be modeled as faster.
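The claim that seg0 plus an inverter implements `decoded` can be checked exhaustively. The Python sketch below transcribes the seg0 `function:` string from own.lib and assumes the pin assignment a = encoded[3], b = encoded[2], c = encoded[1], d = encoded[0], na = !encoded[3] (the existing dummy[3] net). The question does not state a pin assignment; this one is chosen because it makes the four product terms hit codes 1, 4, 13 and 11. The second check shows that, as a 5-input cell, seg0 only equals the decoder when na really is the complement of a.

```python
from itertools import product

# Truth table of 'decoded' from the Verilog case statement (index = encoded[3:0]).
DECODED = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1]


def seg0(a: int, b: int, c: int, d: int, na: int) -> int:
    """seg0 as written in own.lib; na is an independent input pin."""
    t1 = na & (1 - b) & (1 - c) & d
    t2 = na & b & (1 - c) & (1 - d)
    t3 = a & b & (1 - c) & d
    t4 = a & (1 - b) & c & d
    return 1 - (t1 | t2 | t3 | t4)


def bits(code: int):
    """Split a 4-bit value into (e3, e2, e1, e0)."""
    return tuple((code >> n) & 1 for n in (3, 2, 1, 0))


# Assumed pin assignment: a=e3, b=e2, c=e1, d=e0, na=!e3 (the dummy[3] net).
matches_with_inverter = all(
    seg0(e3, e2, e1, e0, 1 - e3) == DECODED[code]
    for code in range(16)
    for e3, e2, e1, e0 in [bits(code)]
)

# Combinations where na is NOT the complement of a and seg0 differs from decoded.
mismatches_if_na_free = [
    (a, b, c, d, na)
    for a, b, c, d, na in product((0, 1), repeat=5)
    if na == a and seg0(a, b, c, d, na) != DECODED[(a << 3) | (b << 2) | (c << 1) | d]
]

print("seg0 with na = !a implements decoded:", matches_with_inverter)
print("example mismatch with na == a:", mismatches_if_na_free[0])
```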

Now try a slightly modified cell library in which the seg0 cell inverts "a" itself:

    cell(seg0) {
        area: 1000000;
        pin(a) {
            direction: input;
        }
        pin(b) {
            direction: input;
        }
        pin(c) {
            direction: input;
        }
        pin(d) {
            direction: input;
        }
        pin(out) {
            direction: output;
            function: "(!(((!a) (!b) (!c) d)+((!a) b (!c) (!d))+(a b (!c) d)+(a (!b) c d)))";
        }
    }

Even though this cell has a huge area, it gets picked up by Yosys / abc and all the Nand, Nor and OAI gates disappear. Only the inverters are still present, of course, to generate the "dummy" output.
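As a logic function, the modified cell is identical to the original seg0 with na tied to !a; the two library variants differ only in whether the inversion of a happens inside the cell. The Python sketch below (hypothetical helper names; both functions transcribed from the `function:` strings above) confirms this over all 16 input combinations.

```python
from itertools import product


def seg0_original(a, b, c, d, na):
    # own.lib: "(!((na (!b) (!c) d)+(na b (!c) (!d))+(a b (!c) d)+(a (!b) c d)))"
    on = ((na and not b and not c and d)
          or (na and b and not c and not d)
          or (a and b and not c and d)
          or (a and not b and c and d))
    return 0 if on else 1


def seg0_modified(a, b, c, d):
    # modified: "(!(((!a) (!b) (!c) d)+((!a) b (!c) (!d))+(a b (!c) d)+(a (!b) c d)))"
    on = ((not a and not b and not c and d)
          or (not a and b and not c and not d)
          or (a and b and not c and d)
          or (a and not b and c and d))
    return 0 if on else 1


inputs = list(product((0, 1), repeat=4))

# Same output for every input when na is driven by an inverter on a.
same_function = all(
    seg0_modified(a, b, c, d) == seg0_original(a, b, c, d, 1 - a)
    for a, b, c, d in inputs
)

# Number of input combinations for which the modified cell outputs 0.
zero_count = sum(seg0_modified(a, b, c, d) == 0 for a, b, c, d in inputs)

print(same_function, zero_count)
```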

Why does Yosys / abc not use the seg0 cell when it has to be combined with an inverter to produce "na", even when that inverter is already present?

Yosys Version: Yosys 0.9+1706 (git sha1 58ab9f60, clang 6.0.0-1ubuntu2 -fPIC -Os)

ABC Version (the yosys-abc executable that comes with Yosys): UC Berkeley, ABC 1.01 (compiled Jan 12 2020 20:46:55)

Since abc was mentioned, here are the relevant files generated by Yosys.

abc.script:

echo + read_blif _tmp_yosys-abc-jAcBTF/input.blif;
read_blif _tmp_yosys-abc-jAcBTF/input.blif;
echo + read_lib -w /home/martin/git-repos/chipdraw/resource/7seg/own.lib;
read_lib -w /home/martin/git-repos/chipdraw/resource/7seg/own.lib;
echo + strash;
strash;
echo + ifraig;
ifraig;
echo + scorr;
scorr;
echo + dc2;
dc2;
echo + dretime;
dretime;
echo + strash;
strash;
echo + &get -n;
&get -n;
echo + &dch -f;
&dch -f;
echo + &nf ;
&nf ;
echo + &put;
&put;
echo + write_blif _tmp_yosys-abc-jAcBTF/output.blif;
write_blif _tmp_yosys-abc-jAcBTF/output.blif

input.blif:

.model netlist
.inputs ys__n0 ys__n1 ys__n3 ys__n4
.outputs ys__n34 ys__n35 ys__n36 ys__n37 ys__n38
# ys__n0     \encoded [1]
# ys__n1     \encoded [0]
# ys__n2     $abc$258$new_n10_
# ys__n3     \encoded [3]
# ys__n4     \encoded [2]
# ys__n5     $abc$258$new_n11_
# ys__n6     $abc$258$new_n12_
# ys__n7     $abc$258$new_n13_
# ys__n8     $abc$258$new_n14_
# ys__n9     $abc$258$new_n15_
# ys__n10    $abc$258$new_n16_
# ys__n11    $abc$258$new_n17_
# ys__n12    $abc$258$new_n18_
# ys__n13    $abc$258$new_n19_
# ys__n14    $abc$258$new_n20_
# ys__n15    $abc$258$new_n21_
# ys__n16    $abc$258$new_n22_
# ys__n17    $abc$258$new_n23_
# ys__n18    $abc$258$new_n24_
# ys__n19    $abc$258$new_n25_
# ys__n20    $abc$258$new_n26_
# ys__n21    $abc$258$new_n27_
# ys__n22    $abc$258$new_n28_
# ys__n23    $abc$258$new_n29_
# ys__n24    $abc$258$new_n30_
# ys__n25    $abc$258$new_n31_
# ys__n26    $abc$258$new_n32_
# ys__n27    $abc$258$new_n33_
# ys__n28    $abc$258$new_n34_
# ys__n29    $abc$258$new_n35_
# ys__n30    $abc$258$new_n36_
# ys__n31    $abc$258$new_n37_
# ys__n32    $abc$258$new_n38_
# ys__n33    $abc$258$new_n39_
# ys__n34    \decoded
# ys__n35    \dummy [0]
# ys__n36    \dummy [1]
# ys__n37    \dummy [2]
# ys__n38    \dummy [3]
.names ys__n0 ys__n1 ys__n2
00 1
.names ys__n3 ys__n4 ys__n5
00 1
.names ys__n5 ys__n2 ys__n6
11 1
.names ys__n0 ys__n1 ys__n7
10 1
.names ys__n7 ys__n5 ys__n8
11 1
.names ys__n8 ys__n6 ys__n9
-1 1
1- 1
.names ys__n0 ys__n1 ys__n10
11 1
.names ys__n10 ys__n5 ys__n11
11 1
.names ys__n1 ys__n0 ys__n12
10 1
.names ys__n3 ys__n4 ys__n13
1- 1
-0 1
.names ys__n12 ys__n13 ys__n14
10 1
.names ys__n14 ys__n11 ys__n15
-1 1
1- 1
.names ys__n15 ys__n9 ys__n16
-1 1
1- 1
.names ys__n7 ys__n13 ys__n17
10 1
.names ys__n10 ys__n13 ys__n18
10 1
.names ys__n18 ys__n17 ys__n19
-1 1
1- 1
.names ys__n4 ys__n3 ys__n20
1- 1
-0 1
.names ys__n2 ys__n20 ys__n21
10 1
.names ys__n12 ys__n20 ys__n22
10 1
.names ys__n22 ys__n21 ys__n23
-1 1
1- 1
.names ys__n23 ys__n19 ys__n24
-1 1
1- 1
.names ys__n24 ys__n16 ys__n25
-1 1
1- 1
.names ys__n7 ys__n20 ys__n26
10 1
.names ys__n3 ys__n4 ys__n27
0- 1
-0 1
.names ys__n2 ys__n27 ys__n28
10 1
.names ys__n28 ys__n26 ys__n29
-1 1
1- 1
.names ys__n7 ys__n27 ys__n30
10 1
.names ys__n10 ys__n27 ys__n31
10 1
.names ys__n31 ys__n30 ys__n32
-1 1
1- 1
.names ys__n32 ys__n29 ys__n33
-1 1
1- 1
.names ys__n33 ys__n25 ys__n34
-1 1
1- 1
.names ys__n1 ys__n35
0 1
.names ys__n0 ys__n36
0 1
.names ys__n4 ys__n37
0 1
.names ys__n3 ys__n38
0 1
.end
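Each `.names` block in input.blif is a single-output sum of products: the first line lists the inputs followed by the output, and each following row is a cube over the inputs (`1` = true, `0` = complemented, `-` = don't care) whose trailing `1` marks it as part of the on-set. The Python sketch below is a hypothetical evaluator that handles only on-set covers like the ones in this file; it shows, for example, that ys__n13 is encoded[3] OR NOT encoded[2] and that ys__n2 is a NOR of encoded[1] and encoded[0].

```python
from itertools import product


def eval_cover(cubes, values):
    """Evaluate a BLIF .names on-set cover.

    cubes  -- input patterns such as "1-" or "-0" (the output column '1' is omitted)
    values -- tuple of 0/1 input values, in the order listed on the .names line
    An empty cover evaluates to constant 0.
    """
    for cube in cubes:
        if all(ch == "-" or int(ch) == v for ch, v in zip(cube, values)):
            return 1
    return 0


# .names ys__n3 ys__n4 ys__n13 with rows "1- 1" and "-0 1"
# ys__n3 = encoded[3], ys__n4 = encoded[2]  ->  ys__n13 = encoded[3] | !encoded[2]
n13_cover = ["1-", "-0"]

# .names ys__n0 ys__n1 ys__n2 with row "00 1"
# ys__n0 = encoded[1], ys__n1 = encoded[0]  ->  ys__n2 = !encoded[1] & !encoded[0]
n2_cover = ["00"]

for x, y in product((0, 1), repeat=2):
    print(x, y, eval_cover(n13_cover, (x, y)), eval_cover(n2_cover, (x, y)))
```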

stdcells.genlib:

GATE ZERO    1 Y=CONST0;
GATE ONE     1 Y=CONST1;
GATE BUF    1 Y=A;                  PIN * NONINV  1 999 1 0 1 0
GATE NOT    2 Y=!A;                 PIN * INV     1 999 1 0 1 0
GATE AND    4 Y=A*B;                PIN * NONINV  1 999 1 0 1 0
GATE NAND   4 Y=!(A*B);             PIN * INV     1 999 1 0 1 0
GATE OR     4 Y=A+B;                PIN * NONINV  1 999 1 0 1 0
GATE NOR    4 Y=!(A+B);             PIN * INV     1 999 1 0 1 0
GATE XOR    5 Y=(A*!B)+(!A*B);      PIN * UNKNOWN 1 999 1 0 1 0
GATE XNOR   5 Y=(A*B)+(!A*!B);      PIN * UNKNOWN 1 999 1 0 1 0
GATE ANDNOT 4 Y=A*!B;               PIN * UNKNOWN 1 999 1 0 1 0
GATE ORNOT  4 Y=A+!B;               PIN * UNKNOWN 1 999 1 0 1 0
GATE MUX    4 Y=(A*B)+(S*B)+(!S*A); PIN * UNKNOWN 1 999 1 0 1 0

output.blif:

# Benchmark "netlist" written by ABC on Mon Jan 13 18:44:03 2020
.model netlist
.inputs ys__n0 ys__n1 ys__n3 ys__n4
.outputs ys__n34 ys__n35 ys__n36 ys__n37 ys__n38
.gate Inverter    a=ys__n0 out=ys__n36
.gate Inverter    a=ys__n1 out=ys__n35
.gate Inverter    a=ys__n3 out=ys__n38
.gate Inverter    a=ys__n4 out=ys__n37
.gate Nor         a=ys__n3 b=ys__n37 out=new_n14_
.gate Nor         a=ys__n38 b=ys__n4 out=new_n15_
.gate Nor         a=ys__n0 b=ys__n35 out=new_n16_
.gate OrAndInvert a=new_n14_ b=new_n15_ c=new_n16_ out=new_n17_
.gate OrAndInvert a=ys__n3 b=ys__n37 c=ys__n35 out=new_n18_
.gate OrAndInvert a=ys__n38 b=ys__n4 c=ys__n0 out=new_n19_
.gate Nand3       a=new_n17_ b=new_n18_ c=new_n19_ out=ys__n34
.end
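The mapped netlist can be simulated gate by gate to confirm that it is correct, and its area can be compared with the seg0 alternatives. The Python sketch below transcribes the cell functions and areas from own.lib and the connections from output.blif; the net mapping ys__n0 = encoded[1], ys__n1 = encoded[0], ys__n3 = encoded[3], ys__n4 = encoded[2] is taken from the comments in input.blif. The areas are simple sums of cell areas; this is not a claim about what `stat -liberty` prints.

```python
# Cell areas from own.lib.
AREA = {"Inverter": 1, "Nor": 2450, "OrAndInvert": 2940, "Nand3": 2940, "seg0": 1}

# Truth table of 'decoded' from the Verilog case statement (index = encoded[3:0]).
DECODED = [1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 1]


def inv(a): return 1 - a                        # Inverter: (!a)
def nor(a, b): return 1 - (a | b)               # Nor: (!(a+b))
def oai(a, b, c): return 1 - ((a | b) & c)      # OrAndInvert: (!((a+b) c))
def nand3(a, b, c): return 1 - (a & b & c)      # Nand3: (!(a b c))


def mapped_netlist(code: int) -> int:
    """Evaluate output.blif for one input value of encoded[3:0]."""
    n0, n1 = (code >> 1) & 1, code & 1          # encoded[1], encoded[0]
    n3, n4 = (code >> 3) & 1, (code >> 2) & 1   # encoded[3], encoded[2]
    n36, n35, n38, n37 = inv(n0), inv(n1), inv(n3), inv(n4)  # dummy outputs
    n14 = nor(n3, n37)
    n15 = nor(n38, n4)
    n16 = nor(n0, n35)
    n17 = oai(n14, n15, n16)
    n18 = oai(n3, n37, n35)
    n19 = oai(n38, n4, n0)
    return nand3(n17, n18, n19)                 # ys__n34 = decoded


netlist_ok = all(mapped_netlist(code) == DECODED[code] for code in range(16))

# Cell counts in output.blif.
CELLS = {"Inverter": 4, "Nor": 3, "OrAndInvert": 3, "Nand3": 1}
network_area = sum(AREA[name] * count for name, count in CELLS.items())

# Four inverters (needed for 'dummy' anyway) plus one seg0 using dummy[3] as na.
seg0_area = 4 * AREA["Inverter"] + AREA["seg0"]

# Four inverters plus the modified seg0 with area 1000000.
seg0_modified_area = 4 * AREA["Inverter"] + 1000000

print(netlist_ok, network_area, seg0_area, seg0_modified_area)
```

By this arithmetic the mapped network has an area of 19114, versus 5 for the four inverters plus the original seg0, and 1000004 for the result abc produces with the expensive modified seg0.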

This is really an issue in abc, which does the liberty mapping, rather than in Yosys. My guess is that it isn't doing enough sharing of logic, although ultimately area-optimal techmapping is a hard problem and an optimal solution can't be guaranteed in any case. — gatecat, Jan 12 '20 at 22:43
I added abc-related info. Adding a tag is beyond my SO rep. The fact that finding an optimal solution is unrealistic is IMHO irrelevant, since Yosys/abc finish in less than one second, so they don't even try to improve the result. In case abc is the culprit here, where is a Yosys user expected to find information about it? Everything I found on the web is horribly outdated and didn't help me in the slightest. — Martin Geisse, Jan 13 '20 at 17:55