Any benefits from implementing CSA versus just using multiplication symbol when synthesizing?

Question

I am synthesizing some multiplication units in Verilog, and I was wondering whether you generally get better results in terms of area/power savings if you implement your own multiplier, using Booth encoding for the partial products and a carry-save adder (CSA) tree to sum them, or if you just use the `*` operator and let the synthesis tool take care of the problem for you.

Thank you!
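To make the question concrete, here is a small bit-accurate model of what "Booth encoding plus a CSA" means. The multiplier is recoded into radix-4 Booth digits in {-2, -1, 0, 1, 2}, each digit selects a shifted copy of the multiplicand, and the partial-product rows are reduced with 3:2 carry-save adders until only two rows remain for one final carry-propagate add. This is a sketch in Python (chosen so it can be checked easily, not a synthesizable description), assuming two's-complement operands of an even width `n`; the function names are hypothetical. It only shows that the arithmetic matches `*`; it says nothing about which version ends up smaller or lower-power after synthesis.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Recode the n-bit two's-complement value b into n/2 radix-4 Booth digits,
    each in {-2, -1, 0, 1, 2}, least significant first (n must be even)."""
    assert n % 2 == 0, "width must be even for radix-4 recoding"
    bits = b & ((1 << n) - 1)        # raw two's-complement bit pattern
    digits, prev = [], 0             # prev models the implicit bit b[-1] = 0
    for i in range(0, n, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)   # d = -2*b[i+1] + b[i] + b[i-1]
        prev = b1
    return digits


def csa(x: int, y: int, z: int, mask: int) -> tuple[int, int]:
    """One row of 3:2 carry-save adders: x + y + z == sum + carry (mod 2**width)."""
    s = (x ^ y ^ z) & mask
    c = (((x & y) | (x & z) | (y & z)) << 1) & mask
    return s, c


def booth_csa_multiply(a: int, b: int, n: int) -> int:
    """Signed n x n multiply: Booth partial products, CSA reduction, one final add."""
    w = 2 * n                        # full product width
    mask = (1 << w) - 1
    # Each digit selects 0, +/-a or +/-2a, shifted two bits per digit position.
    rows = [((d * a) << (2 * j)) & mask
            for j, d in enumerate(booth_radix4_digits(b, n))]
    # Compress three rows into two until at most two remain (no carry propagation).
    while len(rows) > 2:
        nxt = []
        full = len(rows) - len(rows) % 3
        for k in range(0, full, 3):
            nxt.extend(csa(rows[k], rows[k + 1], rows[k + 2], mask))
        nxt.extend(rows[full:])      # leftover rows pass straight to the next level
        rows = nxt
    total = sum(rows) & mask         # the single carry-propagate addition
    return total - (1 << w) if total >> (w - 1) else total   # back to signed


if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))       # [-1, 2], i.e. -1 + 2*4 = 7
    print(booth_csa_multiply(-8, -8, 4))   # 64
```

Checking a hand-built structure like this exhaustively against `a * b` at small widths is cheap; whether it actually beats the tool's own implementation of `*` still has to be measured after synthesis.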

Depends entirely on the synthesis tool you use. A good one may infer a better design than you can come up with by hand; a poor one may choose something worse than you had in mind. — Tim, Aug 01 '13 at 23:34
I believe DC (Synopsys Design Compiler) has some directives you can use to tell it what kind of multiplier you want (speed-optimized, area-optimized, Booth, etc.). Check your documentation to see if there are such options. — Tim, Aug 01 '13 at 23:43
Code it using multiply. Get it working, synthesise it. Then, if it's not small enough or lean enough, optimise it. Don't optimise first; you'll spend more time than you need to. — Paul S, Aug 09 '13 at 15:28

score 1 · Accepted Answer · answered Aug 02 '13 at 00:43


Generally, I tend to trust the compiler tools I use and don't fret so much about the results as long as they meet my timing and area budgets.

That said, with multipliers that need to run at high speeds, I find I get better results (in DC, at least) if I create a Verilog module containing the multiply (`*`) and a retiming register or two, and push down into this module to synthesise it on its own before popping back up to top-level synthesis. It seems as if the compiler gets 'distracted' by other timing paths if you try to do everything at once, so making it focus on a multiplier that you know is going to be tricky seems to help.

Marty

I have register retiming enabled; however, you said to add a retiming register or two. Do you just mean adding some registers in the datapath that the synthesis tool can push around, so it can see whether it gets better results by balancing the delay of each stage? – Veridian Aug 02 '13 at 00:50
@starbox: Normally, you use register retiming to *move* registers through a pipelined design, so you preserve functionality at the I/O boundaries. If you want to pipeline a multiplier, add one or more registers at the module output (or input), and `balance_registers` to get DC to move the registers to minimise the cycle time. Lots of details in 'Design Compiler Reference Manual: Register Retiming'. – EML Aug 02 '13 at 07:57
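The point about retiming preserving behaviour at the I/O boundary can be illustrated with a cycle-level Python model. This is a sketch, not DC's retiming algorithm. Version A computes the full product and then passes it through two output registers, as suggested above. Version B has the first register moved backward across the final adder, so it becomes two registers holding partial products (the multiplier is split into a low and a high nibble, an arbitrary choice for illustration). Both reset to zero, and the sequence of values seen at the module output is identical, which is the property retiming must preserve. The 8-bit operand width and the function names are assumptions.

```python
import random


def mult_output_regs(stream):
    """Version A: combinational multiply followed by two output registers."""
    r1 = r2 = 0                      # register contents after reset
    out = []
    for a, b in stream:              # one iteration = one rising clock edge
        # All registers capture their inputs simultaneously on the edge.
        r1, r2 = a * b, r1
        out.append(r2)               # value visible at the module output
    return out


def mult_retimed(stream):
    """Version B: the first register moved back across the final adder,
    so it becomes two registers holding partial products."""
    p_lo = p_hi = 0                  # stage-1 registers after reset
    r2 = 0                           # output register after reset
    out = []
    for a, b in stream:
        b_lo = b & 0xF               # low nibble of b (unsigned, 0..15)
        b_hi = b >> 4                # remaining bits of b (arithmetic shift keeps the sign)
        # Edge: the output register adds the previously captured partial
        # products while the stage-1 registers capture the new ones.
        r2, p_lo, p_hi = p_lo + (p_hi << 4), a * b_lo, a * b_hi
        out.append(r2)
    return out


if __name__ == "__main__":
    random.seed(1)
    stream = [(random.randint(-128, 127), random.randint(-128, 127)) for _ in range(1000)]
    assert mult_output_regs(stream) == mult_retimed(stream)
    print("identical outputs at the module boundary")
```

In both versions a product captured on one clock edge appears at the output after the next edge; only the placement of the registers inside the module differs, which is what lets the tool shorten the critical path without changing the module's interface.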

Morgan · Answer 2 · answered Aug 04 '13 at 06:51

I agree with @Marty in that I would use `*`. I have previously built my own low-power adder structures, which then ran into problems when the design moved to a different process or had to run at a higher frequency. Hard-coded architectures like this remove quite a bit of portability from the code.

Using the directives is nice in trials to compare the size (area) of different architectures, but I leave the final decision to the synthesis tool, which can make the best call based on the timing constraints and available area. I am not sure how power-aware the tools are by default; previously, we ended up getting an extra license that added a lot of power-aware capability to synthesis.

score 0 · Answer 3 · answered Aug 02 '13 at 17:55

You have this question tagged with "FPGA." If your target device is an FPGA, it may be advisable to use the vendor's multiplier megafunction (I don't remember what Xilinx calls it these days).

This way, you can be sure that the tool uses whatever internal hardware structure you intend, irrespective of the synthesis tool. You will also get an optimal solution that is predictable from a timing and latency standpoint.

Additionally, you don't have to test it for all the corner cases yourself. This is especially important if you are doing signed multiplication, and how much it matters also depends on what kind of coding guidelines you follow.
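As an illustration of the kind of signed-multiplication corner case meant here, the sketch below exhaustively checks two hypothetical, deliberately buggy 8-bit multiplier models against Python's exact product. The first keeps only 2n-1 result bits, which is correct for every input pair except (-128) * (-128) = 16384. The second treats the operands' bit patterns as unsigned, which in Verilog can happen when a signed operand is mixed with an unsigned one (the whole expression is then evaluated as unsigned). These are behavioural models written for illustration, not vendor IP, and the function names are assumptions.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def mul_signed(a: int, b: int, out_width: int) -> int:
    """Signed multiply whose result is truncated to out_width bits."""
    return to_signed(a * b, out_width)


def mul_as_unsigned(a: int, b: int, n: int) -> int:
    """Bug model: n-bit operand bit patterns zero-extended (treated as unsigned)."""
    mask = (1 << n) - 1
    return to_signed((a & mask) * (b & mask), 2 * n)


def failing_cases(model, n: int) -> list:
    """Exhaustively compare model(a, b) with a * b over all n-bit signed inputs."""
    lo, hi = -(1 << (n - 1)), 1 << (n - 1)
    return [(a, b) for a in range(lo, hi) for b in range(lo, hi)
            if model(a, b) != a * b]


if __name__ == "__main__":
    n = 8
    print(failing_cases(lambda a, b: mul_signed(a, b, 2 * n), n))      # prints []
    print(failing_cases(lambda a, b: mul_signed(a, b, 2 * n - 1), n))  # prints [(-128, -128)]
    print(len(failing_cases(lambda a, b: mul_as_unsigned(a, b, n), n)))
```

The truncated model fails on exactly one of 65,536 input pairs, which is why a few random test vectors can easily miss it while an exhaustive check at small widths catches it.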
