
I have a question regarding the compilation of HDL programs within the context of FPGA design.

1) Why does the compilation process take so long? Is it really the compilation process that takes a long time, or is it the writing of individual logic gates that takes a long time?

2) Why are the compiled files generally referred to as 'bitfiles'? What is the format of these bitfiles? I'm picturing a two-dimensional matrix of gates that will either be opened or closed depending on the bits in the bitfile.

Thanks for any help!

nanofarad
Izzo

2 Answers


1) Why does the compilation process take so long? Is it really the compilation process that takes a long time, or is it the writing of individual logic gates that takes a long time?

To begin, if you want to see all the toil and hard work your FPGA tools do, just turn on verbose mode/detailed reports, and skim/read them.

I'm going to answer with a Xilinx viewpoint, since that's what I know. Although the processes may have different names/groupings/ordering, the idea is the same across vendors.

The HDL-to-bitstream process differs considerably from how one would compile, say, Java. It's not a line-by-line conversion into bytecode, but an involved process in which the entire design is converted to a hardware implementation. You're not converting a program to hardware, but a description of hardware to hardware. A pile of Verilog or VHDL only acts like a program when a simulator is running it as a testbench.

Remember that timing constraints must be met, so optimizing for timing and depth of logic is a top priority throughout.

In practice, synthesis encompasses converting behavioral Verilog/VHDL to an RTL representation, including FSM synthesis, extraction of boolean functions, optimization, and recognition of decoders/encoders, muxes, ROMs, etc. The synth step may also duplicate registers whose values are needed in multiple areas of the FPGA, so that the routing delays to those areas are minimized. Some synth tools, such as XST, provide a rough estimate of timing and device utilization at this stage.
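As a rough illustration of register duplication, here is a minimal Python sketch that splits a register's loads across copies so that no copy drives more than a chosen fanout limit. Real tools also weigh where the loads end up on the die. The `max_fanout` value and the `_dupN` naming are hypothetical.

```python
from typing import Dict, List


def duplicate_register(reg: str, loads: List[str], max_fanout: int) -> Dict[str, List[str]]:
    """Split the loads of `reg` across copies so no copy drives more than `max_fanout` loads.

    Returns {copy_name: [loads driven by that copy]}. The first copy keeps the
    original name; later copies get a hypothetical `_dupN` suffix.
    """
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    copies: Dict[str, List[str]] = {}
    for start in range(0, len(loads), max_fanout):
        n = start // max_fanout
        name = reg if n == 0 else f"{reg}_dup{n}"
        copies[name] = loads[start:start + max_fanout]
    return copies


print(duplicate_register("en", ["l0", "l1", "l2", "l3", "l4"], max_fanout=2))
# {'en': ['l0', 'l1'], 'en_dup1': ['l2', 'l3'], 'en_dup2': ['l4']}
```

Each copy then only has to reach its own group of loads, which is what shortens the routes.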

Additionally, remember that synthesis involves some level of inference. HDL code that matches certain motifs/patterns will be converted to hardware macros or instantiations of certain primitives. If I write code that accesses a large `reg [7:0] foo [2047:0]` synchronously based on an address (and possibly a write enable), the synth tool will try to detect that and put a block RAM in place. It will also try to optimize away unneeded logic, and may do fairly in-depth logical analysis during that optimization.
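To see why the access pattern matters, here is a cycle-level Python model of what a 2048 x 8 synchronous RAM does. It is a sketch under stated assumptions, not vendor code. The read data only updates on a clock edge. That registered read is what typically lets the tool map the array onto block RAM; an asynchronous read would usually end up in LUT-based distributed RAM instead. The read-first behavior on a simultaneous read and write is an assumption, since block RAMs typically offer several such modes.

```python
class SyncRam:
    """Cycle-level model of `reg [7:0] foo [2047:0]` with a registered (synchronous) read."""

    def __init__(self, depth: int = 2048, width: int = 8) -> None:
        self.mem = [0] * depth
        self.mask = (1 << width) - 1
        self.dout = 0  # read data register: changes only on a clock edge

    def clock(self, addr: int, we: bool = False, din: int = 0) -> int:
        """Apply one rising clock edge and return the new read data.

        Assumed read-first mode: on a simultaneous read and write to the same
        address, `dout` gets the value stored *before* the write.
        """
        self.dout = self.mem[addr]
        if we:
            self.mem[addr] = din & self.mask  # only `width` bits are stored
        return self.dout


ram = SyncRam()
print(ram.clock(5, we=True, din=0x1AB))  # 0: old contents are read; 0xAB is stored
print(ram.clock(5))                      # 171 (0xAB)
```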

Translation/mapping involves tons of hardware logic intricacies as well. At this stage the software tries to pack your logic functions into lookup tables in optimal ways, fit those into slices alongside the flip-flops they may drive, and optimize again. Redundant or superfluous components left over from optimization may also be removed at this step.
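Packing a function into a lookup table amounts to storing its truth table. Here is a minimal sketch that assumes a 4-input LUT where input 0 selects the least significant bit of the row index. As far as I know, that matches how Xilinx LUT INIT values are indexed.

```python
from typing import Callable


def lut_init(func: Callable[..., int], n_inputs: int = 4) -> int:
    """Return the truth table of `func` as an integer: bit `row` holds func(inputs).

    Input k is bit k of the row index, so input 0 is the least significant select.
    """
    init = 0
    for row in range(1 << n_inputs):
        inputs = [(row >> k) & 1 for k in range(n_inputs)]
        if func(*inputs):
            init |= 1 << row
    return init


print(hex(lut_init(lambda a, b, c, d: a ^ b ^ c ^ d)))  # 0x6996 (4-input parity)
print(hex(lut_init(lambda a, b, c, d: a & b & c & d)))  # 0x8000 (4-input AND)
```

Any function of up to four inputs costs the same single LUT. This is why mapping focuses on how to cut the logic into LUT-sized pieces rather than on individual gates.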

Placing and routing is, in many designs, by far the most intensive step. Mapping produced a sea of lookup tables and registers connected by a slew of wires. Now each of those must be assigned to a physical site and connected using limited interconnect resources. The limitations include the number of lines in a row/column, which sites can connect to which other sites at certain distances, and clock distribution. Remember again that timing constraints exist: PAR may be able to place a design quickly, but then spend a very long time tweaking the placement to meet those constraints. Placing and routing isn't an easy problem to solve, and the tools rely on brute force, randomized placement guided by cost tables, and other heuristics. Needless to say, this can take a long time.
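Simulated annealing is one classic approach to the placement half of this. The sketch below is a toy. It places cells on a small grid and minimizes the total Manhattan wirelength of two-pin nets, ignoring routing congestion, timing and clocking entirely. The cooling schedule, step count and cost function are arbitrary choices for illustration, not what any vendor tool does.

```python
import math
import random
from typing import Dict, List, Tuple

Site = Tuple[int, int]


def wirelength(place: Dict[str, Site], nets: List[Tuple[str, str]]) -> int:
    """Total Manhattan length of two-pin nets: a crude stand-in for routing cost."""
    return sum(abs(place[a][0] - place[b][0]) + abs(place[a][1] - place[b][1])
               for a, b in nets)


def anneal(cells: List[str], nets: List[Tuple[str, str]], width: int, height: int,
           steps: int = 20000, t0: float = 5.0, seed: int = 0) -> Tuple[Dict[str, Site], int]:
    """Toy simulated-annealing placer. Returns the best placement found and its cost."""
    rng = random.Random(seed)
    sites = [(x, y) for x in range(width) for y in range(height)]
    if len(cells) > len(sites):
        raise ValueError("more cells than sites")
    rng.shuffle(sites)
    place = dict(zip(cells, sites))              # random initial placement
    occupant = {s: c for c, s in place.items()}  # site -> cell
    cost = wirelength(place, nets)
    best, best_cost = dict(place), cost

    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9    # linear cooling schedule
        cell = rng.choice(cells)
        old, target = place[cell], rng.choice(sites)
        other = occupant.get(target)             # None (empty site), another cell, or `cell`
        # Tentative move: cell -> target, swapping with whatever was there.
        place[cell] = target
        if other is not None:
            place[other] = old
        new_cost = wirelength(place, nets)
        delta = new_cost - cost
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost                      # accept and keep the site map in sync
            if other is not None:
                occupant[old] = other
            else:
                del occupant[old]
            occupant[target] = cell
            if cost < best_cost:
                best, best_cost = dict(place), cost
        else:                                    # reject: undo the move
            place[cell] = old
            if other is not None:
                place[other] = target
    return best, best_cost


cells = [f"c{i}" for i in range(6)]
nets = [(f"c{i}", f"c{i + 1}") for i in range(5)]  # a 6-cell chain
placement, cost = anneal(cells, nets, width=4, height=4)
print(cost)  # can't go below 5: each of the 5 nets needs length >= 1
```

Accepting some uphill moves early on is what lets annealing escape poor local arrangements. It is also why runtimes grow quickly with design size, even in this toy.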

Imagine trying to lay out a large circuit by hand with no more than two crossings per wire and no more than 25cm of wire in the timing-critical path, except on the scale of an FPGA.

2) Why are the compiled files generally referred to as 'bitfiles'? What is the format of these bitfiles? I'm picturing a two-dimensional matrix of gates that will either be opened or closed depending on the bits in the bitfile.

You're pretty close, though not exactly. The bitstream configures the following parameters:

  • Routing. What signals go where, over what wires. This typically sets multiplexers and cross-connections. That's pretty close to what you describe, though these are really configurable connections rather than gates (and they are fully buffered to avoid capacitance effects).

  • Slices. Each slice contains a few lookup tables used as function generators, as well as more multiplexers and such. The bitstream also specifies the contents of the lookup tables, whether they should be bypassed or linked, whether the output should go straight to routing or to a flip-flop, whether that flip-flop should have an async reset, whether it should be posedge or negedge, and so on. For slices used as distributed memory, it also configures how the LUT is written or shifted under external control.

  • Other function blocks. How DSP/multiplier tiles should be configured, parameters/connectivity for clock-handling circuitry such as DCMs/PLLs/MMCMs/etc., widths/fallthrough/initial contents of block RAMs, the parameters for transceivers, et cetera.

  • Metadata. For example, a setting that prevents reading back the bitstream over the configuration port/JTAG, if it should not be copied.
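To make the "array of configuration bits" idea concrete, here is a toy packer for one imaginary slice plus a few routing muxes. The field names, widths and bit ordering are entirely hypothetical. Real bitstreams are device-specific and wrap the configuration data in headers, commands and checksums.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SliceConfig:
    """Hypothetical settings for one slice; names and widths are illustrative only."""
    lut_init: int         # 16-bit truth table of a 4-input LUT
    use_ff: bool          # send the LUT output through the flip-flop rather than straight to routing
    ff_async_reset: bool  # flip-flop reset style
    ff_negedge: bool      # clock the flip-flop on the falling edge


def slice_bits(cfg: SliceConfig) -> List[int]:
    """LUT contents first (LSB first), then one bit per flip-flop option."""
    bits = [(cfg.lut_init >> i) & 1 for i in range(16)]
    return bits + [int(cfg.use_ff), int(cfg.ff_async_reset), int(cfg.ff_negedge)]


def routing_bits(mux_selects: List[int], sel_width: int) -> List[int]:
    """Encode each routing multiplexer's select value as `sel_width` bits, LSB first."""
    bits: List[int] = []
    for sel in mux_selects:
        if not 0 <= sel < (1 << sel_width):
            raise ValueError(f"select {sel} does not fit in {sel_width} bits")
        bits += [(sel >> i) & 1 for i in range(sel_width)]
    return bits


def to_bytes(bits: List[int]) -> bytes:
    """Pack bits LSB-first into bytes, zero-padding the last byte."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


cfg = SliceConfig(lut_init=0x8000, use_ff=True, ff_async_reset=False, ff_negedge=True)
print(to_bytes(slice_bits(cfg)).hex())    # 008005
print(routing_bits([2, 1], sel_width=2))  # [0, 1, 1, 0]
```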

nanofarad
  • Very complete, I definitely oversimplified. – Mitch May 04 '16 at 23:37
  • @Mitch Thanks. Just by sheer luck, I TA'd a high school electronics class just today and explained the process from start to finish to students, including all these steps and a demo. – nanofarad May 04 '16 at 23:38
  • @hexafraction Great answer. One point in the OP I think needs addressing is the reference to "HDL *programs*". This view of HDL development is frequently a cause of design issues. The H and the D are critical features. Only during verification can they really be deemed programs, and that's not the primary purpose. – PlayDough May 05 '16 at 04:04
  • One other point. The term "compilation" is usually reserved for analysis leading to simulation. This is usually very fast. Depending on the size of the design, the next step, elaboration, can take a bit of time. For conversion to bit files, the steps are synthesis, mapping, and place and route. The first two are relatively fast; P&R is usually the longest, though this depends on the design. – PlayDough May 05 '16 at 04:08
  • @PlayDough Thanks. One question: What exactly happens during the elaboration process? I'm familiar with the HDL->hardware workflow but almost a complete fool with anything beyond basic simulation. – nanofarad May 05 '16 at 10:29
  • @hexafraction Elaboration is the process of actually creating simulation objects for instances, sizing ports, establishing connections, etc. For example, unrolling `generate` loops with instances, sizing signals and ports, and establishing port associations. Depending upon the complexity of the entire simulation, such as number of instances, size of signals, number of ports, etc, this can take quite some time. This is usually the most memory intensive part of simulation. – PlayDough May 05 '16 at 15:03
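Here is a toy illustration of what PlayDough describes: elaborating a hypothetical ripple-carry adder whose bits come from a `generate` loop. The sketch unrolls the loop into one full-adder instance per bit and records each port association. The instance and signal names are made up for the example.

```python
from typing import Dict, List


def elaborate_ripple_adder(width: int) -> List[Dict[str, str]]:
    """Unroll a `generate` loop over `width` bits into one full-adder instance per bit.

    Each instance is a dict holding its hierarchical name and its
    port -> signal associations. All names here are made up for illustration.
    """
    if width < 1:
        raise ValueError("width must be positive")
    instances = []
    for i in range(width):  # one pass per generate-loop index
        instances.append({
            "name": f"gen_bit[{i}].u_fa",
            "a": f"a[{i}]",
            "b": f"b[{i}]",
            "cin": f"carry[{i}]",
            "cout": f"carry[{i + 1}]",  # so `carry` must be sized width + 1 bits
            "sum": f"sum[{i}]",
        })
    return instances


for inst in elaborate_ripple_adder(2):
    print(inst)
```

A real design repeats this for every instance at every level of hierarchy, which is why elaboration can be slow and memory-hungry.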

Compilation requires finding a set of gates whose layout matches the coded logic. This is a difficult optimization problem; I think it could be viewed as an edge crossing problem.
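For a feel of the edge crossing view, here is a small Python sketch that counts how many straight-line wires cross in a given 2-D placement. Treating wires as straight segments is a simplification, since FPGA routing follows fixed channels. The node names and coordinates are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def _orient(p: Point, q: Point, r: Point) -> int:
    """Sign of the cross product (q - p) x (r - p): +1 left turn, -1 right turn, 0 collinear."""
    v = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (v > 0) - (v < 0)


def segments_cross(a: Point, b: Point, c: Point, d: Point) -> bool:
    """True if segments ab and cd properly cross (touching or collinear overlap not counted)."""
    return (_orient(a, b, c) * _orient(a, b, d) < 0
            and _orient(c, d, a) * _orient(c, d, b) < 0)


def count_crossings(pos: Dict[str, Point], edges: List[Tuple[str, str]]) -> int:
    """Count pairs of wires that cross, skipping pairs that share an endpoint."""
    count = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if len({u1, v1, u2, v2}) < 4:
            continue
        if segments_cross(pos[u1], pos[v1], pos[u2], pos[v2]):
            count += 1
    return count


pos = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}
print(count_crossings(pos, [("a", "c"), ("b", "d")]))  # 1: the two diagonals cross
print(count_crossings(pos, [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]))  # 0
```

A placer trying to minimize such a count (or, more realistically, wirelength and congestion) is searching an enormous space of arrangements.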

Bitfiles are the array of "fuses" that will be set to configure the network of gates on-chip. Without looking at a particular chip, I would guess that one fuse is represented by one bit with a chip-defined ordering.

Mitch