Design Compiler, Registers & Synthesis Runtime

Recently, I was working on a (not so big) module that required large register arrays. Synthesizing this module with DC took more than 4 hours on a certain machine, which became very frustrating. Finally, I decided to investigate and address the issue. The experiment and possible fixes are described below.

Hypothetical Requirements

Three sets of counters are needed to maintain the status of 512 channels. Each counter is 12 bits wide, and at any given event, 6 counters from a set may be accessed. In Verilog, these counters can be declared as follows:

reg [11:0]   A[0:511];

reg [11:0]   B[0:511];

reg [11:0]   C[0:511];

Combinational logic reads these counters using two index variables: one is 4 bits wide (16 values) and the other is 5 bits wide (32 values). The index is calculated by concatenating these two variables (16*32 = 512). The read and write code may look as follows:

input wire [3:0] index0;

input wire [4:0] index1;

output wire [11:0] dout;

input wire [11:0] din;

wire [8:0] index = {index0, index1}; 

assign dout = A[index];

….

A[index] <= din; 

….

Of course, there are 6 access paths for each counter set. Since this module has a lot of flops (3*12*512 = 18,432, roughly 18K), DC takes more than 4 hours to synthesize it.
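For reference, the fragments above can be assembled into a minimal, self-contained sketch of one counter set with a single access path. The module name `counters_flat`, the clock `clk`, and the write enable `we` are assumptions added for illustration; the real module has 6 access paths per set and additional logic that is not shown here.

```verilog
// Hypothetical single-port sketch of one counter set.
// clk, we and the module name are illustrative assumptions.
module counters_flat (
    input  wire        clk,
    input  wire        we,      // write enable (assumed)
    input  wire [3:0]  index0,  // upper 4 bits of the index
    input  wire [4:0]  index1,  // lower 5 bits of the index
    input  wire [11:0] din,
    output wire [11:0] dout
);
    // 512 counters of 12 bits each: 6144 flops for this one set.
    reg [11:0] A [0:511];

    // 9-bit index formed by concatenation (16 * 32 = 512 entries).
    wire [8:0] index = {index0, index1};

    // Combinational read: a 512:1 mux on each of the 12 bits.
    assign dout = A[index];

    // Clocked write into the selected counter.
    always @(posedge clk)
        if (we)
            A[index] <= din;
endmodule
```

Written this way, every access path implies one wide 512:1 read mux and a 512-way write decode, which is the structure the synthesis tool has to optimize.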

Solutions to Improve DC runtime

  • Do not use "flatten". This improves runtime a little, but not by much. In my case, removing the "flatten" command improved runtime by 2%. Please note that the timing of the module is affected by this command; for high-performance, small logic, I recommend keeping it.
  • Try to replace registers with embedded RAMs. This easily improves runtime by 50%. In my case, however, I could not replace the registers completely.
  • Reduce the options DC has to explore. What I mean is that runtime is directly proportional to the number of possible arrangements. For 512 registers in a set, DC has to search on the order of 512 options per set and then pick an optimal arrangement. Reducing the options improves runtime, but of course timing optimization is adversely affected (there are fewer possible combinations to explore). How can the number of combinations be reduced? See below.

A straightforward way to reduce the combinations is to partition the logic explicitly. In our example, there is an array of 512 registers accessed by two index variables. We can simply divide the 512 registers into 16 groups, with the group selected by index0. Each group holds 32 registers and is indexed by index1. The declarations for this scenario look like this:

reg [11:0] A0[0:31] ;

reg [11:0] A1[0:31] ;

reg [11:0] A2[0:31] ;

reg [11:0] A3[0:31] ;

reg [11:0] A4[0:31] ;

reg [11:0] A5[0:31] ;

reg [11:0] A6[0:31] ;

reg [11:0] A7[0:31] ;

reg [11:0] A8[0:31] ;

reg [11:0] A9[0:31] ;

reg [11:0] A10[0:31] ;

reg [11:0] A11[0:31] ;

reg [11:0] A12[0:31] ;

reg [11:0] A13[0:31] ;

reg [11:0] A14[0:31] ;

reg [11:0] A15[0:31] ;
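With the storage split into 16 banks, the read and write logic has to select a bank with index0 and an entry with index1. The following is a minimal sketch of that structure, assuming a single access path; it uses a generate loop in place of the sixteen hand-written declarations (so the banks are named `bank[0].mem` through `bank[15].mem` rather than A0 through A15). The module name, `clk`, and `we` are assumptions for illustration, and this is not necessarily how the original design was coded.

```verilog
// Hypothetical banked version of one counter set: 16 banks x 32 entries.
// clk, we and the module name are illustrative assumptions.
module counters_banked (
    input  wire        clk,
    input  wire        we,      // write enable (assumed)
    input  wire [3:0]  index0,  // selects one of 16 banks
    input  wire [4:0]  index1,  // selects one of 32 entries in a bank
    input  wire [11:0] din,
    output wire [11:0] dout
);
    // One 12-bit read value per bank, packed into a flat vector.
    wire [16*12-1:0] bank_dout;

    genvar g;
    generate
        for (g = 0; g < 16; g = g + 1) begin : bank
            // Storage for this bank (plays the role of A0 ... A15).
            reg [11:0] mem [0:31];

            // Per-bank read: a 32:1 mux indexed by index1.
            assign bank_dout[g*12 +: 12] = mem[index1];

            // Only the bank selected by index0 is written.
            always @(posedge clk)
                if (we && (index0 == g))
                    mem[index1] <= din;
        end
    endgenerate

    // Final read: a 16:1 mux across banks, selected by index0.
    assign dout = bank_dout[index0*12 +: 12];
endmodule
```

Functionally this matches a flat 512-entry array indexed by {index0, index1}; the difference is that the 32:1 and 16:1 mux stages are spelled out in the RTL. Whether the tool keeps this partition during optimization depends on the compile options used.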

By following this methodology, I was able to reduce runtime to a little over 70 minutes.

Update: I have now applied all three of the above in my design and reduced synthesis time to ~39 minutes. I am still experimenting and will update if anything changes.

Meanwhile, suggestions are welcome. 

High Level Synthesis Tools: A Designer’s Perspective

In Designs: A HLS Perspective, design requirements were presented and categorized from a High Level Synthesis tool perspective. How those requirements are met was not discussed in that article.

To meet the requirements outlined in Designs: A HLS Perspective, high level synthesis tools have to employ a few techniques. These techniques may be employed in a way that is transparent to designers, or they may create usability issues and add to the learning curve.

For example, a command or extension needs to be added to an existing language like ANSI C to determine the width of the data path. This can be achieved in the following different ways:

  • Method A: A ‘special’ type keyword added to ANSI C. Existing ANSI-C Code is modified for each data type.
  • Method B: A ‘special’ type keyword added to ANSI C. Existing ANSI-C Code modification is required for only input & output data types. All other internal data types are automatically extracted.
  • Method C: A constraint command is defined which can be added to ANSI C Code ‘inline style’ or can be specified in constraints file. All other internal data types are automatically extracted.

Method C doesn't necessitate two separate files, one as the golden reference and the other as the 'hardware' version, and in my opinion Method C is the best option. What do you think? Please vote.

Similar to data path synthesis, special timing constructs are needed to implement a bus interface in ANSI C. The issue is that once these constructs are available, the design style becomes like an RTL coding style. Even if the tool does not require these constructs to implement the algorithmic portion of an application, designers will use them anyway. To force designers to think at a higher level of abstraction, these options should not be part of the language at all. This creates a further subcategory of High Level Synthesis tools, called Algorithm Synthesis tools. Such a tool will not be able to handle interface/bus protocols effectively but is adept at handling algorithms. I think that is okay as long as bus protocols are standards driven.

In the Ranking System, pure ANSI C gets the highest mark, and added constructs/limitations will cost points. On the other hand, a pure ANSI C based tool will not be able to handle bus protocols. The reality is that no tool is going to achieve a perfect score. But any tool with ease of use will win eventually, and remember, QoR is the key!

Designs: A HLS Perspective

As mentioned in HLS Tools Benchmarking System, an effort is underway on SVTechie.com to develop a hypothetical High Level Synthesis Ranking System.

High-level synthesis of digital systems from a behavioral description has received significant attention in the last 15 years. However, commercial synthesis tools have gained limited acceptance among designers, primarily due to poor synthesis results in the presence of conditionals and especially loops, and a lack of controllability of quality of results.

To measure High Level Synthesis tool effectiveness, design requirements as well as various performance parameters need to be defined for High Level Synthesis tools. First, design expectations are presented below. Next, performance parameters will be defined and presented.

In a simplistic view, a High Level Synthesis tool should be able to handle the full digital portion of an SoC with good QoR. But this simplistic expectation is flawed and is equivalent to expecting an RTL/ASIC design methodology to handle analog designs. For proper characterization, digital design requirements must be categorized as follows.

  • Control Intensive Design

The control portion can be characterized by the decision-making blocks in the design. High level synthesis tools generally have problems generating high-performance control circuits. This is because control-dominated circuits are very sensitive to clock cycles, and this puts added pressure on the scheduling algorithm in High Level Synthesis tools. Bus protocols are an extreme form of control-dominated circuit, where a part of the logic has to communicate within a clock cycle. On the other hand, FSMs (control state machines) are more flexible (in the scheduling sense) than bus protocols. Any high level synthesis tool must be able to handle FSMs embedded in the design. Bus protocol handling is a little tougher and is categorized separately.

  • Bus Protocols

As mentioned before, bus protocols are difficult to synthesize in the High Level Synthesis domain because of the scheduling sensitivity of the design. Also, special constructs are needed to describe time-sensitive handshake protocols effectively. However, adding these constructs to an existing high level language results in added complexity, a steep learning curve and programming issues.

  • Data Path Intensive Design

This is a relatively straightforward section of the design from a High Level Synthesis perspective, though effective resource sharing is required to achieve good QoR. How variable types are specified and extracted also impacts performance and may create usability issues.

  • Interface Synthesis

An interface serves as the communication channel between two algorithmic sections in a design. It comprises a data path as well as some control signals to facilitate data transfer. A tool's ability to automatically extract and generate an appropriate interface may be a good feature.

  • External IP Interface

Tools should have proper interface to allow external IPs to be integrated into the design.

  • Facilitate ECOs

Tools should be able to support local changes in the design without requiring another pass through the full design flow.

Next are performance parameters for High Level Synthesis tools. Stay tuned!

Logic Synthesis Overview

[Figure: VLSI flow]

Logic synthesis is the process of converting a high-level description of a design into an optimized gate-level representation. The high-level description is written in an HDL (Verilog/VHDL) at the Register Transfer Level (RTL). This article presents a quick overview of synthesis technology.

This article covers the definition of synthesis and logic synthesis and gives an overview of the ASIC design flow. After that, synthesis tools are explained. Lastly, an example script is presented.

Please note: most of the article refers to Design Compiler as the synthesis tool, but the discussion can be extended to any tool. Design Compiler is undoubtedly the most widely used synthesis tool; SVTechie.com does not endorse any product, though DC will be used as the example tool throughout the article.

C-Language techniques for FPGA acceleration of embedded software

In recent years, FPGA-based programmable platforms have emerged as viable alternatives for many types of high-performance computing applications. The opportunities presented by these platforms include the rapid creation of custom hardware, simplified field updates and the reduction or elimination of custom chips from many categories of electronic products.

This paper describes the fundamentals of FPGA-based platforms and how C-language programming techniques can be applied effectively to these highly parallel platforms.

The full article can be read here: C-Language techniques for FPGA acceleration of embedded software by David Pellerin (ImpulseC) and Kunal Shenoy (Xilinx)