Design Compiler, Registers & Synthesis Runtime

Recently, I was working on a (not so big) module which required use of big registers. While synthesizing this module, DC was taking more than 4 hours of time on certain machine and it was becoming very frustrating. Finally I gave up and decided to investigate and address this issue. Following is the experiment information and possible fixes.

Hypothetical Requirements

3 sets of counters are needed to maintain status of various channels (512 channels). Each counter is 12 bit wide and at any given event, 6 counters from a set of counters may be accessed. In verilog, these counters can be declared as followed

reg [11:0]   A[0:511];

reg [11:0]   B[0:511];

reg [11:0]   C[0:511];

There is a combinatorial logic reading these counters by providing two index variables – one is 4 bit wide (16) and another is 5 bit wide (32). Index is calculated by concatenating these two variable (16*32 = 512). Read and write code may look like as followed

input reg [3:0] index0;

input reg [4:0] index1;

output wire [11:0] dout;

input wire [11:0] din;

wire [8:0] index = {index0, index1}; 

assign dout = A[index];

….

A[index] <= din; 

….

Ofcourse, there are 6 access paths for each of the counter set. Since this module has lots of flops (3*12*512 = 18K), DC is taking more than 4 hours to synthesize this module.

Solutions to Improve DC runtime

  • Do not use "flatten" – This improves a time little bit but not by big amount. In my case, by removing "flatten" command, I improved the time by 2%. Please note, timing of the module is affected by this command and for high performance and small logic, I recommend to keep this command.
  • Try to replace Registers with embedded RAMs. It improves runtime by 50% easily. But in my case, I could not replace registers completely.
  • Reduce options for DC to look for. What I mean here is that run time is directly proportional to number of possible arrangements. So for 512 registers in a set, DC will have to look for O(512) in a set and then pick an optimal arrangement. By reducing the options, runtime can be improved but ofcourse, timing optimization is adversly affected. (Less possible combinations to look for.) How to reduce number of combinations?? Look below

Straight way to reduce combinations is to perform logic partition explicitly. For our example, there is array of 512 registers which are accessed by two index variables. We can simply divide 512 registers in 16 groups, each group accessible by index0.  Each group holds 32 registers and is indexed by index1. Declaration for this scenario will look like,

reg [11:0] A0[0:31] ;

reg [11:0] A1[0:31] ;

reg [11:0] A2[0:31] ;

reg [11:0] A3[0:31] ;

reg [11:0] A4[0:31] ;

reg [11:0] A5[0:31] ;

reg [11:0] A6[0:31] ;

reg [11:0] A7[0:31] ;

reg [11:0] A8[0:31] ;

reg [11:0] A9[0:31] ;

reg [11:0] A10[0:31] ;

reg [11:0] A11[0:31] ;

reg [11:0] A12[0:31] ;

reg [11:0] A13[0:31] ;

reg [11:0] A14[0:31] ;

reg [11:0] A15[0:31] ;

By Following this methodology, I am able to reduce runtime to little over 70 minutes.

Update: Currently, I have applied above three in my design and able to reduce synthesis time to ~39 minutes. I am still experimenting and will update if anything changes.

Meanwhile, suggestions are welcome. 

Leave a Reply

Your email address will not be published. Required fields are marked *