This algorithm is fairly complicated but has very high performance. I developed it from other counting algorithms, mainly the hardware version of the Parallel Count algorithm. It is explained only briefly here; please post a question if you need more information. The basic operation is that the number of 1 bits in any two-bit number can be counted with a simple half adder.
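As a minimal sketch of that basic operation, the C function below models a half adder on two single-bit inputs. The name `half_adder_count` is hypothetical and used only for illustration; it is not taken from the original design.

```c
#include <assert.h>

/* Half adder on two single-bit inputs: returns the 2-bit count of ones.
   Bit 0 of the result is the sum (a XOR b), bit 1 is the carry (a AND b).
   The function name is illustrative, not from the original design. */
static unsigned half_adder_count(unsigned a, unsigned b)
{
    assert(a <= 1u && b <= 1u);   /* inputs must be single bits */
    unsigned sum   = a ^ b;       /* 1 when exactly one input is 1 */
    unsigned carry = a & b;       /* 1 when both inputs are 1 */
    return (carry << 1) | sum;    /* 0, 1, or 2 */
}
```

The carry bit is the MSB of the 2-bit result, so a set carry means the pair contained two ones.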
The algorithm performs the following steps:
- Group the number into 2-bit pairs.
- Determine the number of ones in each pair using a half adder. Since a two-bit number can contain 0, 1, or 2 ones, 2 bits of storage are required to hold each result.
- Each result should be 0 or 1, but not 2. If a result is 2, the number has more than one bit ON and no further computation is required. To detect this, check whether the MSB of each result is zero.
- If the MSBs of all the pair results are 0, discard the MSBs, form a new number of half the original width from the remaining LSBs, and repeat.
- The iteration starts with a 32-bit input, and the first pass reduces it to 16 bits. The operations above are repeated until only 1 bit is left at the output.
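The steps above can be sketched in software as follows. This is a rough C model of the reduction, not the hardware implementation. The names `reduce_stage`, `classify_bits`, and the `bit_class` values are assumptions introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Result codes (illustrative names, not from the original design). */
enum bit_class { BITS_NONE = 0, BITS_ONE = 1, BITS_MANY = 2 };

/* One reduction stage: treat x as `width` bits and run a half adder on each
   adjacent pair. The sum bits (result LSBs) are packed into *out, which is
   half as wide. Returns nonzero if any pair produced a carry (result MSB). */
static int reduce_stage(uint32_t x, unsigned width, uint32_t *out)
{
    assert(width >= 2 && width <= 32 && width % 2 == 0);
    uint32_t sums = 0;
    int carry_seen = 0;
    for (unsigned i = 0; i < width / 2; i++) {
        uint32_t a = (x >> (2 * i)) & 1u;
        uint32_t b = (x >> (2 * i + 1)) & 1u;
        sums |= (a ^ b) << i;          /* keep the LSB of each pair result */
        carry_seen |= (int)(a & b);    /* MSB set: this pair held two ones */
    }
    *out = sums;
    return carry_seen;
}

/* Repeat the stage for widths 32, 16, 8, 4, 2 until one bit is left. */
static enum bit_class classify_bits(uint32_t x)
{
    for (unsigned width = 32; width > 1; width /= 2) {
        uint32_t next;
        if (reduce_stage(x, width, &next))
            return BITS_MANY;          /* more than one bit ON: stop early */
        x = next;
    }
    return x ? BITS_ONE : BITS_NONE;   /* the final bit is the count itself */
}
```

As long as no pair carries, each stage preserves the number of ones, so the single bit left after the last stage is 1 only when exactly one input bit was ON.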
assign sum15_top = (sum15_hi == 0);
assign sum8_top = (sum8_hi == 0);
assign sum4_top = (sum4_hi == 0);
assign sum2_top = (sum2_hi == 0);
assign sum1_top = (sum1_hi == 0);
always @(*) begin
// 8 Half adders
// 4 Half Adders
// 2 Half Adders
// 1 Half Adder
always @(posedge clk or negedge rst_n) begin
If implemented in ANSI-C, five distinct steps are performed, each containing a loop. These loops run 32, 16, 8, 4, and 2 times respectively, for a total of 62 iterations. Based on this, the number of cycles taken is k*62 (where k is an arbitrary constant). HW performance is:
- Code Size: 40 (lower is better)
- Complexity: 5/10 on the simplicity scale (higher is better; somewhat subjective)
- Basic Gate Count & Timing: area is 762 µm² (lower is better), critical path delay is 1.26 ns (lower is better)
- Optimizable: area is 1992 µm² (lower is better), critical path delay is 1.0 ns (lower is better)
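Returning to the ANSI-C cycle estimate, the figure of 62 can be checked with a small helper. It assumes, as the 32, 16, 8, 4, 2 counts suggest, that each step loops once per bit of its input word and that the width halves after every step. The name `total_loop_iterations` is hypothetical.

```c
#include <assert.h>

/* Sum the per-step loop counts for an input of `width` bits, assuming each
   step loops once per bit of its input and the width then halves.
   The function name is illustrative, not from the original post. */
static unsigned total_loop_iterations(unsigned width)
{
    assert(width >= 2 && (width & (width - 1)) == 0);  /* power of two */
    unsigned total = 0;
    for (unsigned w = width; w > 1; w /= 2)
        total += w;                /* 32 + 16 + 8 + 4 + 2 for width 32 */
    return total;
}
```

For a 32-bit input this gives 62, matching the k*62 estimate; the constant k stands for the per-iteration cost, which depends on the compiler and target.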