From previous discussion, it is clear that allocations/deallocations (called queue command) throughput is limited by memory access bottleneck. Access requirement are summarized in table below.
Data Type Allocation Deallocation
Link Memory Read/Write Read/Write
FL Start Pointer Read/Write Write*
FL End Pointer Write* Read/Write
FL Size Read/Write Read/Write
PL Start Pointer Write* Read/Write
PL End Pointer Read/Write Write*
PL Size Read/Write Read/Write
As it is evident, there are total of 12 memory access per queue command. In a very simplistic model, only one command can be executed per 12 cycles. Architecture for this kind of system is presented below.
Please note that throughput for an ideal system is assumed where memory accesses are efficiently pipelined to achieve maximum performance.
By Implementing Free list in registers (Flip Flops), performance of system can be improved to 7 cycles per queue command. Further, implementation of Link Memory and Port List informations in separate memory, will result in performance improvement of 3 cycles per queue commands. Implementing each of the port list information in different memory, may improve performance to 2 cycles per command.
To achieve throughput of 1 clock per command, Link Memory should have dual port and all of Port List and Free List information must be implemented in Registers. Architecture of such system is presented in following figure.
As evident, we can not really achieve two commands per cycle because of bottleneck presented by Link Memory. But what happens in a scenario when 2 commands per cycle is a must requirement. Next, various tradeoffs are closely observed and different options are considered to improve performance.