Let's not just transform those in need, we can also find ways to help transform those in power. -unknown
 

Main Menu

Home
Articles
SVTechie Blog
Links
Download
Discussion Forum
Photo Gallery
Quick Bites
FAQs

Login






Lost Password?
No account yet? Register

Statistics

We have 13 guests online

SVTechie Recommends


powered_by.png, 1 kB

Text Links


Home
Limits of Instruction Level Parallelism PDF Print E-mail
Written by SVTechie   
Tuesday, 02 May 2006

In High Level Synthesis world, effectiveness of any tool depends on  its ability to extract parallelism, inherent in the application. There has been lots of observation on parallelism, originating, mainly with Amdahl's Law. Amdahl's Law is very passimistic observation and similar empirical analysis is performed in Limits of Instruction Level Parallelism. This research paper is very old, written in 1993. However, findings are still applicable. Excerpts from Paper:

Growing interest in ambitious multiple-issue machines and heavilypipelined machines requires a careful examination of how much instructionlevelparallelism exists in typical programs. Such an examination is complicated by the wide variety of hardware and software techniques for increasingthe parallelism that can be exploited, including branch prediction, register renaming, and alias analysis. By performing simulations based on instructiontraces,  we can model techniques at the limits of feasibility and evenbeyond. This paper presents the results of simulations of 18 different testprograms under 375 different models of available parallelism analysis.

Bottleneck according to Amdahl, is because of strictly dependent/serial portion of any code. Considering a infinite parallel computer, minimum time to execute an application is dependent on strictly serial portion because this set of instruction can not be executed in parallel. According to Amdahl's Law, there is no tangible gain whatsoever for computer having more than 10 parallel cores, and in general, 3-4 parallel cores are enough.

However, Gustafson-Barsis' law looks at parallelism in different way. This is more optimistic view on parallelism. Accodring to this law,  parallelism scales well with increasing complexity. Or in other words, any sufficiently large problem can be efficiently parallelized.

Above observations are based on mainly spatial parallelism, inherent in application i.e. executing two pieces of code in parallel at different coordinates on a given computing plateform. There is another set of parallelism which can be exploited in time, called temporal parallelism. Main example is pipelining in computers, which runs three or four (some upto 10-12) instruction at a time but ingress to system is one instruction at a time and egress is one result at a time. Effective execution cycles for an instruction is increased but effective execution time for application is shortened i.e. both throughput and latancy are increased.

Both, spatial and temporal are still looked as different aspects in computing world but I believe that both spatial and temporal parallelism can be combined in effective manner to achieve high performance. Assuming that there is an infinite temporal/spatial parallel computer then theoritically we should be able to achieve throughput of a clock cycle for any complex algorithm (Except for true K-cyclic/cyclic portion of the code).

My conclusion: A virtual infinite temporal/spatial parallel computer can be built for given performance and given application.

Last Updated ( Tuesday, 02 May 2006 )
 
< Prev   Next >
Free Ringtones | Neopets Cheats, Games and Neopoints | Myspace Layouts | Cell Phones | Home Loan
© 2008 SVTechie :: Online Resources For Techies BY Techies
Joomla! is Free Software released under the GNU/GPL License.