Title page for ETD etd-11082004-105730

Type of Document Master's Thesis
Author Maurer, Slade S
Author's Email Address slade@computer.org
URN etd-11082004-105730
Title An Evaluation of Multiple Branch Predictor and Trace Cache Advanced Fetch Unit Designs for Dynamically Scheduled Superscalar Processors
Degree Master of Science in Electrical Engineering (M.S.E.E.)
Department Electrical & Computer Engineering
Advisory Committee
Advisor Name Title
David Koppelman Committee Chair
Ahmed A. El-Amawy Committee Member
J. Ramanujam Committee Member
  • fetch unit
  • fill unit
  • YAGS
  • trace
  • trace cache
  • multiple branch predictor
  • computer architecture
  • dynamically scheduled
  • superscalar
  • processor
Date of Defense 2004-11-02
Availability unrestricted
Semiconductor feature size continues to decrease permitting superscalar microprocessors to continue to increase the number of functional units available for execution. As the instruction issue width increases beyond the five instruction average basic block size of integer programs, more than one basic block must be issued per cycle to continue to increase instructions per cycle (IPC) performance. Researchers have created methods of fetching instructions beyond the first taken branch to overcome the bottleneck created by the limitations of conventional single branch predictors.

We compare the performance of the multiple branch prediction (MBP) and trace cache (TC) fetch unit optimization methods. Multiple branch predictor fetch unit designs issue multiple basic blocks per cycle using a branch address cache and a multiple branch predictor. A trace cache uses the runtime instruction stream to create fixed length instruction traces that encapsulate multiple basic blocks.

The comparison is performed by using a SPARC v8 timing based simulator. We simulate both advanced fetch methods and execute benchmarks from the SPEC CPU2000 suite. The results of the simulations are compared and a detailed analysis of both microarchitectures is performed.

We find that both fetch unit designs provide a competitive IPC performance. As issue width is increased from an eight to sixteen way superscalar, the IPC performance improves implying that these fetch unit designs are able to take advantage of the wider issue widths.

The MBP can use a smaller L1 instruction cache size than the TC and yet achieve a similar IPC performance. Pre-arranged instructions provided by the TC allow the pipeline stages to be shortened in comparison to the MBP. The shorter pipeline significantly improves the IPC performance.

Prior trace cache research used two or more ports to the instruction cache to improve the chances of fetching a full basic block per cycle. This was at the expense of instruction cache line realignment complexity. Our results show good performance with a single instruction cache port.

We study an approximately equal cost implementation for the MBP and TC. Of the six benchmarks studied, the TC outperforms the MBP over four of the benchmarks.

  Filename       Size       Approximate Download Time (Hours:Minutes:Seconds) 
 28.8 Modem   56K Modem   ISDN (64 Kb)   ISDN (128 Kb)   Higher-speed Access 
  Maurer_thesis.pdf 426.25 Kb 00:01:58 00:01:00 00:00:53 00:00:26 00:00:02

Browse All Available ETDs by ( Author | Department )

If you have more questions or technical problems, please Contact LSU-ETD Support.