Branch Prediction
Overview
The purpose of this assignment is to experiment with different methods of implementing branch prediction. You will write a program that reads a branch trace file and simulates the branch prediction schemes described below at varying predictor sizes, collecting the performance information needed to analyze and compare the effectiveness of those schemes. You may write the program in the language of your choice.
Although you are going to turn in a printout of your source program, the primary focus of this assignment is the analysis and presentation of your experimental results.
Trace File and Trace Format
The trace you will use records 16+ million conditional branches from an execution of the program GCC (the GNU C Compiler) from the SPECint2000 benchmark suite. Since unconditional branches are always taken, they are excluded from the trace file; only conditional branches are included. Each line of the trace file has two fields. Below are the first four lines of the trace file:
3086629576 T
3086629604 T
3086629599 N
3086629604 T
The first field is the address of the branch instruction, stated as a decimal number of either 9 or 10 digits. The second field is the character “T” or “N”, specifying whether the branch was taken or not taken upon execution. A single space character (0x20) separates the two fields, and each line of the file terminates with a single newline character (0x0a). The trace file is available on Blackboard as an 8.6 MB zip-compressed file (branch-trace-gcc.zip). It can be downloaded and uncompressed using any utility that can extract files from a zipped folder. Some programming languages also let you read the compressed contents directly without uncompressing the file first: C/C++ programs can use the zlib library, and Java provides java.util.zip. You may use these if you are familiar with them and find them easier. There are a total of 16,416,279 entries in this file.
Your first task is to write the part of your program that can read the file and interpret each of the two fields.
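For example, here is a minimal C++ sketch of that reading loop. The filename is an assumption based on the zip name above; adjust it to wherever you extract the trace.

    #include <cstdio>

    int main() {
        // Filename is a guess based on branch-trace-gcc.zip; change as needed.
        FILE* f = std::fopen("branch-trace-gcc.txt", "r");
        if (!f) { std::perror("fopen"); return 1; }

        unsigned long long addr;   // branch instruction address (decimal)
        char outcome;              // 'T' = taken, 'N' = not taken
        unsigned long long count = 0;

        // Each line is "<address> <T|N>\n"; the space in the format string
        // skips the separator, and the next %llu skips the newline.
        while (std::fscanf(f, "%llu %c", &addr, &outcome) == 2) {
            ++count;
            // ... feed (addr, outcome == 'T') to each predictor here ...
        }
        std::fclose(f);
        std::printf("read %llu branches\n", count);   // expect 16,416,279
        return 0;
    }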
Part 1 – Static Branch Predictor
The first branch predictor to model is a static predictor. This will give you a baseline measure of branch prediction accuracy to compare against the dynamic predictor in Part 2. The two policies you should test are “always predict taken” and “always predict not taken”. As you process the trace file in your program, collect the information that will allow you to calculate the misprediction rate (the percentage of branches that were mispredicted) for these two simple schemes; a sketch of that tallying appears after the questions below. With this information, answer the following questions.
Which of these two policies is more accurate (has fewer mispredictions)?
Based on what you know about common programming paradigms, what might explain the above result?
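For Part 1, the bookkeeping amounts to two counters. A minimal sketch, reusing the reading loop above (and again assuming the hypothetical filename):

    #include <cstdio>

    int main() {
        FILE* f = std::fopen("branch-trace-gcc.txt", "r");
        if (!f) { std::perror("fopen"); return 1; }
        unsigned long long addr, taken = 0, notTaken = 0;
        char outcome;
        while (std::fscanf(f, "%llu %c", &addr, &outcome) == 2)
            outcome == 'T' ? ++taken : ++notTaken;
        std::fclose(f);
        double total = double(taken + notTaken);
        // "Always taken" mispredicts every N; "always not taken" every T.
        std::printf("always taken    : %.3f%% mispredicted\n", 100.0 * notTaken / total);
        std::printf("always not taken: %.3f%% mispredicted\n", 100.0 * taken / total);
        return 0;
    }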
Part 2 – 2-Bit Bimodal Dynamic Branch Predictor (Figure 1)
The simplest dynamic branch predictor is a branch history table (BHT) consisting of an array of 2^n two-bit counters. Each counter holds one of four values, as depicted in Figure 1.
To make a prediction, the predictor selects an entry from the table using the low-order n bits of the instruction’s address (its program counter value). The direction of the prediction is based on the value of the counter: values 00 and 01 are predicted not taken; values 10 and 11 are predicted taken. The counter increments and decrements in Gray-code order (00 ↔ 01 ↔ 11 ↔ 10), in which only one bit changes between adjacent values.
After each branch (correctly predicted or not), the corresponding BHT entry is incremented if the branch was actually taken, or decremented if it was not (the outcome given in the trace file), biasing the counter toward the actual branch outcome. As these are two-bit saturating counters, the values stick at the ends of the sequence: decrementing 00 (“strongly not taken”) or incrementing 10 (“strongly taken”) should not change the counter’s value.
Although initialization doesn’t affect the results in any significant way, your code should initialize the predictor to “strongly not taken” (00).
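Here is one possible C++ sketch of the table and its update rule. Note that storing the Gray-coded states or an equivalent integer strength 0-3 gives identical misprediction counts, so this sketch keeps a plain saturating integer per entry (0 = strongly not taken, 3 = strongly taken) and predicts taken when the value is 2 or 3.

    #include <cstdint>
    #include <vector>

    // 2^n-entry bimodal branch history table of 2-bit saturating counters.
    struct Bimodal {
        std::vector<std::uint8_t> table;   // one counter per entry, 0..3
        std::uint64_t mask;                // keeps the low-order n index bits

        explicit Bimodal(unsigned indexBits)
            : table(std::uint64_t(1) << indexBits, 0),  // init: strongly not taken
              mask((std::uint64_t(1) << indexBits) - 1) {}

        // Index with the low n bits of the PC; upper two states predict taken.
        bool predict(std::uint64_t pc) const { return table[pc & mask] >= 2; }

        // Bias the counter toward the actual outcome, saturating at 0 and 3.
        void update(std::uint64_t pc, bool taken) {
            std::uint8_t& c = table[pc & mask];
            if (taken) { if (c < 3) ++c; }
            else       { if (c > 0) --c; }
        }
    };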
Your program should be able to run multiple times varying the size of the branch history table on each run. Vary the size of the BHT from 128 to 16K entries. These sizes correspond to BHT index sizes of 7 to 14 bits.
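One way to structure the sweep, building on the Bimodal sketch above; buffering the trace in memory first (the Record struct here is an assumption) avoids re-reading the 16M-line file on every pass:

    #include <cstdint>
    #include <cstdio>
    #include <vector>

    struct Record { std::uint64_t addr; bool taken; };

    // Simulate every table size from 2^7 to 2^14 entries over the same trace.
    void sweep(const std::vector<Record>& trace) {
        for (unsigned bits = 7; bits <= 14; ++bits) {
            Bimodal bp(bits);                // from the sketch above
            unsigned long long miss = 0;
            for (const Record& r : trace) {
                if (bp.predict(r.addr) != r.taken) ++miss;   // predict first...
                bp.update(r.addr, r.taken);                  // ...then train
            }
            std::printf("%2u index bits (%6llu entries): %.3f%% mispredicted\n",
                        bits, 1ULL << bits, 100.0 * miss / trace.size());
        }
    }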
Run your program to simulate the varying predictor sizes and analyze the impact of predictor size on prediction accuracy. Generate a line plot of your results using MS Excel or some other graphing application. On the y-axis, plot the percentage of branches mispredicted (a metric in which smaller is better). On the x-axis, plot the log of the predictor size, i.e., the number of index bits; plotting size in terms of index bits makes the x-axis a log scale, which is what we want for this graph. Answer the following questions based on the data you collected:
Given a large enough predictor, what is the best misprediction rate obtainable by the bimodal predictor?
How large must the predictor be to reduce the number of mispredictions by approximately half, as compared to the better of “always taken” and “always not taken”? Give the predictor size both in terms of number of counters and in bytes (each counter is 2 bits, so four counters fit in one byte).
At what point does the performance of the predictor effectively max out or begin to decline? That is, how large does the predictor need to be before it captures essentially all of the benefit of a much larger predictor?