CS 152 Computer Architecture and Engineering

Lecture 5 - Pipelining II
(Branches, Exceptions)

John Wawrzynek
Electrical Engineering and Computer Sciences
University of California at Berkeley

http://www.eecs.berkeley.edu/~johnw
http://inst.eecs.berkeley.edu/~cs152
Last time in Lecture 4

- Pipelining increases clock frequency, while growing CPI more slowly, hence giving greater performance

- Pipelining of instructions is complicated by HAZARDS:
  - Structural hazards (two instructions want same hardware resource)
  - Data hazards (earlier instruction produces value needed by later instruction)
  - Control hazards (instruction changes control flow, e.g., branches or exceptions)

- Techniques to handle hazards:
  1) Interlock (hold newer instruction until older instructions drain out of pipeline and write back results)
  2) Bypass (transfer value from older instruction to newer instruction as soon as available somewhere in machine)
  3) Speculate (guess effect of earlier instruction)
Control Hazards

What do we need to calculate next PC?

- **For Jumps**
  - Opcode, PC and offset

- **For Jump Register**
  - Opcode, Register value

- **For Conditional Branches**
  - Opcode, Register (for condition), PC and offset

- **For all other instructions**
  - Opcode and PC (and have to know it’s not one of above)
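
The four cases above can be collected into one next-PC function. This is a minimal Python sketch, not the lecture's datapath: the `Kind` enum, the name `next_pc`, and its parameters are hypothetical, and it assumes 4-byte instructions and PC-relative offsets.

```python
from enum import Enum, auto


class Kind(Enum):
    JUMP = auto()           # needs opcode, PC and offset
    JUMP_REGISTER = auto()  # needs opcode and a register value
    BRANCH = auto()         # needs opcode, condition, PC and offset
    OTHER = auto()          # needs opcode and PC only


def next_pc(kind, pc, offset=0, reg_value=0, cond=False):
    """Return the address of the next instruction (assumes 4-byte instructions)."""
    if kind is Kind.JUMP:
        return pc + offset
    if kind is Kind.JUMP_REGISTER:
        return reg_value
    if kind is Kind.BRANCH:
        # Taken branches go to PC + offset, untaken ones fall through.
        return pc + offset if cond else pc + 4
    return pc + 4
```

Note that every case needs the opcode, which is why the fetch stage cannot know the next PC for certain until the instruction has at least been decoded.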
### PC Calculation Bubbles

#### Resource Usage

<table>
<thead>
<tr><th></th><th>t0</th><th>t1</th><th>t2</th><th>t3</th><th>t4</th><th>t5</th><th>t6</th><th>t7</th></tr>
</thead>
<tbody>
<tr><td><strong>IF</strong></td><td>I₁</td><td>-</td><td>I₂</td><td>-</td><td>I₃</td><td>-</td><td>I₄</td><td>-</td></tr>
<tr><td><strong>ID</strong></td><td></td><td>I₁</td><td>-</td><td>I₂</td><td>-</td><td>I₃</td><td>-</td><td>I₄</td></tr>
<tr><td><strong>EX</strong></td><td></td><td></td><td>I₁</td><td>-</td><td>I₂</td><td>-</td><td>I₃</td><td>-</td></tr>
<tr><td><strong>MA</strong></td><td></td><td></td><td></td><td>I₁</td><td>-</td><td>I₂</td><td>-</td><td>I₃</td></tr>
<tr><td><strong>WB</strong></td><td></td><td></td><td></td><td></td><td>I₁</td><td>-</td><td>I₂</td><td>-</td></tr>
</tbody>
</table>

“-” $\Rightarrow$ pipeline bubble

---

(I₁) $x_1 \leftarrow x_0 + 10$

(I₂) $x_3 \leftarrow x_2 + 17$

(I₃)

(I₄)
Speculate next address is PC+4

A jump instruction kills (not stalls) the following instruction

I_1  096  ADD
I_2  100  J 304
I_3  104  ADD
I_4  304  ADD


How?
Pipelining Jumps

To kill a fetched instruction -- Insert a mux before IR

Any interaction between stall and jump?

IRSrc_D = Case opcode_D
J, JAL ⇒ bubble
... ⇒ IM

I_1 096 ADD
I_2 100 J 304
I_3 104 ADD
I_4 304 ADD
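
The `IRSrc_D` selection above can be sketched in Python as a mux in front of the instruction register. This is an illustrative sketch only: `NOP`, `ir_source`, and `next_ir` are hypothetical names, and the interaction with the stall signal is deliberately left out.

```python
NOP = "NOP"  # hypothetical encoding of a pipeline bubble


def ir_source(opcode_d):
    """Mux select: 'bubble' when the instruction in decode is a jump, else 'IM'."""
    return "bubble" if opcode_d in ("J", "JAL") else "IM"


def next_ir(opcode_d, fetched):
    """Value latched into IR: the instruction fetched from memory, or a bubble."""
    return NOP if ir_source(opcode_d) == "bubble" else fetched
```

With the example program, when `J 304` (I_2) is in decode, the `ADD` fetched from 104 (I_3) is replaced by a bubble, so it is killed rather than stalled.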

9/13/2016
Jump Pipeline Diagrams

Resource Usage

\[\begin{array}{cccccccccc}
\text{time} & t0 & t1 & t2 & t3 & t4 & t5 & t6 & t7 & \ldots \\
\hline
\text{IF} & I_1 & I_2 & I_3 & I_4 & I_5 & I_6 & I_7 & \ldots & \\
\text{ID} & & I_1 & I_2 & - & I_4 & I_5 & I_6 & I_7 & \ldots \\
\text{EX} & & & I_1 & I_2 & - & I_4 & I_5 & I_6 & \ldots \\
\text{MA} & & & & I_1 & I_2 & - & I_4 & I_5 & \ldots \\
\text{WB} & & & & & I_1 & I_2 & - & I_4 & \ldots \\
\end{array}\]

“-” $\Rightarrow$ pipeline bubble ($I_3$, fetched after the jump $I_2$, is killed)
Pipelining Conditional Branches

Branch condition is not known until the execute stage

*what action should be taken in the decode stage?*
Pipelining Conditional Branches

If the branch is taken
- kill the two following instructions
- the instruction at the decode stage is not valid \( \Rightarrow \text{stall signal is not valid} \)

\[
\begin{align*}
I_1: & \quad 096 \quad \text{ADD} \\
I_2: & \quad 100 \quad \text{BEQ x1,x2 +200} \\
I_3: & \quad 104 \quad \text{ADD} \\
I_4: & \quad 300 \quad \text{ADD}
\end{align*}
\]
Branch Pipeline Diagrams
(resolved in execute stage)

\[\begin{array}{llccccccccc}
 & \text{time} & t0 & t1 & t2 & t3 & t4 & t5 & t6 & t7 & t8 \\
\hline
(I_1) & \text{096: ADD} & \text{IF}_1 & \text{ID}_1 & \text{EX}_1 & \text{MA}_1 & \text{WB}_1 & & & & \\
(I_2) & \text{100: BEQ +200} & & \text{IF}_2 & \text{ID}_2 & \text{EX}_2 & \text{MA}_2 & \text{WB}_2 & & & \\
(I_3) & \text{104: ADD} & & & \text{IF}_3 & \text{ID}_3 & - & - & - & & \\
(I_4) & \text{108:} & & & & \text{IF}_4 & - & - & - & - & \\
(I_5) & \text{300: ADD} & & & & & \text{IF}_5 & \text{ID}_5 & \text{EX}_5 & \text{MA}_5 & \text{WB}_5 \\
\end{array}\]

Resource Usage

\[\begin{array}{cccccccccc}
\text{time} & t0 & t1 & t2 & t3 & t4 & t5 & t6 & t7 & t8 \\
\hline
\text{IF} & I_1 & I_2 & I_3 & I_4 & I_5 & & & & \\
\text{ID} & & I_1 & I_2 & I_3 & - & I_5 & & & \\
\text{EX} & & & I_1 & I_2 & - & - & I_5 & & \\
\text{MA} & & & & I_1 & I_2 & - & - & I_5 & \\
\text{WB} & & & & & I_1 & I_2 & - & - & I_5 \\
\end{array}\]

“-” $\Rightarrow$ pipeline bubble
Use simpler branches (e.g., only compare one reg against zero) with compare in decode stage

\[
\begin{array}{lcccccccc}
\text{time} & t_0 & t_1 & t_2 & t_3 & t_4 & t_5 & t_6 & t_7 \\
\hline
(I_1)\ \text{096: ADD} & \text{IF}_1 & \text{ID}_1 & \text{EX}_1 & \text{MA}_1 & \text{WB}_1 & & & \\
(I_2)\ \text{100: BEQZ +200} & & \text{IF}_2 & \text{ID}_2 & \text{EX}_2 & \text{MA}_2 & \text{WB}_2 & & \\
(I_3)\ \text{104: ADD} & & & \text{IF}_3 & - & - & - & - & \\
(I_4)\ \text{300: ADD} & & & & \text{IF}_4 & \text{ID}_4 & \text{EX}_4 & \text{MA}_4 & \text{WB}_4 \\
\end{array}
\]

Resource Usage

\[
\begin{array}{ccccccccc}
\text{time} & t_0 & t_1 & t_2 & t_3 & t_4 & t_5 & t_6 & t_7 \\
\hline
\text{IF} & I_1 & I_2 & I_3 & I_4 & I_5 & & & \\
\text{ID} & & I_1 & I_2 & - & I_4 & I_5 & & \\
\text{EX} & & & I_1 & I_2 & - & I_4 & I_5 & \\
\text{MA} & & & & I_1 & I_2 & - & I_4 & I_5 \\
\text{WB} & & & & & I_1 & I_2 & - & I_4 \\
\end{array}
\]

“-” $\Rightarrow$ pipeline bubble
Branch Delay Slots
(expose control hazard to software)

- Change the ISA semantics so that the instruction that follows a jump or branch is always executed
  - gives compiler the flexibility to put in a useful instruction where normally a pipeline bubble would have resulted.

\[
\begin{array}{ccc}
I_1 & 096 & \text{ADD} \\
I_2 & 100 & \text{BEQZ r1, +200} \\
I_3 & 104 & \text{ADD} \\
I_4 & 300 & \text{ADD} \\
\end{array}
\]

- $I_3$ (at 104) is the delay slot instruction: executed regardless of branch outcome
Branch Pipeline Diagrams
(branch delay slot)

\[
\begin{array}{llcccccccc}
 & \text{time} & t0 & t1 & t2 & t3 & t4 & t5 & t6 & t7 \\
\hline
(I_1) & \text{096: ADD} & \text{IF}_1 & \text{ID}_1 & \text{EX}_1 & \text{MA}_1 & \text{WB}_1 & & & \\
(I_2) & \text{100: BEQZ +200} & & \text{IF}_2 & \text{ID}_2 & \text{EX}_2 & \text{MA}_2 & \text{WB}_2 & & \\
(I_3) & \text{104: ADD} & & & \text{IF}_3 & \text{ID}_3 & \text{EX}_3 & \text{MA}_3 & \text{WB}_3 & \\
(I_4) & \text{300: ADD} & & & & \text{IF}_4 & \text{ID}_4 & \text{EX}_4 & \text{MA}_4 & \text{WB}_4 \\
\end{array}
\]

Resource Usage

\[
\begin{array}{ccccccccc}
\text{time} & t0 & t1 & t2 & t3 & t4 & t5 & t6 & t7 \\
\hline
\text{IF} & I_1 & I_2 & I_3 & I_4 & & & & \\
\text{ID} & & I_1 & I_2 & I_3 & I_4 & & & \\
\text{EX} & & & I_1 & I_2 & I_3 & I_4 & & \\
\text{MA} & & & & I_1 & I_2 & I_3 & I_4 & \\
\text{WB} & & & & & I_1 & I_2 & I_3 & I_4 \\
\end{array}
\]
Post-1990 RISC ISAs don’t have delay slots

- Encodes microarchitectural detail into ISA
  - C.f. IBM 650 drum layout

- Performance issues
  - E.g., I-cache miss on delay slot causes machine to wait, even if delay slot is a NOP

- Complicates more advanced microarchitectures
  - 30-stage pipeline with four-instruction-per-cycle issue

- Better branch prediction reduced need
Why an Instruction may not be dispatched every cycle (CPI>1)

- Full bypassing may be too expensive to implement
  - typically all frequently used paths are provided
  - some infrequently used bypass paths may increase cycle time and counteract the benefit of reducing CPI

- Loads have two-cycle latency
  - Instruction after load cannot use load result
  - MIPS-I ISA defined *load delay slots*, a software-visible pipeline hazard (compiler schedules independent instruction or inserts NOP to avoid hazard). Removed in MIPS-II (pipeline interlocks added in hardware)
    - MIPS: “Microprocessor without Interlocked Pipeline Stages”

- Conditional branches may cause bubbles
  - kill following instruction(s) if no delay slots

*Machines with software-visible delay slots may execute significant number of NOP instructions inserted by the compiler. NOPs increase instructions/program!*
# RISC-V Branches and Jumps

Each instruction fetch depends on one or two pieces of information from the preceding instruction:

1) Is the preceding instruction a taken branch?

2) If so, what is the target address?

<table>
<thead>
<tr>
<th>Instruction</th>
<th>Taken known?</th>
<th>Target known?</th>
</tr>
</thead>
<tbody>
<tr>
<td>J</td>
<td>After Inst. Decode</td>
<td>After Inst. Decode</td>
</tr>
<tr>
<td>JR</td>
<td>After Inst. Decode</td>
<td>After Reg. Fetch</td>
</tr>
<tr>
<td>B&lt;cond.&gt;</td>
<td>After Execute</td>
<td>After Inst. Decode</td>
</tr>
</tbody>
</table>
Branch Penalties in Modern Pipelines

UltraSPARC-III instruction fetch pipeline stages
(in-order issue, 4-way superscalar, 750MHz, 2000)

- A: PC Generation/Mux
- P: Instruction Fetch Stage 1
- F: Instruction Fetch Stage 2
- B: Branch Address Calc/Begin Decode
- I: Complete Decode
- J: Steer Instructions to Functional units
- R: Register File Read
- E: Integer Execute

Branch Target Address Known: after stage B
Branch Direction & Jump Register Target Known: after stage E

Remainder of execute pipeline (another 6 stages)
Reducing Control Flow Penalty

- **Software solutions**
  - Eliminate branches - loop unrolling
    - Increases the run length
  - Reduce resolution time - instruction scheduling
    - Compute the branch condition as early as possible (of limited value because branches often in critical path through code)

- **Hardware solutions**
  - Find something else to do - delay slots
    - Replaces pipeline bubbles with useful work (requires software cooperation)
  - Speculate - branch prediction
    - Speculative execution of instructions beyond the branch
Branch Prediction

Motivation:
Branch penalties limit performance of deeply pipelined processors
Modern branch predictors have high accuracy (>95%) and can reduce branch penalties significantly

Required hardware support:
Prediction structures:
• Branch history tables, branch target buffers, etc.

Mispredict recovery mechanisms:
• Keep result computation separate from commit
• Kill instructions following branch in pipeline
• Restore state to that following branch
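
The motivation above can be made concrete with a back-of-the-envelope CPI model. This is a sketch under stated assumptions, not a figure from the lecture: the function name and every number in the usage note are hypothetical, and the model charges a fixed penalty per mispredicted branch and ignores all other stalls.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """CPI including the average cost of mispredicted branches.

    branch_fraction: fraction of instructions that are branches
    mispredict_rate: fraction of branches predicted incorrectly
    penalty_cycles:  bubbles inserted per mispredicted branch
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles
```

For example, assuming a base CPI of 1, 20% branches, and a 10-cycle penalty, paying the penalty on every branch gives a CPI of 3.0, while a predictor that is 95% accurate brings it down to about 1.1.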
Static Branch Prediction

Overall probability a branch is taken is ~60-70% but:

- **backward** 90%
- **forward** 50%

ISA can attach preferred direction semantics to branches, e.g., Motorola MC88110
  - `bne0 (preferred taken)`  
  - `beq0 (not taken)`

ISA can allow arbitrary choice of statically predicted direction, e.g., HP PA-RISC, Intel IA-64, MIPS (BEQL, branch on equal likely); such static prediction is typically reported as ~80% accurate
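
Given the backward/forward statistics above, one simple static heuristic is to predict backward branches taken and forward branches not taken. The Python sketch below only illustrates that heuristic; the function name is hypothetical, and it is not the scheme of any of the ISAs named above.

```python
def predict_taken_static(pc, target):
    """Predict taken for backward branches (target below the branch PC)."""
    return target < pc
```

A loop-closing branch at 200 that jumps back to 100 is predicted taken, while a forward branch from 100 to 300 is predicted not taken.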
Dynamic Branch Prediction
learning based on past behavior

- **Temporal correlation**
  - The way a branch resolves may be a good predictor of the way it will resolve at the next execution

- **Spatial correlation**
  - Several branches may resolve in a highly correlated manner (a preferred path of execution)
Branch Prediction Bits

• Finite state machine (FSM) used to store “history” of a particular branch instruction.
  • Use current state to predict branch, then update state based on actual branch outcome

• A common choice is 2 BP bits per instruction $\Rightarrow$ 4-state FSM

• Change the prediction after two consecutive mistakes
Branch History Table (BHT)

4K-entry BHT, 2 bits/entry, ~80-90% correct predictions
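
A BHT of 2-bit entries can be sketched in Python as a table of saturating counters. This is a minimal sketch under assumptions: the saturating counter is one common encoding of a 4-state FSM and may differ in its transitions from the FSM drawn on the slide, and the index function (word-aligned PC modulo table size) and the initial counter value are hypothetical choices.

```python
class BranchHistoryTable:
    """Table of 2-bit saturating counters indexed by low-order PC bits."""

    def __init__(self, entries=4096):
        self.entries = entries
        # Counter values 0..3; 0-1 predict not taken, 2-3 predict taken.
        self.counters = [1] * entries  # start weakly not-taken (assumption)

    def _index(self, pc):
        # Drop the 2 byte-offset bits, then wrap to the table size (assumption).
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Starting from the strongly-taken state, a single not-taken outcome leaves the prediction unchanged; only a second consecutive mistake flips it, which keeps a loop branch predicted taken across the one exit misprediction.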
Exploiting Spatial Correlation

*Yeh and Patt, 1992*

```plaintext
if (x[i] < 7) then
    y += 1;
if (x[i] < 5) then
    c -= 4;
```

If first condition false, second condition probably also false

*History register, H,* records the direction of the last N branches executed by the processor.
Two-Level Branch Predictor

*Pentium Pro uses the result from the last two branches to select one of the four sets of BHT bits (~95% correct)*
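
The idea of a history register selecting among sets of BHT bits can be sketched as follows. This Python sketch is illustrative only and does not model the Pentium Pro: the class name, the table size, the index function, and the initial counter values are all assumptions.

```python
class TwoLevelPredictor:
    """Global history of the last N branch outcomes selects one of 2**N BHTs."""

    def __init__(self, entries=1024, history_bits=2):
        self.entries = entries
        self.history_bits = history_bits
        self.history = 0  # most recent outcome in the low bit
        self.tables = [[1] * entries for _ in range(1 << history_bits)]

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def predict(self, pc):
        return self.tables[self.history][self._index(pc)] >= 2

    def update(self, pc, taken):
        table = self.tables[self.history]
        i = self._index(pc)
        # 2-bit saturating counter update in the table chosen by the history.
        table[i] = min(3, table[i] + 1) if taken else max(0, table[i] - 1)
        # Shift the new outcome into the history register.
        mask = (1 << self.history_bits) - 1
        self.history = ((self.history << 1) | int(taken)) & mask
```

Because each history pattern trains its own counter, a branch that strictly alternates between taken and not taken becomes predictable after a short warm-up in this sketch.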
Speculating Both Directions

- An alternative to branch prediction is to execute both directions of a branch speculatively
  - execute down both paths until branch is resolved (delaying commits)
  - what if branch follows another branch, ...
  - resource requirement is proportional to the number of concurrent speculative executions
  - only half the resources engage in useful work when both directions of a branch are executed speculatively
  - branch prediction takes less resources than speculative execution of both paths

- With accurate branch prediction, it is more cost effective to dedicate all resources to the predicted direction!
Limitations of BHTs

Only predicts branch direction. Therefore, cannot redirect fetch stream until after branch target is determined.

UltraSPARC-III fetch pipeline
CS152 Administrivia

- PS1 is now due Thursday next week instead of today.

- Quiz 1 next week on Tue Sep 20 will cover PS1, Lab1, lectures 1-5, and associated readings.
Branch Target Buffer

BP bits are stored with the predicted target address.

IF stage: If (BP=taken) then nPC=target else nPC=PC+4
Later: check prediction; if wrong, kill the instruction and update the BTB and BP bits; otherwise update only the BP bits
Address Collisions

Assume a 128-entry BTB

What will be fetched after the instruction at 1028?

BTB prediction = 236
Correct target = 1032

⇒ kill PC=236 and fetch PC=1032

Is this a common occurrence?
Can we avoid these bubbles?
BTB is only for Control Instructions

- BTB contains useful information for branch and jump instructions only
  ⇒ Do not update it for other instructions

- For all other instructions the next PC is PC+4!

- *How to achieve this effect without decoding the instruction?*
Branch Target Buffer (BTB)

- Keep both the branch PC and target PC in the BTB
- PC+4 is fetched if match fails
- Only *taken* branches and jumps held in BTB
- Next PC determined *before* branch fetched and decoded
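
A tagged BTB along the lines of the bullets above can be sketched in Python. This is a minimal sketch, not the lecture's hardware: the class and method names are hypothetical, the table is direct-mapped, and the index function (word-aligned PC modulo table size) is an assumption.

```python
class BranchTargetBuffer:
    """Direct-mapped BTB that stores (branch PC, target PC) pairs."""

    def __init__(self, entries=128):
        self.entries = entries
        self.table = [None] * entries  # each entry: (branch_pc, target_pc)

    def _index(self, pc):
        return (pc >> 2) % self.entries

    def next_pc(self, pc):
        """Predict the next fetch address using only the current PC."""
        entry = self.table[self._index(pc)]
        if entry is not None and entry[0] == pc:
            return entry[1]  # full PC match: predicted taken
        return pc + 4        # match fails: fetch sequentially

    def update(self, pc, taken, target):
        """Called after a branch or jump resolves; holds taken ones only."""
        i = self._index(pc)
        if taken:
            self.table[i] = (pc, target)
        elif self.table[i] is not None and self.table[i][0] == pc:
            self.table[i] = None
```

Storing the full branch PC as a tag means an unrelated instruction that maps to the same entry falls back to PC+4 instead of being redirected to a stale target.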
Combining BTB and BHT

- BTB entries are considerably more expensive than BHT, but can redirect fetches at earlier stage in pipeline and can accelerate indirect branches (JR)
- BHT can hold many more entries and is more accurate

<table>
<thead>
<tr>
<th>Stage</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>A</td>
<td>PC Generation/Mux</td>
</tr>
<tr>
<td>P</td>
<td>Instruction Fetch Stage 1</td>
</tr>
<tr>
<td>F</td>
<td>Instruction Fetch Stage 2</td>
</tr>
<tr>
<td>B</td>
<td>Branch Address Calc/Begin Decode</td>
</tr>
<tr>
<td>I</td>
<td>Complete Decode</td>
</tr>
<tr>
<td>J</td>
<td>Steer Instructions to Functional units</td>
</tr>
<tr>
<td>R</td>
<td>Register File Read</td>
</tr>
<tr>
<td>E</td>
<td>Integer Execute</td>
</tr>
</tbody>
</table>

- BHT in later pipeline stage corrects when BTB misses a predicted taken branch
- BTB/BHT only updated after branch resolves in E stage
Uses of Jump Register (JR)

- Switch statements (jump to address of matching case)
  - BTB works well if same case used repeatedly

- Dynamic function call (jump to run-time function address)
  - BTB works well if same function usually called (e.g., in C++ programming, when objects have same type in virtual function call)

- Subroutine returns (jump to return address)
  - BTB works well if usually return to the same place
    ⇒ Often one function called from many distinct call sites!

How well does BTB work for each of these cases?
Subroutine Return Stack

Small structure to accelerate JR for subroutine returns, typically much more accurate than BTBs. Use instead of BTB for returns.

\[
\begin{array}{l}
\text{fa()}\ \{\ \text{fb();}\ \} \\
\text{fb()}\ \{\ \text{fc();}\ \} \\
\text{fc()}\ \{\ \text{fd();}\ \}
\end{array}
\]

Push call address when function call executed

Pop return address when subroutine return decoded

\[
\begin{array}{c}
\&\text{fd()} \\
\&\text{fc()} \\
\&\text{fb()}
\end{array}
\]

k entries
(typically k=8-16)
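
The push/pop behaviour of the return stack can be sketched in Python. This is a minimal sketch under assumptions: the class and method names are hypothetical, the pushed value is taken to be the address execution should return to, and discarding the oldest entry on overflow is one possible policy rather than something the slide specifies.

```python
class ReturnAddressStack:
    """Small k-entry stack of predicted return addresses."""

    def __init__(self, k=8):
        self.k = k
        self.stack = []

    def push(self, return_pc):
        """Call executed: remember where the matching return should go."""
        if len(self.stack) == self.k:
            self.stack.pop(0)  # full: drop the oldest entry (assumption)
        self.stack.append(return_pc)

    def pop(self):
        """Return decoded: predict the most recently pushed address."""
        return self.stack.pop() if self.stack else None
```

Unlike a BTB entry, which remembers a single target per return instruction, the stack predicts correctly when the same function is called from many distinct call sites, as long as the call depth stays within k.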
Interrupts:
altering the normal flow of control

An external or internal event that needs to be processed by another (system) program. The event is usually unexpected or rare from program’s point of view.
Causes of Interrupts

Interrupt: an *event* that requests the attention of the processor

- **Asynchronous:** an *external event*
  - input/output device service-request
  - timer expiration
  - power disruptions, hardware failure

- **Synchronous:** an *internal event (a.k.a. traps or exceptions)*
  - undefined opcode, privileged instruction
  - arithmetic overflow, FPU exception
  - misaligned memory access
  - *virtual memory exceptions:* page faults, TLB misses, protection violations
  - system calls, e.g., jumps into kernel
History of Exception Handling

- First system with exceptions was Univac-I, 1951
  - Arithmetic overflow would either
    - 1. trigger the execution of a two-instruction fix-up routine at address 0, or
    - 2. at the programmer's option, cause the computer to stop
  - Later Univac 1103, 1955, modified to add external interrupts
    - Used to gather real-time wind tunnel data

- First system with I/O interrupts was DYSEAC, 1954
  - Had two program counters, and I/O signal caused switch between two PCs
  - Also, first system with DMA (direct memory access by I/O device)

[Courtesy Mark Smotherman]
DYSEAC, first mobile computer!

- Carried in two tractor trailers, 12 tons + 8 tons
- Built for US Army Signal Corps

[Courtesy Mark Smotherman]
Asynchronous Interrupts:
invoking the interrupt handler

- An I/O device requests attention by asserting one of the *prioritized interrupt request lines*

- When the processor decides to process the interrupt
  - It stops the current program at instruction $I_i$, completing all the instructions up to $I_{i-1}$ *(precise interrupt)*
  - It saves the PC of instruction $I_i$ in a special register (EPC)
  - It disables interrupts and transfers control to a designated interrupt handler running in kernel mode
Interrupt Handler

- Saves EPC before enabling interrupts to allow nested interrupts ⇒
  - need an instruction to move EPC into GPRs
  - need a way to mask further interrupts at least until EPC can be saved
- Needs to read a status register that indicates the cause of the interrupt
- Uses a special indirect jump instruction RFE (return-from-exception) which
  - enables interrupts
  - restores the processor to the user mode
  - restores hardware status and control state
Synchronous Interrupts

- A synchronous interrupt (exception) is caused by a particular instruction

- In general, the instruction cannot be completed and needs to be restarted after the exception has been handled
  - requires undoing the effect of one or more partially executed instructions

- In the case of a system call trap, the instruction is considered to have been completed
  - a special jump instruction involving a change to privileged kernel mode
Exception Handling 5-Stage Pipeline

- How to handle multiple simultaneous exceptions in different pipeline stages?
- How and where to handle external asynchronous interrupts?
Exception Handling 5-Stage Pipeline

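[Figure: 5-stage pipeline (PC, Inst. Mem, Decode, Execute, Data Mem, Writeback) annotated with exception sources: PC address exception, illegal opcode, overflow, and data address exceptions, plus asynchronous interrupts. At the commit point, hardware writes the EPC and Cause registers, selects the handler PC, and kills the F, D, and E stages and writeback.]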
Exception Handling 5-Stage Pipeline

- Hold exception flags in pipeline until commit point (M stage)

- Exceptions in earlier pipe stages override later exceptions for a given instruction

- Inject external interrupts at commit point (override others)

- If exception at commit: update Cause and EPC registers, kill all stages, inject handler PC into fetch stage
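
The commit-point rules above can be sketched in Python. This is an illustrative sketch, not the lecture's control logic: the function name, the stage letters used as dictionary keys, the cause strings, and the default `handler_pc` value are all hypothetical.

```python
STAGE_ORDER = ("F", "D", "E", "M")  # earliest pipeline stage first


def commit(pc, exceptions, external_interrupt=False, handler_pc=0x80):
    """Decide what happens when the instruction at `pc` reaches commit.

    exceptions: dict mapping a stage letter to the cause flagged there.
    Returns None if the instruction commits normally, else the actions taken.
    """
    if external_interrupt:
        cause = "external_interrupt"  # injected at commit, overrides others
    else:
        # The exception from the earliest stage wins for a given instruction.
        cause = next((exceptions[s] for s in STAGE_ORDER if s in exceptions), None)
    if cause is None:
        return None
    return {
        "cause": cause,               # written to the Cause register
        "epc": pc,                    # written to the EPC register
        "kill_all_stages": True,      # flush younger instructions
        "next_fetch_pc": handler_pc,  # injected into the fetch stage
    }
```

For an instruction that flagged both an illegal opcode in decode and an overflow in execute, this sketch reports the illegal opcode, since the earlier stage takes priority.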
Summary – Handling Exceptions

- Check prediction mechanism
  - Exceptions detected at end of instruction execution pipeline, special hardware for various exception types

- Recovery mechanism
  - Only write architectural state at commit point, so can throw away partially executed instructions after exception
  - Launch exception handler after flushing pipeline

- Bypassing allows use of uncommitted instruction results by following instructions
## Exception Pipeline Diagram

### Time

<table>
<thead>
<tr><th></th><th>t0</th><th>t1</th><th>t2</th><th>t3</th><th>t4</th><th>t5</th><th>t6</th><th>t7</th><th>...</th></tr>
</thead>
<tbody>
<tr><td>I1</td><td>IF1</td><td>ID1</td><td>EX1</td><td>MA1</td><td>-</td><td></td><td></td><td></td><td></td></tr>
<tr><td>I2</td><td></td><td>IF2</td><td>ID2</td><td>EX2</td><td>-</td><td>-</td><td></td><td></td><td></td></tr>
<tr><td>I3</td><td></td><td></td><td>IF3</td><td>ID3</td><td>-</td><td>-</td><td>-</td><td></td><td></td></tr>
<tr><td>I4</td><td></td><td></td><td></td><td>IF4</td><td>-</td><td>-</td><td>-</td><td>-</td><td></td></tr>
<tr><td>I5</td><td></td><td></td><td></td><td></td><td>IF5</td><td>ID5</td><td>EX5</td><td>MA5</td><td>WB5</td></tr>
</tbody>
</table>

### Resource Usage

<table>
<thead>
<tr><th></th><th>t0</th><th>t1</th><th>t2</th><th>t3</th><th>t4</th><th>t5</th><th>t6</th><th>t7</th><th>...</th></tr>
</thead>
<tbody>
<tr><td>IF</td><td>I1</td><td>I2</td><td>I3</td><td>I4</td><td>I5</td><td></td><td></td><td></td><td></td></tr>
<tr><td>ID</td><td></td><td>I1</td><td>I2</td><td>I3</td><td>-</td><td>I5</td><td></td><td></td><td></td></tr>
<tr><td>EX</td><td></td><td></td><td>I1</td><td>I2</td><td>-</td><td>-</td><td>I5</td><td></td><td></td></tr>
<tr><td>MA</td><td></td><td></td><td></td><td>I1</td><td>-</td><td>-</td><td>-</td><td>I5</td><td></td></tr>
<tr><td>WB</td><td></td><td></td><td></td><td></td><td>-</td><td>-</td><td>-</td><td>-</td><td>I5</td></tr>
</tbody>
</table>
Acknowledgements

- These slides contain material developed and copyright by:
  - Arvind (MIT)
  - Krste Asanovic (MIT/UCB)
  - Joel Emer (Intel/MIT)
  - James Hoe (CMU)
  - John Kubiatowicz (UCB)
  - David Patterson (UCB)

- MIT material derived from course 6.823
- UCB material derived from course CS252