Pipelining Hazards

**Structural** – Hazards that occur due to competition for the same resource (register file read vs. write back, instruction fetch vs. data read). Caching and clever register timing can solve these hazards.

**Control** – Hazards that occur due to non-sequential instructions (jumps and branches). These cannot be solved completely by forwarding, so we’re forced to introduce a branch-delay slot (MIPS) or use branch prediction.

**Data** – Hazards that occur due to data dependencies (instruction requires result from earlier instruction). These are mostly solved by forwarding, but `lw` still requires a bubble.

1. Suppose you’ve designed a MIPS processor implementation in which the stages take the following lengths of time: IF=20ns, ID=10ns, EX=20ns, MEM=35ns, WB=10ns. What is the minimum clock period for which your processor functions properly? Where should the bulk of your R&D budget go for the next generation of processors?

2. Your friend tells you that his processor design is 10x better than yours, since it has 50 pipeline stages to your 5. Is he right? Why or why not? (This is intentionally vague)

3. Spot all data dependencies (including ones that do not lead to stalls). Draw arrows from the stages where data is made available, directed to where it is needed. Circle the involved registers in the instructions. Assume no forwarding.

   ![Graph of instructions with time labels: addi $t0 $t1 100, lw $t2 4($t0), add $t3 $t1 $t2, sw $t3 8($t0), lw $t5 0($t6), or $t5 $t0 $t3]
4. Redraw the arrows for the above question assuming that our hardware provides forwarding.

\[ \text{time} \rightarrow \]

\[
\begin{align*}
\text{addi} & \quad \text{t0} & \quad \text{t1} & \quad 100 & \quad F & \quad D & \quad A & \quad M & \quad W \\
\text{lw} & \quad \text{t2} & \quad 4(&\text{t0}) & \quad F & \quad D & \quad A & \quad M & \quad W \\
\text{add} & \quad \text{t3} & \quad \text{t1} & \quad \text{t2} & \quad F & \quad D & \quad A & \quad M & \quad W \\
\text{sw} & \quad \text{t3} & \quad 8(&\text{t0}) & \quad F & \quad D & \quad A & \quad M & \quad W \\
\text{lw} & \quad \text{t5} & \quad 0(&\text{t6}) & \quad F & \quad D & \quad A & \quad M & \quad W \\
\text{or} & \quad \text{t5} & \quad \text{t0} & \quad \text{t3} & \quad F & \quad D & \quad A & \quad M & \quad W \\
\end{align*}
\]

5. How many stalls will we have to add to the pipeline to resolve the hazards in Exercise 3? How many stalls to resolve the hazards in Exercise 4?

6. Rewrite the following delayed branch MIPS excerpt to maximize performance, assuming forwarding.

\[
\begin{align*}
\text{Loop:} & \quad \text{addi} \; & \quad \text{v0} & \quad \text{v0} & \quad 1 \\
& \quad \text{addi} \; & \quad \text{t1} & \quad \text{a0} & \quad 4 \\
& \quad \text{lw} \; & \quad \text{t0} & \quad 0(&\text{t1}) \\
& \quad \text{add} \; & \quad \text{a0} & \quad \text{t0} & \quad \text{a1} \\
& \quad \text{addi} \; & \quad \text{a0} & \quad \text{a0} & \quad 4 \\
& \quad \text{bne} \; & \quad \text{t0} & \quad 0, \text{Loop} \\
& \quad \text{nop} \\
& \quad \text{jr} \; & \quad \text{ra} \\
\end{align*}
\]

7) Now, assume for the delayed branch code from Exercise 6 that our hardware can execute Static Dual Issue for any two instructions at once. Using reordering (with nops for padding), but no loop unrolling, schedule the instructions to make the loop take as few clock cycles as possible.