

















## Historical Trivia

- The first MIPS design did not interlock and stall on a load-use data hazard.
- The real reason behind the name MIPS: Microprocessor without Interlocked Pipeline Stages.
- The name is also wordplay on the acronym for Millions of Instructions Per Second, likewise called MIPS.

Garcia, Fall 2005 © UCB

CS61C L20 Introduction to Pipelined Execution, pt II (11)





















wired.com/news/technology/bugs/0,2924,69355,00.html



















## Peer Instruction (1/2)

Assume 1 instr/clock, delayed branch, 5-stage pipeline, forwarding, interlock on unresolved load hazards (after 10<sup>3</sup> loops, so pipeline full).

<pre>
Loop: lw    $t0, 0($s1)
      addu  $t0, $t0, $s2
      sw    $t0, 0($s1)
      addiu $s1, $s1, -4
      bne   $s1, $zero, Loop
      nop
</pre>

- How many pipeline stages (clock cycles) per loop iteration to execute this code?

Answer choices: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
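
One way to reason about the question above is a small counting model. This is a sketch under stated assumptions, not the official solution: it assumes one cycle per instruction once the pipeline is full, one extra stall cycle whenever the instruction immediately after a `lw` reads the loaded register, and no branch penalty beyond the delay slot. The tuple encoding and the name `cycles_per_iteration` are made up for this illustration.

```python
# Each instruction: (mnemonic, destination register or None, source registers).
# This encoding is hypothetical and only covers the loop from the slide.
LOOP = [
    ("lw",    "$t0", ["$s1"]),
    ("addu",  "$t0", ["$t0", "$s2"]),
    ("sw",    None,  ["$t0", "$s1"]),
    ("addiu", "$s1", ["$s1"]),
    ("bne",   None,  ["$s1", "$zero"]),
    ("nop",   None,  []),  # branch delay slot
]


def cycles_per_iteration(loop):
    """Count steady-state cycles: one per instruction plus load-use stalls."""
    cycles = 0
    for i, (op, dest, _srcs) in enumerate(loop):
        cycles += 1  # assumed: one instruction issues per clock
        # The pipeline is full, so the instruction after the last one
        # is the first instruction of the next iteration.
        next_srcs = loop[(i + 1) % len(loop)][2]
        if op == "lw" and dest in next_srcs:
            cycles += 1  # assumed: the interlock inserts one bubble
    return cycles


print(cycles_per_iteration(LOOP))  # 7
```

Under these assumptions the six instructions plus one load-use stall (between `lw` and `addu`) give seven cycles per iteration.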

## Peer Instruction (2/2)

Assume 1 instr/clock, delayed branch, 5-stage pipeline, forwarding, interlock on unresolved load hazards (after 10<sup>3</sup> loops, so pipeline full). Rewrite this code to reduce pipeline stages (clock cycles) per loop to as few as possible.

<pre>
Loop: lw    $t0, 0($s1)
      addu  $t0, $t0, $s2
      sw    $t0, 0($s1)
      addiu $s1, $s1, -4
      bne   $s1, $zero, Loop
      nop
</pre>

- How many pipeline stages (clock cycles) per loop iteration to execute this code?

Answer choices: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10
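
A rewrite is only useful if it computes the same result. The sketch below shows one possible rescheduling, not necessarily the intended answer: it moves `addiu` between `lw` and `addu`, and moves `sw` into the branch delay slot with its offset adjusted to 4. A tiny interpreter then checks that both versions leave memory in the same state. The interpreter, the tuple encoding, and the test values are assumptions made for illustration. It models only these five opcodes and ignores 32-bit wraparound.

```python
# Instruction format (hypothetical): (op, a, b, c)
#   lw/sw : a = data register, b = offset, c = base register
#   addu  : a = b + c (registers);  addiu : a = b + immediate c
#   bne   : branch to instruction index c if a != b
ORIGINAL = [
    ("lw", "$t0", 0, "$s1"),
    ("addu", "$t0", "$t0", "$s2"),
    ("sw", "$t0", 0, "$s1"),
    ("addiu", "$s1", "$s1", -4),
    ("bne", "$s1", "$zero", 0),
    ("nop", None, None, None),
]
RESCHEDULED = [
    ("lw", "$t0", 0, "$s1"),
    ("addiu", "$s1", "$s1", -4),    # fills the load-use slot
    ("addu", "$t0", "$t0", "$s2"),
    ("bne", "$s1", "$zero", 0),
    ("sw", "$t0", 4, "$s1"),        # delay slot; +4 undoes the early addiu
]


def run(loop, s1, s2, mem):
    """Execute the loop with a one-slot delayed branch; return final memory."""
    regs = {"$zero": 0, "$t0": 0, "$s1": s1, "$s2": s2}
    mem = dict(mem)
    pc, pending = 0, None
    while pc < len(loop):
        op, a, b, c = loop[pc]
        target, pending = pending, None  # branch taken one instruction ago
        if op == "lw":
            regs[a] = mem[regs[c] + b]
        elif op == "sw":
            mem[regs[c] + b] = regs[a]
        elif op == "addu":
            regs[a] = regs[b] + regs[c]
        elif op == "addiu":
            regs[a] = regs[b] + c
        elif op == "bne" and regs[a] != regs[b]:
            pending = c  # takes effect after the delay slot executes
        pc = target if target is not None else pc + 1
    return mem


mem0 = {4: 1, 8: 2, 12: 3}
print(run(ORIGINAL, 12, 10, mem0) == run(RESCHEDULED, 12, 10, mem0))  # True
```

In this rescheduled version no instruction reads `$t0` directly after the `lw`, and the delay slot does useful work instead of holding a `nop`. With one cycle per instruction and one stall per load-use pair, the five instructions would take five cycles per iteration.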

