## EECS150 - Digital Design

Lecture 12 - Project Introduction

### Part 1

Feb 25, 2010 John Wawrzynek

Spring 2010

EECS150 - Lec12-proj1

Page 1

## **Project Overview**

- A. MIPS150 pipeline structure
- B. Serial Interface
- C. Memories, project memories and FPGAs
- D. Video subsystem
- E. Ethernet Interface
- F. Project specification and grading standard

### MIPS 5-stage Pipeline Review CLK CK ALUOutW CLK CĻK CLK CLK WE3 WF A1 RD1 nstrD RD adData LUOut RD А Instruction A2 RD2 Data Memory A3 Memory WD3 Register File /riteDataN WD 0 WriteReaW Write RdE Sign Extend PCPlus4D Result\ Decode (ID) Execute (EX) Memory(DM) Writeback (WB) Fetch (IF) Use PC register Generate control Use ALU to Read or write Send result as address to signals, retrieve compute result, data memory back to instruction register values memory address, (DMEM). regfile. from regfile. or compare memory (IMEM) and retrieve next registers. instruction. Spring 2010 EECS150 - Lec12-proj1 Page 3

### **MIPS 5-stage Pipeline**

### Control Hazard Example

| cycle →               |    |      |      |    |    |    |    |  |
|-----------------------|----|------|------|----|----|----|----|--|
| beq \$1, \$2, L1      | IF | ID 🕇 | EX 🛉 | DM | WB |    |    |  |
| add \$5, \$3, \$4     | 1  | IF   | ID   | EX | DM | WB |    |  |
| L1: sub \$5, \$3, \$4 |    |      | IF   | ID | EX | DM | WB |  |
|                       |    |      |      | •  |    |    | •  |  |

but needed here! / branch address ready here

Register values are known here, move branch compare and target address generation to here.

Still one remaining cycle of branch delay. "Architected branch delay slot" on MIPS allows compiler to deal with the delay. Other processors without architected branch-delay slot use branch predictors or pipeline stalling.

### **MIPS 5-stage Pipeline**

### Data Hazard Example

| add \$5, \$3, \$4                                 | IF | ID | EX ↑ | DM   | WB | <b>↑</b> |  |
|---------------------------------------------------|----|----|------|------|----|----------|--|
| add \$7, \$6, \$5                                 |    | IF | , ID | ↓ EX | DM | WB       |  |
| reg 5 value needed here! Reg 5 value updated here |    |    |      |      |    |          |  |

New value is actually known here. Send it directly from the output register of the the ALU to its input (and also down the pipeline to the register file).

Logic must be added to detect when such a hazard exists and control multiplexors to forward correct value to ALU. No alternative except to stall pipeline (thus hurting performance).

Spring 2010

EECS150 - Lec12-proj1

Page 5

### **MIPS 5-stage Pipeline**

### Load Hazard Example

| lw \$5, offset(\$4) IF                     | ID | EX | DM 🛉 | WB |    |    |  |
|--------------------------------------------|----|----|------|----|----|----|--|
| add \$ <b>7,0\$6,9\$\$8</b>                | IF | ID | F EX | DM | WB |    |  |
| ∫add <b>\$1,0</b> \$ <b>\$</b> 9\$\$8      |    | IF | ID   | EX | DM | WB |  |
| value needed here! Memory value known here |    |    |      |    |    |    |  |

"Architected load delay slot" on MIPS allows compiler to deal with the delay. Note, regfile still needs to be bypassed.

No other alternative except for stalling.

### **Processor Pipelining**

### Deeper pipeline example.

| IF1 | IF2 | ID  | X1 | X2 | M1 | M2 | WB |    |  |
|-----|-----|-----|----|----|----|----|----|----|--|
|     | IF1 | IF2 | ID | X1 | X2 | M1 | M2 | WB |  |

Deeper pipelines => less logic per stage => high clock rate.

But Deeper pipelines => more hazards => more cost and/or higher CPI.

Cycles per instruction might go up because of unresolvable hazards.

Remember, Performance = # instructions X Frequency<sub>clk</sub> / CPI

How about shorter pipelines ... Less cost, less performance

Spring 2010

EECS150 - Lec12-proj1

Page 7

### **MIPS150** Pipeline

The blocks in the datapath with the greatest delay are: IMEM, ALU, and DMEM. Allocate one pipeline stage to each:



Most details you will need to work out for yourself. Some details to follow ... In particular, let's look at hazards.



Spring 2010

EECS150 - Lec12-proj1

Page 9





"Architected load delay slot" on MIPS allows compiler to deal with the delay. No regfile bypassing needed here assuming regfile "write before read".

# MIPS 3-stage Pipeline Data Hazard add \$5, \$3, \$4 I X M add \$7, \$6, \$5 I X M reg 5 value needed here!

### Ways to fix:

- 1. Stall the pipeline behind first add to wait for result to appear in register file. NOT ALLOWED this semester.
- 2. Selectively forward ALU result back to input of ALU.
- Need to add mux at input to ALU, add control logic to sense when to activate. A bit complex to design. Check book for details.

```
Spring 2010
```

```
EECS150 - Lec12-proj1
```

```
Page 11
```

# **Project CPU Pipelining Summary**

| 3-stage  | I           | х       | м           |  |
|----------|-------------|---------|-------------|--|
| pipeline | instruction | execute | access data |  |
|          | fetch       |         | memory      |  |

- Pipeline rules:
  - Writes/reads to/from DMem use leading edge of "M"
  - Writes to RegFile use trailing edge of "M"
  - Instruction Decode and Register File access is up to you.
- 1 Load Delay Slot, 1 Branch Delay Slot
  - No Stalling may be used to accommodate pipeline hazards (in final version).
- Other:
  - Target frequency to be announced later (50-100MHz)
  - Minimize cost
  - Posedge clocking only

Spring 2010

EECS150 - Lec12-proj1

### **Background for Lab Next Week**

Spring 2010

EECS150 - Lec12-proj1

Page 13

### Final Project: Spring 2010



- Executes most commonly used MIPS instructions.
- Pipelined (high performance) implementation.
- Serial console interface for shell interaction, debugging.
- Ethernet interface for high-speed file transfer.
- Video interface for display with 2-D vector graphics acceleration.
- Supported by a C language compiler.

Spring 2010 EECS150 lec01-intro

### **Board-level Physical Serial Port**





# MIPS uses Memory Mapped I/O

- Certain addresses are not regular memory
- Instead, they correspond to registers in I/O devices



# **Processor Checks Status before Acting**

- Path to device generally has 2 registers:
  - <u>Control Register</u>, says it's OK to read/write (I/O ready) [think of a flagman on a road]
  - Data Register, holds data for transfer
- Processor reads from Control Register in loop, waiting for device to set <u>Ready</u> bit in Control reg ( $0 \Rightarrow 1$ ) to say its OK
- Processor then loads from (input) or writes to (output) data register

# MIPS150 Serial Line Interface

- Serial-Line Interface is a memory-mapped device.
- Modeled after SPIM terminal/keyboard interface.
  - Read from keyboard (<u>receiver</u>); 2 device regs
  - Writes to terminal (<u>transmitter</u>); 2 device regs



<u>Serial I/O</u>

- Control register rightmost bit (0): Ready
  - Receiver: Ready==1 means character in Data Register not yet been read;
    - $1 \Rightarrow \ 0$  when data is read from Data Reg
  - Transmitter: Ready==1 means transmitter is ready to accept a new character;
    - $0 \Rightarrow$  Transmitter still busy writing last char
      - I.E. bit (interrupt enable not used by us)
- Data register rightmost byte has data
  - Receiver: last char from serial port; rest = 0
  - Transmitter: when write rightmost byte, writes goes to serial port.

### "Polling" MIPS code

+ Input: Read from keyboard into v0

|            | lui  | <pre>\$t0, 0xffff #ffff0000</pre> |
|------------|------|-----------------------------------|
| Waitloop1: | lw   | <pre>\$t1, 0(\$t0) #control</pre> |
|            | andi | \$t1,\$t1,0x1                     |
|            | beq  | <pre>\$t1,\$zero, Waitloop1</pre> |
|            | lw   | \$v0, 4(\$t0) #data               |

### Output: Write to display from \$a0

|            | lui  | <pre>\$t0, 0xffff #ffff0000</pre>          |
|------------|------|--------------------------------------------|
| Waitloop2: | lw   | \$t1, <u>8</u> (\$t0) #control             |
|            | andi | \$t1,\$t1,0x1                              |
|            | beq  | <pre>\$t1,\$zero, Waitloop2</pre>          |
|            | SW   | <mark>\$a0</mark> , <u>12</u> (\$t0) #data |

Spring 2010

EECS150 - Lec12-proj1

Page 21