UC Berkeley CS150

Checkpoints 1-3: Pipelined MIPS CPU with RS232

0 Introduction

The first three checkpoints in this project are designed to guide the development of a three-stage pipelined MIPS CPU that will be used as a base system in the final two checkpoints. The MIPS processor will be implemented in Verilog by 1-2 person teams.

Please read the specification carefully before you begin your design, as this document contains a large amount of information intended to address common pain points. Questions and comments on the project should be posted to the Piazza forum. TAs will be available for advice during office hours and regularly scheduled lab time.

1 Table of Contents

Checkpoints 1-3: Pipelined MIPS CPU with RS232
0 Introduction
1 Table of Contents
2 Hardware
3 MIPS150
  3.1 ISA Encoding
  3.2 Functional Specification
  3.3 Pipelining
  3.4 Delay Slots
  3.5 Hazards
4 Memory Architecture
  4.1 Memory Initialization
  4.2 Address Space
  4.3 RAM Endianness
5 BIOS
6 Testing
  6.1 Module Testbenches
  6.2 Complete design verification
  6.3 Using the software toolchain
7 Git
8 Checkpoint 1: Block Diagram
9 Checkpoint 2: Design Blocks
  9.1 Register File
  9.2 ALU
  9.3 UART
10 Checkpoint 3: Programmable MIPS CPU

2 Hardware

The project will be run on the Xilinx XUPV5-LX110T Development System, which has a Virtex-5 XC5VLX110T FPGA. For manuals and other reference material, consult the Resources section of the course webpage.

The XUPV5 board contains the physical layers (PHYs) for the serial and graphics interfaces, as shown in the diagram below:

The PHYs, colored pink, convert signals generated on the FPGA to the protocols expected at the ports on the board. For example, the Analog Devices 3202A takes FPGA_SERIAL_RX and converts the voltage levels to conform to RS-232.

3 MIPS150

For this project, we will use a subset of the full MIPS ISA. For simplicity, we will omit floating point, coprocessor, division, multiplication and a few other non-critical instructions.

3.1 ISA Encoding

3.2 Functional Specification

The functionality of each instruction is shown below.

  1. R[$x] indicates the register with address x
  2. SEXT indicates sign extension
  3. ZEXT indicates zero extension
  4. BMEM indicates a byte-aligned access to memory
  5. HMEM indicates a halfword-aligned access to memory
  6. WMEM indicates a word-aligned access to memory
  7. PC indicates the memory address of the current instruction
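As a reference for the notation above, here is a minimal C sketch of SEXT and ZEXT on a 16-bit immediate, together with helpers that pull the named fields (rs, rt, rd, shamt, imm, target) out of a 32-bit instruction word. The field positions are an assumption based on the standard MIPS I encoding (opcode [31:26], rs [25:21], rt [20:16], rd [15:11], shamt [10:6], funct [5:0]); check them against the encoding in Section 3.1. All helper names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical field extractors; positions assume the standard MIPS I layout. */
static inline uint32_t f_opcode(uint32_t insn) { return (insn >> 26) & 0x3Fu; }
static inline uint32_t f_rs(uint32_t insn)     { return (insn >> 21) & 0x1Fu; }
static inline uint32_t f_rt(uint32_t insn)     { return (insn >> 16) & 0x1Fu; }
static inline uint32_t f_rd(uint32_t insn)     { return (insn >> 11) & 0x1Fu; }
static inline uint32_t f_shamt(uint32_t insn)  { return (insn >> 6) & 0x1Fu; }
static inline uint32_t f_funct(uint32_t insn)  { return insn & 0x3Fu; }
static inline uint32_t f_imm(uint32_t insn)    { return insn & 0xFFFFu; }
static inline uint32_t f_target(uint32_t insn) { return insn & 0x03FFFFFFu; }

/* SEXT: copy bit 15 of the immediate into bits [31:16]. */
static inline uint32_t sext16(uint32_t imm) {
    imm &= 0xFFFFu;
    return (imm & 0x8000u) ? (imm | 0xFFFF0000u) : imm;
}

/* ZEXT: fill bits [31:16] with zeros. */
static inline uint32_t zext16(uint32_t imm) { return imm & 0xFFFFu; }
```

For example, 0x2408FFFF (addiu $8, $0, -1) has opcode 0x09, rt = 8 and an immediate of 0xFFFF, which SEXT turns into 0xFFFFFFFF and ZEXT into 0x0000FFFF.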

Mnemonic   RTL Description                                        Notes
LB         R[$rt] = SEXT(BMEM[(R[$rs]+SEXT(imm))[31:0]])          delayed
LH         R[$rt] = SEXT(HMEM[(R[$rs]+SEXT(imm))[31:1]])          delayed
LW         R[$rt] = WMEM[(R[$rs]+SEXT(imm))[31:2]]                delayed
LBU        R[$rt] = ZEXT(BMEM[(R[$rs]+SEXT(imm))[31:0]])          delayed
LHU        R[$rt] = ZEXT(HMEM[(R[$rs]+SEXT(imm))[31:1]])          delayed
SB         BMEM[(R[$rs]+SEXT(imm))[31:0]] = R[$rt][7:0]
SH         HMEM[(R[$rs]+SEXT(imm))[31:1]] = R[$rt][15:0]
SW         WMEM[(R[$rs]+SEXT(imm))[31:2]] = R[$rt]
ADDIU      R[$rt] = R[$rs] + SEXT(imm)
SLTI       R[$rt] = R[$rs] < SEXT(imm)
SLTIU      R[$rt] = R[$rs] < SEXT(imm)                            unsigned compare
ANDI       R[$rt] = R[$rs] & ZEXT(imm)
ORI        R[$rt] = R[$rs] | ZEXT(imm)
XORI       R[$rt] = R[$rs] ^ ZEXT(imm)
LUI        R[$rt] = {imm, 16'b0}
SLL        R[$rd] = R[$rt] << shamt
SRL        R[$rd] = R[$rt] >> shamt
SRA        R[$rd] = R[$rt] >>> shamt
SLLV       R[$rd] = R[$rt] << R[$rs][4:0]
SRLV       R[$rd] = R[$rt] >> R[$rs][4:0]
SRAV       R[$rd] = R[$rt] >>> R[$rs][4:0]
ADDU       R[$rd] = R[$rs] + R[$rt]
SUBU       R[$rd] = R[$rs] - R[$rt]
AND        R[$rd] = R[$rs] & R[$rt]
OR         R[$rd] = R[$rs] | R[$rt]
XOR        R[$rd] = R[$rs] ^ R[$rt]
NOR        R[$rd] = ~R[$rs] & ~R[$rt]
SLT        R[$rd] = R[$rs] < R[$rt]
SLTU       R[$rd] = R[$rs] < R[$rt]                               unsigned compare
J          PC = {PC[31:28], target, 2'b0}                         delayed
JAL        R[31] = PC + 8; PC = {PC[31:28], target, 2'b0}         delayed
JR         PC = R[$rs]                                            delayed
JALR       R[$rd] = PC + 8; PC = R[$rs]                           delayed
BEQ        PC = PC + 4 + (R[$rs] == R[$rt] ? SEXT(imm) << 2 : 0)  delayed
BNE        PC = PC + 4 + (R[$rs] != R[$rt] ? SEXT(imm) << 2 : 0)  delayed
BLEZ       PC = PC + 4 + (R[$rs] <= 0 ? SEXT(imm) << 2 : 0)       delayed
BGTZ       PC = PC + 4 + (R[$rs] > 0 ? SEXT(imm) << 2 : 0)        delayed
BLTZ       PC = PC + 4 + (R[$rs] < 0 ? SEXT(imm) << 2 : 0)        delayed
BGEZ       PC = PC + 4 + (R[$rs] >= 0 ? SEXT(imm) << 2 : 0)       delayed

Comparisons are signed unless the Notes column says otherwise. The variable shifts use only the low 5 bits of R[$rs] as the shift amount.
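The control-flow rows in the table reduce to a few address computations. The C sketch below evaluates them exactly as the RTL column writes them; the function and enum names are hypothetical. The comparisons against zero (BLEZ, BGTZ, BLTZ, BGEZ) interpret R[$rs] as a signed 32-bit value.

```c
#include <stdint.h>
#include <stdbool.h>

/* SEXT of a 16-bit immediate. */
uint32_t sext16(uint32_t imm) {
    imm &= 0xFFFFu;
    return (imm & 0x8000u) ? (imm | 0xFFFF0000u) : imm;
}

/* Branches: PC + 4 + (taken ? SEXT(imm) << 2 : 0). */
uint32_t branch_next_pc(uint32_t pc, uint32_t imm, bool taken) {
    return pc + 4u + (taken ? (sext16(imm) << 2) : 0u);
}

/* J/JAL: {PC[31:28], target, 2'b0}. */
uint32_t jump_target(uint32_t pc, uint32_t target) {
    return (pc & 0xF0000000u) | ((target & 0x03FFFFFFu) << 2);
}

/* JAL/JALR link value: PC + 8 skips over the delay-slot instruction. */
uint32_t link_address(uint32_t pc) { return pc + 8u; }

enum branch_kind { BR_BEQ, BR_BNE, BR_BLEZ, BR_BGTZ, BR_BLTZ, BR_BGEZ };

/* Branch condition; the comparisons with zero are signed
 * (two's-complement conversion assumed, as on all common compilers). */
bool branch_taken(enum branch_kind k, uint32_t rs, uint32_t rt) {
    int32_t s = (int32_t)rs;
    switch (k) {
    case BR_BEQ:  return rs == rt;
    case BR_BNE:  return rs != rt;
    case BR_BLEZ: return s <= 0;
    case BR_BGTZ: return s > 0;
    case BR_BLTZ: return s < 0;
    case BR_BGEZ: return s >= 0;
    }
    return false;
}
```

With a branch at 0x10000000, an immediate of 0x0003 gives a taken target of 0x10000010, while 0xFFFF (-1) branches back to the branch itself.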

3.3 Pipelining

Your CPU must implement this instruction set using a three-stage pipeline. The division of the datapath into three stages is left unspecified, as it is an important design decision with significant performance implications. We recommend that you begin the design process by considering which elements of the datapath are synchronous and in what order they need to be placed. After determining which design blocks require a clock edge, consider where to place the asynchronous blocks to minimize the critical path.

3.4 Delay Slots

In the functional specification above, some instructions are noted as “delayed”. These instructions have architectural delay slots, which means the instruction following the delayed instruction is (i) always executed, and (ii) must not have any dependencies on the delayed instruction.

Our C compiler and assembler will automatically fill delay slots, so you should not fill them explicitly unless you are writing MIPS machine code by hand.
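One way to visualize a single delay slot is the two-register PC model often used in instruction-set simulators: a branch or jump redirects only the instruction after the one that is already next in line. The C sketch below is a hypothetical trace model for reasoning about execution order, not a description of how your pipeline must be built.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical model: pc is the instruction executing now, npc the next one.
 * A taken branch or jump replaces only the PC that comes after npc, so the
 * instruction at npc (the delay slot) always executes. */
typedef struct { uint32_t pc, npc; } pc_state;

void step(pc_state *s, bool redirect, uint32_t target) {
    uint32_t next_npc = redirect ? target : s->npc + 4u;
    s->pc = s->npc;
    s->npc = next_npc;
}

/* Record the addresses of the first n executed instructions when the
 * instruction at branch_pc is a taken branch to target. */
void trace(uint32_t start, uint32_t branch_pc, uint32_t target,
           uint32_t *out, size_t n) {
    pc_state s = { start, start + 4u };
    for (size_t i = 0; i < n; i++) {
        out[i] = s.pc;
        step(&s, s.pc == branch_pc, target);
    }
}
```

Tracing four instructions from 0x100, where the instruction at 0x100 branches to 0x200, yields 0x100, 0x104 (the delay slot), 0x200, 0x204.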

3.5 Hazards

As you have learned in lectures, pipelining creates data and control hazards. To resolve the control hazard, all branching instructions have one delay slot. To resolve the data hazard, you will have to implement data forwarding in your pipeline (both from the “execute” and “memory” stages).

One alternative to forwarding data from the “memory” stage is to use a negative-edge-triggered write to the register file, as discussed in lecture. If you pursue this route, make sure you fully understand the timing implications for the “memory” stage, as well as the implications for where the register file is placed in the pipeline.

You are not permitted to use stalls in your design during this phase of the project. Additionally, you may not use any negative-edge triggering except for the register file (if you choose that implementation).
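Forwarding can be viewed as a priority select for each source operand: if a younger in-flight instruction will write the register being read, use its result instead of the stale register-file value, and never forward to $0, which always reads zero. The sketch below assumes results can be pending in an “execute” stage and a “memory” stage as described above; the struct and function names are hypothetical, and the stage split in your design may differ. Because loads are delayed, the instruction in a load's delay slot never needs the loaded value.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical description of an instruction still in flight. */
typedef struct {
    bool     reg_write;  /* will this instruction write the register file? */
    uint32_t dest;       /* destination register number (0-31) */
    uint32_t value;      /* result available for forwarding */
} inflight;

/* Choose the operand for source register src. The instruction in
 * "execute" is younger than the one in "memory", so it is checked first. */
uint32_t forward_operand(uint32_t src, uint32_t regfile_value,
                         const inflight *execute, const inflight *memory) {
    if (src == 0) return 0;  /* $0 is never forwarded */
    if (execute->reg_write && execute->dest == src) return execute->value;
    if (memory->reg_write && memory->dest == src) return memory->value;
    return regfile_value;
}
```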

4 Memory Architecture

The datapath shown in DDCA has separate instruction and data memories. Although this is an intuitive representation, it does not let us modify instruction memory to run new programs. The CPUs built in this checkpoint will be able to receive MIPS binaries through a serial interface, store them into instruction memory, then jump to the downloaded program. To facilitate this, we will adopt a modified memory architecture:

The directories /hardware/src/imem_blk_mem and /hardware/src/dmem_blk_mem contain configuration files and build scripts for these memories. As you can see from the diagram, the instruction memory is a dual-port block RAM. As a result, every write to data memory is also written to instruction memory, so be careful where you set your stack pointer and store data: you could clobber instructions.

4.1 Memory Initialization

Inside the directories listed above, there are 3 skeleton files:

  1. *mem_blk_mem.xco: This file contains the configuration information for the RAM. The only attribute you will need to modify is coe_file. To initialize the contents of the block RAM, set it to point to a .coe file (generated by running make in one of the software directories).
  2. build: Running ./build generates Verilog for the block RAM, using the specified .coe file as an initialization vector. You will need to re-run this script whenever you change the .coe file. Also be careful that neither you nor your partner commits files generated by this script, as you may accidentally build your hardware with the wrong .coe file.
  3. clean: Deletes all of the files generated by build.

The skeleton files contain two programs that will run on the device and communicate over serial: echo and bios150v3. Echo simply reads characters from the UART and sends them back. The bios program, which is significantly more complex, will allow you to interact with the device and send MIPS binaries.

As an example, if you wanted to initialize your instruction memory with the bios program, you would follow these steps:

  1. Run ./clean in the memory directory
  2. Edit *mem_blk_mem.xco so that coe_file=./bios150v3.coe
  3. Run make in /software/bios150v3/
  4. Copy bios150v3.coe to the memory directory
  5. Run ./build in the memory directory

You can then instantiate the block ram module in your project.

In simulation, the memories will be initialized using a .mif file. The toolchain in the software directory will automatically generate these files for you; after running make, move the .mif file to imem_blk_mem/imem_blk_mem.mif.

4.2 Address Space

Since our CPU will also need to access sources of data other than memory, we will use memory-mapped I/O: the device targeted by a load or store depends on its address. At this stage of the project, there are two devices to access: the memory and the UART.

For I/O devices, your design should require that the top nibble of the address is 0x8. For memory, on the other hand, your design should check only that address[28] (i.e., the lowest bit of the top nibble) is set to 1. (In later checkpoints, we will need to be able to write to multiple memories simultaneously.)

Address                                    Device   Access
xxx1-xxxx-xxxx-xxxx-xxxx-xxxx-xxxx-xxxx    Memory   R/W
1000-xxxx-xxxx-xxxx-xxxx-xxxx-yyyy-yyyy    I/O      R/W
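The two address checks above can be modeled in a few lines of C (function names hypothetical), which may be handy when writing the corresponding select logic in Verilog. The patterns cannot overlap: 0x8 is 1000 in binary, so every I/O address has address[28] = 0.

```c
#include <stdint.h>
#include <stdbool.h>

/* Memory: only address[28] is checked (xxx1-xxxx-...). */
bool is_memory(uint32_t addr) { return (addr >> 28) & 1u; }

/* I/O: the top nibble must be exactly 0x8 (1000-xxxx-...). */
bool is_io(uint32_t addr) { return (addr >> 28) == 0x8u; }

/* For I/O, the bottom byte (yyyy-yyyy) selects the device and function. */
uint32_t io_select(uint32_t addr) { return addr & 0xFFu; }
```

For example, 0x10000000 and 0x30000000 both select memory, while 0x80000008 selects I/O function 0x08.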

For the I/O devices, the bottom byte (yyyy-yyyy above) will be used to determine the device and function. Use the following encoding for the UART:

Address       Function                  Access   Data Encoding
0x8XXXXX00    UART control signals      Read     {30'bx, DataOutValid, DataInReady}
0x8XXXXX04    UART receiver data        Read     {24'bx, DataOut}
0x8XXXXX08    UART transmitter data     Write    {24'bx, DataIn}

Note that the DataInValid and DataOutReady signals do not appear in this address map. Driving those two signals correctly is left to you.
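The read side of the UART table amounts to a small multiplexer on the bottom byte of the address. The C model below is a sketch under assumptions: the struct and field names are hypothetical, the don't-care (x) bits are returned as 0, and offset 0x08 is treated as write-only. Software would typically poll bit 0 (DataInReady) before storing a byte to 0x8XXXXX08, and bit 1 (DataOutValid) before loading from 0x8XXXXX04.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical snapshot of the UART signals visible to the CPU. */
typedef struct {
    bool    data_out_valid;  /* receiver holds an unread byte */
    bool    data_in_ready;   /* transmitter can accept a byte */
    uint8_t data_out;        /* most recently received byte */
} uart_status;

/* Value returned by a load from the UART. Don't-care bits are modeled
 * as 0. Returns false for offsets that are not readable. */
bool uart_read(uint32_t addr, const uart_status *u, uint32_t *value) {
    switch (addr & 0xFFu) {
    case 0x00: /* {30'bx, DataOutValid, DataInReady} */
        *value = ((uint32_t)u->data_out_valid << 1) | (uint32_t)u->data_in_ready;
        return true;
    case 0x04: /* {24'bx, DataOut} */
        *value = u->data_out;
        return true;
    default:   /* 0x08 (transmitter data) is write-only */
        return false;
    }
}
```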

4.3 RAM Endianness

The block RAMs have 4096 32-bit rows; as such, they accept 12-bit addresses (the block RAMs are word-addressed). However, the ISA calls for byte-addressed memory in certain instructions. To support this, the bottom 14 bits of each address computed by the CPU are relevant for memory access: bits [13:2] select the row (word), and bits [1:0] select the byte within that row.

The diagram below illustrates the 12-bit word addresses. Each 32-bit row consists of four bytes, and the byte offsets are shown inside the memory blocks. Observe that the RAM is big-endian:

Since your block RAMs have 32-bit rows, you can only read data out of them 32 bits at a time. This is a problem when you want to execute an lh or lb instruction, as there is no way to tell the block RAM which 8 or 16 of the 32 bits you want to read. Therefore, you will have to mask the output of the block RAM to select the appropriate portion of the 32 bits you read out. Use the diagram above to choose the bits: for example, if you want to execute an lbu on an address ending in 01, you want bits [23:16] of the 32 bits read out of block RAM, so you would store {24'b0, output[23:16]} to a register. An lb on the same address selects the same bits but must sign-extend them, storing {{24{output[23]}}, output[23:16]}. If you want to execute an lh on an address ending in 10 or 11, you want bits [15:0].
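The address split and lane selection described above can be summarized in C. This is a minimal sketch assuming a big-endian 32-bit word as drawn in the diagram; the names are hypothetical. Following the functional specification, LB and LH sign-extend the selected bits (SEXT) while LBU and LHU zero-extend them (ZEXT).

```c
#include <stdint.h>

/* Row (word) index into a 4096-row block RAM: address bits [13:2]. */
uint32_t word_index(uint32_t addr) { return (addr >> 2) & 0xFFFu; }

typedef enum { LD_LB, LD_LBU, LD_LH, LD_LHU, LD_LW } load_kind;

/* Select and extend the loaded bits from a big-endian word:
 * byte offset 00 is bits [31:24], 01 is [23:16], 10 is [15:8], 11 is [7:0];
 * a halfword at offset 0x is bits [31:16] and at 1x is bits [15:0]. */
uint32_t load_extract(load_kind k, uint32_t addr, uint32_t word) {
    uint32_t byte_shift = (3u - (addr & 3u)) * 8u;
    uint32_t half_shift = (addr & 2u) ? 0u : 16u;
    uint32_t b = (word >> byte_shift) & 0xFFu;
    uint32_t h = (word >> half_shift) & 0xFFFFu;
    switch (k) {
    case LD_LB:  return (b & 0x80u) ? (b | 0xFFFFFF00u) : b;
    case LD_LBU: return b;
    case LD_LH:  return (h & 0x8000u) ? (h | 0xFFFF0000u) : h;
    case LD_LHU: return h;
    case LD_LW:  return word;
    }
    return 0;
}
```

With the word 0x11A2B3C4 read from RAM, an lbu from an address ending in 01 returns 0x000000A2, an lb from the same address returns 0xFFFFFFA2, and an lhu from an address ending in 10 or 11 returns 0x0000B3C4.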

5 BIOS

We have provided a bios program in software/bios150v3 that will allow you to interact with your CPU. To use it, compile the program, initialize your instruction memory with bios150v3.coe, build the memory, build your CPU, and then program the FPGA with iMPACT.

Similar to the Echo program used in Lab 5, we will use screen to interact with our device over the serial port. Issue the following command to open screen:

screen $SERIALTTY 115200

Please use ctrl-a shift-k to close screen sessions, or the port will be unusable for subsequent students. Use screen -x to reconnect to a screen session if you close it improperly.

If all goes well, you should see a ‘>’ prompt (you may need to press return first). The following commands are available:

Command                        Function
jal <address>                  Jump to address (hex)
sw, sb, sh <data> <address>    Store data (hex) to address (hex)
lw, lb, lhu <address>          Load data (hex) from address (hex)

(Feel free to look at the bios150v3 code and implement additional functionality!)

Finally, the bios also enables us to load programs onto the device. Close screen properly (ctrl-a shift-k), and then execute:

coe_to_serial <coe_file> <address>

This sends a .coe file to the CPU, and the bios stores it at the specified address. You should then be able to re-open the screen session and jal to the address of the new program.

Before you make the coe file in one of the software directories, you will likely need to edit start.s and <program_name>.ld. start.s sets the initial stack pointer and the .ld file should contain the address the program will be stored at. As you may recall from CS 61C, the stack is where programs store registers and return addresses when functions are called. The stack grows down from the initial stack pointer, so be sure to set it high enough that it doesn’t grow down into the currently running program.

6 Testing

The design specified for this project is a complex system, and debugging it can be very difficult without tests that increase visibility into parts of the design. Although we will not require or grade testing efforts, we expect that teams using the testing tools will be able to complete checkpoints faster.

6.1 Module Testbenches

Creating testbenches that exercise individual modules in your design is analogous to unit testing in software. It may not be possible to exhaustively test a module, but including some edge cases as well as sanity checks still provides insurance against regressions as your design progresses. The testbenches provided for your RegFile and ALU in Checkpoint 2 are examples of how to test individual modules.

6.2 Complete design verification

Even with functioning modules, this project requires a significant amount of wiring for the complete design to function. To test your whole CPU, we have provided EchoTestbench as an example. The sim/tests/echo.do file shows how to initialize memory in simulations.

To make your own system-level tests, copy the software/echo directory to a new name, make a new tests/<testbenchname>.do file, and write the corresponding testbench.

6.3 Using the software toolchain

A GCC MIPS toolchain has been built and installed in the cs150 home directory; these binaries will run on any of the p380 machines in the 125 Cory lab. The most relevant pieces of the toolchain are:

  1. mips-gcc: GCC for MIPS; compiles C files to MIPS binaries
  2. mips-as: MIPS assembler; assembles assembly files into MIPS binaries
  3. mips-objdump: allows easy viewing of MIPS binaries

The easiest way to use this toolchain is to copy the example project in the software directory of the skeleton files. That might look something like the following:

% cd software

% cp -r example helloworld

% cd helloworld

% mv example.c helloworld.c

% mv example.ld helloworld.ld

% gedit Makefile

The only edit required in the Makefile is to change the TARGET variable to helloworld. You can then edit helloworld.c and run make to compile. Any C or assembly files you add to the directory with the proper extension will be compiled automatically. There are a few things to be aware of in the example project:

  1. start.s: This is an assembly file that contains the starting point for your program. By default GCC looks for a label named _start and makes it the entry point. The default start.s jumps to the main label, and initializes the stack pointer to the address 0x10000100.
  2. example.ld: This is a linker script; it guarantees that the start label is at address 0x10000000.
  3. example.elf: Original binary produced by the toolchain, mips-objdump -d example.elf will tell you which instructions make up the binary.
  4. example.mif: File generated by toolchain, used to initialize block RAM memory for simulation.
  5. example.coe: File generated by toolchain, used to initialize block RAM memory for synthesis.

7 Git

If you have not yet configured your repository, please follow the instructions in Lab 5.

You should check frequently for updates to the skeleton files. To pull them into your repository, assuming you have correctly followed the configuration instructions, issue:

git pull staff master

from a directory in your repository.

8 Checkpoint 1: Block Diagram

The first step is to create a block diagram for your CPU based on the information above. The diagram may be drawn by hand or via software as long as it clearly shows all major elements of the datapath as well as the placement of the pipeline registers. The wiring details should be shown at the same (or greater) level of detail as shown in DDCA. For the checkpoint, a TA will review your block diagram in lab.

Checkpoint 1 is due by 8 PM on Wednesday, October 5. Early checkoff is recommended.

Some advice:

  1. Keep in mind which elements are synchronous and which are asynchronous when placing pipeline registers.
     a. For data and instruction memory, we will be using block RAMs. These have synchronous reads and writes.
  2. A good starting point may be the 5-stage pipeline in DDCA; consider what you could do to remove two stages.
  3. The design will be easier to work with and read if you tape a few sheets of paper together for a larger workspace.
  4. Use pencil if you draw this by hand; it will be a working document throughout the semester.

9 Checkpoint 2: Design Blocks

To ensure that you are progressing towards completing the entire CPU, the second checkpoint will require completion of a few critical design blocks. In order to test the functionality of these blocks, we will require that they conform to the specifications outlined in their respective sections.

Checkpoint 2 is due by 8 PM on Wednesday, October 12. Early checkoff is recommended.

9.1 Register File

Implement your register file in the module RegFile.v (in /hardware/src). Do not change the module definition. The register file will be tested using RegFileTestBench.v. In your implementation, do not forget that writes are synchronous while reads are asynchronous.
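A behavioral model in C can be useful for generating expected values while debugging RegFile.v. The sketch below assumes 32 registers of 32 bits with register 0 hardwired to zero, as in the MIPS architecture; the struct and function names are hypothetical and do not correspond to the ports of the provided module definition.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical model of a 32 x 32-bit register file. */
typedef struct { uint32_t r[32]; } regfile;

/* Asynchronous read: the result depends only on the current contents.
 * Register 0 always reads as zero. */
uint32_t rf_read(const regfile *rf, uint32_t addr) {
    addr &= 31u;
    return addr == 0 ? 0u : rf->r[addr];
}

/* Synchronous write: call once per rising clock edge. Writes to
 * register 0 are ignored. */
void rf_clock_edge(regfile *rf, bool we, uint32_t waddr, uint32_t wdata) {
    waddr &= 31u;
    if (we && waddr != 0) rf->r[waddr] = wdata;
}
```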

9.2 ALU

For this checkpoint, we require that you complete the ALU module as well as an ALU decoder module. For checkoff, these modules must be instantiated in ALUTestBench.v, which will provide instructions to the ALU decoder and will verify the output of the ALU.
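An ALU decoder maps an instruction's opcode (and, for R-type instructions, its funct field) to an ALU operation. The C sketch below uses the standard MIPS opcode and funct values for the instructions in Section 3.2; the alu_op encoding and the operand convention (for shifts, b is the value being shifted and the low 5 bits of a are the shift amount) are assumptions, not the interface expected by ALUTestBench.v. Loads and stores reuse the add operation to compute R[$rs] + SEXT(imm); branches and jumps are left to separate logic here and decode as ALU_INVALID.

```c
#include <stdint.h>

/* Hypothetical ALU operation codes; your ALU.v may use a different set. */
typedef enum {
    ALU_ADD, ALU_SUB, ALU_AND, ALU_OR, ALU_XOR, ALU_NOR,
    ALU_SLT, ALU_SLTU, ALU_SLL, ALU_SRL, ALU_SRA, ALU_LUI, ALU_INVALID
} alu_op;

alu_op alu_decode(uint32_t opcode, uint32_t funct) {
    if (opcode == 0x00) {                        /* R-type: use funct */
        switch (funct) {
        case 0x00: case 0x04: return ALU_SLL;    /* SLL, SLLV */
        case 0x02: case 0x06: return ALU_SRL;    /* SRL, SRLV */
        case 0x03: case 0x07: return ALU_SRA;    /* SRA, SRAV */
        case 0x21: return ALU_ADD;               /* ADDU */
        case 0x23: return ALU_SUB;               /* SUBU */
        case 0x24: return ALU_AND;
        case 0x25: return ALU_OR;
        case 0x26: return ALU_XOR;
        case 0x27: return ALU_NOR;
        case 0x2A: return ALU_SLT;
        case 0x2B: return ALU_SLTU;
        default:   return ALU_INVALID;           /* e.g. JR, JALR */
        }
    }
    switch (opcode) {
    case 0x09: return ALU_ADD;                   /* ADDIU */
    case 0x0A: return ALU_SLT;                   /* SLTI */
    case 0x0B: return ALU_SLTU;                  /* SLTIU */
    case 0x0C: return ALU_AND;                   /* ANDI */
    case 0x0D: return ALU_OR;                    /* ORI */
    case 0x0E: return ALU_XOR;                   /* XORI */
    case 0x0F: return ALU_LUI;                   /* LUI */
    case 0x20: case 0x21: case 0x23: case 0x24: case 0x25: /* loads */
    case 0x28: case 0x29: case 0x2B:                       /* stores */
        return ALU_ADD;
    default:   return ALU_INVALID;
    }
}

uint32_t alu(alu_op op, uint32_t a, uint32_t b) {
    uint32_t sh = a & 31u;
    switch (op) {
    case ALU_ADD:  return a + b;
    case ALU_SUB:  return a - b;
    case ALU_AND:  return a & b;
    case ALU_OR:   return a | b;
    case ALU_XOR:  return a ^ b;
    case ALU_NOR:  return ~(a | b);
    case ALU_SLT:  return (int32_t)a < (int32_t)b;  /* signed compare */
    case ALU_SLTU: return a < b;                    /* unsigned compare */
    case ALU_SLL:  return b << sh;
    case ALU_SRL:  return b >> sh;
    case ALU_SRA:  /* arithmetic shift: refill the top bits with the sign */
        return (b >> sh) | (((b & 0x80000000u) && sh) ? ~(0xFFFFFFFFu >> sh) : 0u);
    case ALU_LUI:  return b << 16;
    default:       return 0;
    }
}
```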

9.3 UART

This should be the same module used in Lab 5. If your team has not been checked off for Lab 5, we will require that you submit this.

Checkoff 2 will consist of successfully running ALUTestBench.v and RegFileTestBench.v.

10 Checkpoint 3: Programmable MIPS CPU

This checkpoint requires tying together all of your modules into a complete, functioning system. Working designs will be able to execute MIPS binaries sent to the device through the serial interface.

Checkpoint 3 is due by 8 PM on Wednesday, November 2. Early checkoff is strongly recommended.

Document revision history:

09/28/11 .. Initial specification released