CS61C Fall 2016 Project 3-2: CPU

TA: Rebecca Herman

Due Friday, November 4th, 2016 @ 11:59 PM

Obtaining Files | Your Job | Overview of Files | ISA | Pipelining | Approach | Logisim Notes | Submission

Updates and Clarifications

Introduction

Now that you've implemented the Regfile and ALU, you're ready to design your own 2-stage pipelined processor! When you're done, you'll be able to run MIPS code through your assembler and linker, and then on your very own CPU :D


IMPORTANT INFO - PLEASE READ


0) Obtaining the Files

We added a template for your CPU (cpu.circ), a harness (run.circ), and a data memory module (mem.circ) which you will need for implementing your CPU. We also added a basic assembler (assembler.py) to help you more thoroughly test your CPU. Some sample CPU tests might be coming soon! Please fetch and merge the changes from the proj3-2 branch of the starter repo. For example, if you have set the proj3-starter remote link:

cd proj3-XXX                  #or proj3-XXX-YYY 
git fetch proj3-starter
git merge proj3-starter/proj3-2-starter -m "merge proj3-2 skeleton code"

If you do not have the proj3-starter remote link from part I, you can run:

git remote add proj3-starter https://github.com/61c-teach/fa16-proj3-starter.git

If you do have some other inorrect value for the proj3-starter remote link, delete it first by running:

git remote rm proj3-starter

1) Your Job

You are responsible for constructing the entire datapath and control of the CPU from scratch. Your completed processor should implement the ISA detailed below in the section Instruction Set Architecture (ISA) using a two-cycle pipeline, specified below. Read all of sections 2-5 very carefully!

2) Overview of Files

Your processor (cpu.circ) will contain an instance of your ALU and Register File, as well as a memory unit provided in your starter kit and some additional registers you will need. The processor harness run.circ will function as your instruction memory in that it will provide your processor with its program. It feels slightly unconventional in that your CPU will actually be running as a subcircuit of the "instruction memory", so inputs to instruction memory become outputs of our CPU, and outputs of instruction memory become inputs to our CPU. Thus your processor will output the address of an instruction, and accept the instruction at that address as an input. Inspect run.circ to see exactly what's going on. (This same harness will be used to test your final submission, so make sure your CPU fits in the harness before submitting your work!)

Just like in part I, be careful not to move the input or output pins! You should ensure that your processor is correctly loaded by a fresh copy of run.circ before you submit. You can download a fresh copy from the starter repo website.

2.0) Processor

The skeleton for your processor can be found in cpu.circ. Your processor has 2 inputs that come from the harness:

Input Name Bit Width Description
INSTRUCTION 32 Driven with the instruction at the instruction memory address identified by the FETCH_ADDRESS (see below).
CLOCK 1 The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the CLK input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not AND it with anything, etc.).

Your processor must provide 8 outputs to the harness:

Output Name Bit Width Description
$s0 32 Driven with the contents of $s0. FOR TESTING
$s1 32 Driven with the contents of $s1. FOR TESTING
$s2 32 Driven with the contents of $s2. FOR TESTING
$ra 32 Driven with the contents of $ra. FOR TESTING
$sp 32 Driven with the contents of $sp. FOR TESTING
$hi 32 Driven with the contents of $hi. FOR TESTING
$lo 32 Driven with the contents of $lo. FOR TESTING
FETCH_ADDRESS 32 This output is used to select which instruction is presented to the processor on the INSTRUCTION input.

ONE MORE THING: In addition to these inputs and outputs, you also need to have an LED unit which lights up to signify signed overflow. This indicator should be wired to the signed overflow port of your ALU. This should be viewable in your main circuit.

2.1) Memory

The memory unit is already fully implemented for you! Here's a quick summary of its inputs and outputs:

Output Name In- or Out-put? Bit Width Description
A: ADDR In 32 Address to read/write to in Memory
D: WRITE DATA In 32 Value to be written to Memory
En: WRITE ENABLE In 1 Equal to one on any instructions that write to memory, and zero otherwise
Clock In 1 Driven by the clock input to cpu.circ
D: READ DATA Out 32 Driven by the data stored at the specified address.

One important caveat is that the memory is only 64 MB in size, due to the limitations of Logisim's built-in memory block. If you try to read or write addresses greater than or equal to 226, they will alias to lower addresses. That is, the address 226 + X will appear to be the same as X.

3) Instruction Set Architecture (ISA)

Your CPU will support the instructions listed below. In all of the instructions you recognize from MIPS, the instruction format, opcode, funct, and register numbers should be taken directly from your greensheet. Notice that not all of the native MIPS functions are listed in the CORE INSRUCTION SET section of the greensheet - you may also need to look in the ARITHMETIC CORE INSTRUCTION SET section, and in the OPCODES, BASE CONVERSION, ASCII SYMBOLS table on the back! There will also be an instruction that is foreign to MIPS: specifications for this new instruction will follow the table, as well as clarifications on some other selected instructions.

Instruction Format
Shift Left Logical sll $rd, $rt, shamt
Shift Right Logical srl $rd, $rt, shamt
Shift Right Arithmetic sra $rd, $rt, shamt
Jump Register jr $rs
Jump and Link Register jalr $rs
Move from Hi mfhi $rd
Move from Lo mflo $rd
Multiply mult $rs, $rt
Divide Unsigned divu $rs, $rt
Add add $rd, $rs, $rt
Add Unsigned addu $rd, $rs, $rt
Sub sub $rd, $rs, $rt
Sub Unsigned subu $rd, $rs, $rt
And and $rd, $rs, $rt
Or or $rd, $rs, $rt
Exclusive Or xor $rd, $rs, $rt
Not Or nor $rd, $rs, $rt
Set Less Than slt $rd, $rs, $rt
Set Less Than Unsigned sltu $rd, $rs, $rt
Jump j label
Jump and Link jal label
Branch if Equal beq $rs, $rt, label
Branch if Not Equal bne $rs, $rt, label
Branch if Equal and Increment NEW! beqinc $rs, $rt, label
Add Immediate addi $rt, $rs, immediate
Add Immediate Unsigned addiu $rt, $rs, immediate
Set Less Than Immediate slti $rt, $rs, immediate
Set Less Than Immediate Unsigned sltiu $rt, $rs, immediate
And Immediate andi $rt, $rs, immediate
Or Immediate ori $rt, $rs, immediate
Exclusive Or Immediate xori $rt, $rs, immediate
Load Upper Immediate lui $rt, immediate
Load Word lw $rt, offset($rs)
Store Word sw $rt, offset($rs)

Branch if Equal and Increment (beqinc)

FORMAT: I
OPERATION (RTL):
(if R[rs] == R[rt]) PC = PC + 4 + BranchAddr;
else R[rt] = R[rt] + 1;
BranchAddr = { 14{immediate[15]}, immediate, 2'b0}
OPCODE: 0x6

This operation would be useful for implementing for loops. For instance:
    for(int i = 0; i < 10, i++)
would probably compile to something like
        addiu $t0 $0 10
        addiu $s0 $0 0
loop: beqinc $t0 $s0 end

Jump and Link (jal)

Our ISA does not expose jump delay slots to hardware, so it does not make sense to store PC + 8 in $ra on a jal instruction! Instead we will store PC + 4

4) Pipelining

Your processor will have a 2-stage pipeline:

  1. Instruction Fetch: An instruction is fetched from the instruction memory.
  2. Execute: The instruction is decoded, executed, and committed (written back). This is a combination of the remaining stages of a normal MIPS pipeline.

You should note that data hazards do NOT pose a problem for this design, since all accesses to all sources of data happens only in a single pipeline stage. However, there are still control hazards to deal with. Our ISA does not expose branch delay slots to software. This means that the instruction immediately after a branch or jump is not executed if the branch is taken. But for the sake of maintaining our CPI (cycles per instruction) we do not always want to stall on a branch -- if we do not take the branch, then we want to continue as we would have. This makes your task a bit more complex. By the time you have figured out if you will take a branch or jump in the execute stage, you have already accessed the instruction memory and pulled out (possibly) the wrong instruction. You will therefore need to "kill" instructions that are being fetched if the instruction under execution is a jump or a taken branch. Instruction kills for this project should be accomplished by MUXing a nop into the instruction stream and sending it into the Execute stage instead of the fetched instruction. Notice that 0x00000000 is a nop instruction; please use this, as it will simplify grading and testing. You should only kill if a branch is taken (do not kill otherwise), but do kill on every type of jump.

Because all of the control and execution is handled in the Execute stage, your processor should be more or less indistinguishable from a single-cycle implementation, barring the one-cycle startup latency and the branch/jump delays. However, we will be enforcing the two-pipeline design. If you are unsure about pipelining, it is perfectly fine (maybe even recommended) to first implement a single-cycle processor. This will allow you to first verify that your instruction decoding, control signals, arithmetic operations, and memory accesses are all working properly. From a single-cycle processor you can then split off the Instruction Fetch stage with a few additions and a few logical tweaks. Some things to consider:

You might also notice a bootstrapping problem here: during the first cycle, the instruction register sitting between the pipeline stages won't contain an instruction loaded from memory. How do we deal with this? It happens that Logisim automatically sets registers to zero on reset; the instruction register will then contain a nop. We will allow you to depend on this behavior of Logisim. Remember to go to Simulate --> Reset Simulation (Ctrl+R) to reset your processor.

5) Approach

First implement a single-cycle CPU, and don't worry about pipelining. To implement your single-cycle CPU, try first making a CPU that does exactly one instruction. Then add all the instructions of that type. Then add another new instruction, etc, until you've implemented a fully-working processor! More in-depth hints can be found in lab7, and we may release more soon as well :) Don't forget to use logisim's built-in blocks -- you shouldn't have to make muxes or comparators by hand!

Once this is working and passing all of your own tests, then try to pipeline your CPU by splitting it into two stages. Think carefully about the way this affects the way stage 1 communicates with the other 4 of MIPS's conventional 5 stages. Then you should test your circuitry on the staff tests, and make more tests of your own!


Logisim Notes

If you are having trouble with Logisim, RESTART IT and RELOAD your circuit! Don't waste your time chasing a bug that is not your fault. However, if restarting doesn't solve the problem, it is more likely that the bug is a flaw in your project. Please post to Piazza about any crazy bugs that you find and we will investigate.

Things to Look Out For

Logisim's Combinational Analysis Feature

Logisim offers some functionality for automating circuit implementation given a truth table, or vice versa. Though not disallowed (enforcing such a requirement is impractical), use of this feature is discouraged. Remember that you will not be allowed to have a laptop running Logisim on the final.


Testing

For part 2, it is somewhat difficult to provide small unit tests such as the ones from part 1 since you are completing the full datapath. As such, the best approach would be to write short MIPS programs and exercise your datapath in different ways. We have provided 3 sample tests for a completed, pipelined CPU, and you can run them in the same way you did in part 1: ./run-sanity-test.sh. To understand what these tests do, look in the tests/assem folder for the MIPS programs that correspond to each test (this might be significantly easier than looking at the expected output). If you are failing a test, investigate as you did in part 1: open the CPU-XXX.circ file corresponding to that test, right click on your CPU and choose "view main", and tick the clock using command-t. Make sure your CPU behaves the way you expect it to.

To help you with testing an incomplete processor, we have also provided reference outputs for two of the tests (addition and multiplication) for a non-pipelined processor. To check your CPU against this standard, you will need to modify tests/sanity_test.py. See CPU_TESTING_INSTRUCTIONS for directions.

Of course, the tests we have provided are not comprehensive, and you will need to do more testing on your own. To facilitate this, we have provided you with a rudimentary MIPS assembler that functions similarly to your project 2. To assemble a MIPS file, simply run the assembler with python and pass in your input file as follows:

vim test.s  # create your assembly file. Remember to only use the instructions provided in the ISA above
python assembler.py test.s # This will generate an output file named test.hex
python assembler.py test.s -v # Same as above, but also provides some verbose output on command line.
    

The assembler supports all the instructions in your project ISA. It supports register names, but not register numbers. You can enter numbers in decimal or in hex, but if you are working with an I-type instruction that 0-extends, negative numbers are not supported. Enter the corresponding hex representation instead.

Note: the assembler has only been tested with python 2.7, so it may be easier to run it remotely off of the hive* servers if you haven't set up your python environments.

Once you have generated the machine code, you'll have to load it into the instruction memory unit in run.circ and begin execution. To do so, first open run.circ and locate the Instruction Memory Unit.

Right click on the memory module and choose "load image." This is where we load the file outputted by the assembler earlier.

Once you've loaded the machine code you can tick the clock manually and watch your CPU execute your program! Using the "mouse" tool, you can right click on the CPU and choose "view main" to see how your datapath is behaving under the given inputs. Or using the "poke" tool, if you click on your CPU, a small magnifying glass will appear. Click on the magnifying glass 2 more times, and you'll reach the same place. You can also fire up MARS at the same time and step through your code there simultaneously. This will allow you to compare the expected behaviour, as seen in MARS, with the behaviour your circuit is exhibiting. Of course, this approach is not valid for the new instructions - for those you'll have to rely solely on your own test assembly code.

Submission

We will submit proj3-2 just like we submitted proj3-1. There are two steps required to submit proj3-2. Failure to perform both steps will result in loss of credit:

  1. First, you must submit using the standard unix submit program on the instructional servers. This assumes that you followed the earlier instructions and did all of your work inside of your git repository. To submit, follow these instructions after logging into your -XXX class account:

    cd ~/proj3-XXX                             # Or where your shared git repo is
    submit proj3-2

    Once you type submit proj3-2, follow the prompts generated by the submission system. It will tell you when your submission has been successful and you can confirm this by looking at the output of glookup -t.


  2. Additionally, you must submit proj3-2 to your (shared, if you have a partner) Bitbucket repository:

    cd ~/proj3-XXX                               # Or where your shared git repo is
    git add -u                           
    git commit -m "project 3-2 submission"       # The commit message doesn't have to match exactly.
    git tag "proj3-2-sub"                        # The tag MUST be "proj3-2-sub". Failure to do so will result in loss of credit.
    git push origin proj3-2-sub                  # This tells git to push the commit tagged proj3-2-sub

Resubmitting

If you need to re-submit, you can follow the same set of steps that you would if you were submitting for the first time, but you will need to use the -f flag to tag and push to Bitbucket:

# Do everything as above until you get to tagging
git tag -f "proj3-2-sub"
git push -f origin proj3-2-sub

Note that in general, force pushes should be used with caution. They will overwrite your remote repository with information from your local copy. As long as you have not damaged your local copy in any way, this will be fine.

Deliverables

cpu.circ
alu.circ
regfile.circ 

We will be using our own versions of the *-harness.circ and run.circ files, so you do not need to submit those. In addition, you should not depend on any changes you make to those files. I plan to grade your CPU once with your ALU and RegFile and once with the staff solution ALU and RegFile, and then give you the maximum of the two scores, so as to avoid double-jeapardy as well as surprises, as you will be practicing and testing with your own solutions for Part1.

Grading

This project will be graded in large part by an autograder. The autograder cannot tell the difference between a small wiring mistake and a fundametal mistake. You will have the opportunity to submit regrade requests (for a 5% penalty each time I open your file). If you explain what the wiring problem is and I can understand your directions, then I will open your files, fix the wiring problem and run the autograder again. For this reason, please try to make your circuits neat and readable, or you jeapardize your opportunity for regrades!