Part 1 Due Sunday, April 10 2011 at 11:59 PM

Part 2 Due Sunday, April 17 2011 at 11:59 PM

TA: Conor Hughes
(Based on original spec by Ben Sussman and Brian Zimmer, and modified spec of Albert Chae, Paul Pearce, Noah Johnson)

Post any questions or comments to the google group.

Updates and clarifications at a glance

(last modified 4/16/11 01:11)

  • This is an individual project.
  • We will be using version 2.7.1 of Logisim. This is the version that is installed on the lab machines.
  • We will not test shift amounts >= 16. Behavior in this case is not defined.
  • We were using an old image of the RAM module which used "out" instead of "ld". This has been corrected.
  • The overflow LEDs and display bundles should go in the top level of the CPU circuit. They are thus not required for part 1.
  • The assembler has been updated to include support for li and la pseudoinstructions. Also, we've added a new assembler with all kinds of magic, including producing data memory images and the requisite directives. It's courtesy Charley Crissman, and you can get it here (also now in the start kit).
  • Get a tarball of some sample autograder tests here (also now in the start kit). Read the README to figure out how to use and run them.
  • Get a tarball of a SUBSET of the part 1 tests here (also now in the start kit). Read the README to figure out how to use and run them.
  • For both of these sets of tests, you cannot run them remotely on the macs in 200SD. Either go to lab, or SSH into a login server and forward X.
  • READMEs in test tarballs have been updated (4/15/11 21:44)
  • You can use a memory with separate read and write ports if you should so desire.
  • Branches should work with any two registers. There was a typo in the spec which implied otherwise and it has been fixed.
  • Wording regarding beq/bne and jal/j has been improved.

Overview

In this project you will be using Logisim to create a 16-bit two-cycle processor. It is similar to MIPS, except that both the datapath and the instructions are 16-bits wide, it has only 4 registers, and memory addresses represent 16-bit words instead of 8-bit bytes. Please read this document CAREFULLY as there are key differences between the processor we studied in class and the processor you will be designing for this project.

Before you begin, copy the start kit to your home directory:

          $ cp -R ~cs61c/proj4StartKit ~/proj4
        

Pipelining

You will implement a 2-stage pipeline for this project. The first stage will be an instruction fetch stage: instructions are fetched from the instruction memory, and placed into an instruction register. The second stage will be an execute stage: the instruction in the instruction register is decoded and executed. Instructions are all executed and committed (written back) in the execute stage.

You should note that data hazards do not pose a problem for this design, since all accesses to all sources of data happens only in a single pipeline stage. However, there are still control hazards to deal with. Our ISA does not expose branch delay slots to software. This means that the instruction immediately after a branch or jump is not necessarily executed if the branch is taken. This makes your task a bit more complex. By the time you have figured out that a branch or jump is in the execute stage, you have already accessed the instruction memory and pulled out (possibly) the wrong instruction. You will therefore need to "kill" instructions that are being fetched if the instruction under execution is a jump or branch (and, if it is a branch, if the branch is taken). Instruction kills can and should be accomplished by muxing a nop into the instruction stream and sending the nop into the execute stage (or loading it into the instruction fetch register), instead of using the fetched instruction. Notice that a 16-bit 0 is a nop instruction; please use this, as it will simplify grading and testing.

You must solve the control hazards by this method of killing.

Because we do not expose branch delay slots to software, the sequence of states your processor goes through should be more or less indistinguishable from a single-cycle implementation, barring the one-cycle startup latency, and the branch delays. However, we will be enforcing the two-pipeline design. For branches, you should only kill if the branch is taken; do not kill otherwise. Simply kill whenever you see a taken branch or jump. This reduces the space of correct implementations, which streamlines the process for everyone involved.

You might also notice a bootstrapping problem here: during the first cycle, the instruction register won't contain an instruction loaded from memory. How do we deal with this? It happens that Logisim automatically sets registers to zero on reset; the instruction register will then contain a nop. We will allow you to depend on this behavior of Logisim. Remember to go to Simulation->Reset simulation to reset your processor!


Deliverables

This project will have two parts.

Part 1, due April 10 (15%)

In the first part, you will construct a register file and Arithmetic Logic Unit (ALU).

Register File

Your processor needs a register file to store the data in the architectural registers. You will therefore design a register file to manage the four 16-bit registers in our ISA. After being told to write data to a particular register, you will be able to retrieve that data by asking for the value of that register on subsequent clock cycles.

You are provided with the skeleton of a register file in Regfile.circ. The register file circuit has six inputs:

Read Register 1 (2 bits) Determines which register's value is sent to the Read Data 1 output, see below.
Read Register 2 (2 bits) Determines which register's value is sent to the Read Data 2 output, see below.
Write Register (2 bits) Determines which register to set to Write Data on the next rising edge of the clock, assuming that the RegWrite input is high.
write data (16 bits) Determines what data to write to the register identified by the Write Register input, on the next rising edge of the clock, assuming the RegWrite input is high.
RegWrite (1 bit) Determines whether data is written on the next rising edge of the clock.
clk (1 bit) Input for the clock. This can be sent into subcircuits or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not and it with anything, etc.).

The register file also has the following six outputs:

Reg 0 value (16 bits) Always driven with the value of register 0. This is primarily for grading & debugging; if you were really designing a register file you would probably omit this output.
Reg 1 value (16 bits) Always driven with the value of register 1. This is primarily for grading & debugging; if you were really designing a register file you would probably omit this output.
Reg 2 value (16 bits) Always driven with the value of register 2. This is primarily for grading & debugging; if you were really designing a register file you would probably omit this output.
Reg 3 value (16 bits) Always driven with the value of register 3. This is primarily for grading & debugging; if you were really designing a register file you would probably omit this output.
Read Data 1 (16 bits) Driven with the value of the register identified by the Read register 1 input.
Read Data 2 (16 bits) Driven with the value of the register identified by the Read register 2 input.
You can make any modifications to Regfile.circ you want, but the outputs must obey the behavior specified above. In addition, your Regfile.circ that you submit must fit into the Regfile-harness.circ file we have provided for you. This means that you should take care to not reorder inputs or outputs, though you can move them around if you need more space or something. A circuit like Regfile-harness.circ will be used to test your register file for grading. You should download a fresh copy of Regfile-harness.circ and make sure your Regfile.circ is cleanly loaded before submitting.

Arithmetic Logic Unit (ALU)

You will also design an ALU that your processor can use to do math. You will tell your ALU what operation to perform, and it will drive the result with the result of that operation.

We have provided a skeleton of an ALU for you in alu.circ. It has three inputs:

X (16 bits) Data to use for X in the ALU operation.
Y (16 bits) Data to use for Y in the ALU operation.
switch (S) (3 bits) Selects what operation the ALU should perform (see below).

The ALU also has three outputs:

signed overflow (1 bit) High iff the operation was an addition or subtraction, and there was signed overflow.
Result (16 bits) Result of the ALU operation.
equal (1 bit) High iff the two inputs X and Y are equal.

The operation you should perform is given by the following table:

switch meaning
0 or: result = X | Y
1 and: result = X & Y
2 add: result = X + Y
3 sub: result = X - Y
4 sllv: result = X << Y
5 srlv: result = X >> Y (zero-fill)
6 srav: result = X >> Y (sign-fill)
7 slt: result = (X < Y) ? 1 : 0 (treat X and Y as signed)

You can assume for shift operations that Y will be positive and less than 16

The same instructions as the register file regarding rearranging inputs and outputs apply for the ALU. In particular, you should ensure that your ALU is correctly loaded by a fresh copy of alu-harness.circ before you submit.

You will submit your ALU and register file for part 1. For part 2, which we expect to be more time-consuming, you will submit your completed processor.

Part 2, due April 17 (85%)

Processor

We have provided a skeleton for your processor in cpu.circ along with a testing harness in cpu-harness.circ. Your completed processor should implement the ISA detailed below in the section "Instruction Set Architecture (ISA)", using a two-cycle pipeline. It will interact with our harness through two inputs, and 10 outputs.

Your processor will get its program from the processor harness we have provided. It will send the address of instruction memory it wants to access to the harness through an output, and accept the instruction at that address as an input. Inspect cpu-harness.circ to see exactly what's going on. Your processor has two inputs that come from the harness.

From Instr Mem (16 bits) Driven with the instruction at the instruction memory address identified by the "To Instr Mem" output (see below).
clk (1 bit) The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the clk input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not and it with anything, etc.).

Your processor must provide 10 outputs to the harness:

r0 (16 bits) Driven with the contents of register r0.
r1 (16 bits) Driven with the contents of register r1.
r2 (16 bits) Driven with the contents of register r2.
r3 (16 bits) Driven with the contents of register r3.
d0 (16 bits) Driven with the value being displayed on the first display.
d1 (16 bits) Driven with the value being displayed on the second display.
To Instr Mem (16 bits) This output is used to select which instruction is presented to the processor on the "From Instr Mem" input.
Data Mem Data (16 bits) Driven with the data being written to memory. When no data is being written to memory, this can be driven with whatever you want.
Data Mem Addr (16 bits) Driven with the address being written in memory. When no data is being written to memory, this can be driven with whatever you want.
Data Mem Write (1 bit) High when data is going to be written to memory. Low otherwise.

The same instructions as the register file and ALU regarding rearranging inputs and outputs apply for the processor. In particular, you should ensure that your ALU is correctly loaded by a fresh copy of cpu-harness.circ before you submit.

Notice that part 2 is worth much more than part 1. We expect it to take much more time as well. Just something to consider when budgeting your time for this project.


Instruction Set Architecture (ISA)

You will be implementing a simple 16-bit processor with four registers ($r0 through $r3). It will have separate data and instruction memory. Because this is a 16-bit architecture, our words are 16 bits wide, unlike the 32-bit MIPS we have been studying in class. For the remainder of this document, a WORD refers to 16 bits. Each of the four registers is big enough to hold ONE word.

IMPORTANT: Because of the limitations of Logisim and to make things simpler, our memories will be WORD (16 bit) addressed, unlike MIPS which allows you to address each byte individually.

The instruction encoding is given below. Your processor will pull out a 16-bit value from instruction memory and determine the meaning of that instruction by looking at the opcode (the top four bits, which are bits 15-12). If the instruction is an R-type (i.e. opcode == 0), then you must also look at the funct field.

Note how we did not fill up the following table with 16 instructions as is possible with 4 bits of opcode, nor did we have 64 R-Type instructions with 6 bits of funct. This way the project is shorter and easier.

15-12 11 10 9 8 7 6 5 4 3 2 1 0
0 rs rt rd funct See R-type Instructions
1 rs rt immediate (unsigned) disp: DISP[imm] = $rs
2 rs rt immediate (unsigned) lui: $rt = imm << 8
3 rs rt immediate (unsigned) ori: $rt = $rs | imm
4 rs rt immediate (signed) addi: $rt = $rs + imm
5 rs rt immediate (unsigned) andi: $rt = $rs & imm
6 target address jump and link into $r3
7 target address jump
8 rs unused jump register: PC = $rs
9 rs rt offset (signed) beq
10 rs rt offset (signed) bne
11 rs rt immediate (signed) lw: $rt = MEM[$rs + imm]
12 rs rt immediate (signed) sw: MEM[$rs+imm] = $rt

R-Type Instructions
funct meaning
0 or: $rd = $rs | $rt
1 and: $rd = $rs & $rt
2 add: $rd = $rs + $rt
3 sub: $rd = $rs - $rt
4 sllv: $rd = $rs << $rt
5 srlv: $rd = $rs >> $rt
6 srav: $rd = $rs >> $rt
7 slt: $rd = ($rs < $rt) ? 1 : 0

Some specifics on selected instructions:

shifting instructions

We will not test shift amounts greater than 15; behavior in this case is undefined.

jump & jal

The jump and jal instruction's argument is a pseudoabsolute address, similar to MIPS. address is an unsigned number representing the lower 12 bits of the next instruction to be executed. The upper four bits are taken from the current PC (i.e., the address of the j or jal instruction). We do NOT do any concatenation of zeroes to the bottom or our address like we would in MIPS. This is because our processor is 16-bit WORD addressed, so every possible address holds a valid 16 bit instruction.

        PC = (PC & 0xF000) | target address 
        

Note that the PC being used on the right side of the address is the PC of jump instruction itself. This is also a departure from MIPS, which uses the address of the jump, plus 4.

Note that you should kill the next instruction after a jump or jal even if that is the instruction you are going to be jumping to.

On a jal the address of the next instruction should be written into $r3. This is what we mean by "link into $r3".

beq/bne

The beq instruction's argument is a signed offset relative to the next instruction to be executed if we don't take the branch, which is similar to MIPS. Note that the address of this next instruction is (PC+1) rather than (PC+4) because our processor is 16-bit WORD addressed. Here, the "current PC"means the address of the beq or bne instruction. beq can be represented as the following:

        if $rs == $rt
                PC = PC + 1 + offset
        else
                increment PC like normal
        

Again, the PC being used on the right side of the equals sign is the address of the branch itself.

The bne instruction differs only by the test in the if statement: replace the == with != Note that you should not kill the next instruction if the branch is not taken. If the branch is taken you should always kill.

immediate Fields

Note that the immediate field is only 8 bits wide, so we must perform some kind of extension on it before passing it to the ALU. If an immediate is supposeed to be unsigned, be sure to zero-extend it. If an immediate is signed, be sure to sign-extend it.


Logisim

It is strongly recommended that you download and run Logisim on your local machine while developing your processor. As you've probably discovered in lab, Logisim can quickly overwhelm the instructional machines. Though Logisim is relatively stable compared to prior semesters, it is still recommended that you save often and also make backup copies of your .circ files early and often. The official version of Logisim we will be using for evaluation is v2.7.1.

If you are having trouble with Logisim: RESTART IT and RELOAD your circuit! Don't waste your time chasing a bug that is not your fault. However, if restarting doesn't solve the problem, it is more likely that the bug is a flaw in your project. Please post to the newsgroup about any crazy bugs that you find and we will investigate.

Do NOT copy and paste from different Logisim windows.

Logisim has been known to have trouble with this in the past. So try to avoid it.

RAM Modules

Logisim RAM modules can be found in the built-in memory library. To add the library to your project, select "Project/Load Library/Built-in Library..." and select the Memory module.

Because the RAM module doesn't look like the idealized memory we saw in lecture, you may feel confused about where to begin. The picture above shows a good way to wire up a circuit to use RAM. Here are a few things to know before you get started.

  • "sel" determines whether or not the RAM module is active. We will probably not run into any cases where we need to turn our RAM off, so you can wire a constant 1 to this.
  • "A" chooses which address will be accessed.
  • The clock input provides synchronization for memory writes. Be sure to use the same clock here as you do for your reg file.
  • "ld" determines whether we are reading or writing to memory. If "out" is high, then "D" will be driven with the contents of memory at address "A".
  • "clr" will instantly set all contents of memory to 0 if high. You should wire a manual switch so you can clear out memory whenever you want to restart a test.
  • "D" acts as both data in and data out for this module. This means you have to be careful not to drive this line from two conflicting sources, which in this case are DataIn and the output of the memory. You can solve this by using a controlled buffer (aka a tri-state buffer) on the "D" port of the RAM module. By wiring logic to the "out" port and the valve port of the controlled buffer together so that they are always opposite values (as in the picture above), we can prevent conflicts between data being driven in and the contents of memory coming out.
  • The "poke" tool can be used to modify the contents of the memory. You can also use right-click --> Load Image... to load an image from a file.
The best way to learn how these work is simply to play with them. You can also refer to Logisim documentation on RAM modules here.

Use a RAM module for Data memory. Your instruction memory is a ROM in the harness.

Hint: you might want to connect the wire attached to the the D port of the RAM module to your processor's "Data Mem Data" output, the one attached to the A port to your "Data Mem Address" output, and the MemWrite control wire to your "Data Mem Write" output.

The Display Instruction and Overflow LEDs

Remember that the five components of a computer are control, datapath, memory, input, and output devices. Unfortunately, we won't have much input besides loading to memory. But we will have a cool output device using 7-segment LEDS.

We provide for you a converter circuit (right click and save as, also at ~cs61c/proj4StartKit/Converter.circ) that converts 16 bits into 4 hex digits, which are displayed on a "bundle" of four 7-segment LEDs. Documentation for the converter is included inside Converter.circ. Just open it in Logism. Each bundle should look like the following:

Alternatively, you may use the new hex displays in Logisim.

Your project must include an array of at least two of these display "bundles" for output, in the top level of cpu.circ. You may wish to add more so you can have more interesting output, but we will only require two. Remember that the disp instruction takes the value in $rs and displays it on the immth "bundle". This means you will only care about as many immediate values as you have display bundles to show them on. You should connect the values being displayed on the first two display bundles to the d0 and d1 outputs of your processor.

Your bundles must display 0000 before any disp instructions have been executed.

They must hold their values until another disp instruction replaces the value in that bundle.

For example:

    andi $r0, $r0, 0x0000
    disp $r0, 0   # After this instruction, DISP[0] should show 0000

    ori $r0, $r0, 0x1   # DISP[0] should still show 0000
    add $r0, $r0, $r0   # DISP[0] should stlll show 0000

    disp $r0, 0   # DISP[0] should now show 0002
    
This means you will need to add some form of state for each display bundle. The value on the display should update at the rising edge of the clock cycle.

You also need to have an LED unit which lights up to signify signed overflow. This indicator should be wired to the signed overflow port of your ALU. They should be viewable in your main circuit.


Testing

Once you've implemented your processor, you can test its correctness by writing programs to run on it! First, try this simple program as a sanity check: halt.s. This program loads the same immediate into two different registers using lui/ori and then branches -1 if these registers are equal.

         lui $r0, 0x33          2033
         ori $r0, $r0, 0x44     3044
         lui $r1, 0x33          2133
         ori $r1, $r1, 0x44     3544
         self: beq $r0, $r1, self       91FF
        
For practice, verify that the assembly on the left matches the translated binary on the right. This program effectively "halts" the processor by putting it into an infinite loop, so you can observe the outputs as well as memory and register state. Of course, you could do this "halt" with only the beq line, but it is very important that you test your lui/ori or the programs we will use during grading will not work.

To test your processor, open the cpu-harness.circ. Right click on the ROM that is the instruction memory, and click "load image". Select the assembled program to load it into the instruction memory, and start clock ticks.

You are required to write 2 sample programs to test your processor, but you should also write others to test all your instructions. Remember: Debugging Sucks. Testing Rocks.

  • Write a program that multiplies the first two 16-bit words of memory (MEM[0] and MEM[1]) and stores their product in MEM[2]. You may treat MEM[0] and MEM[1] as unsigned values. Also, do not worry about the case where the product does not fit in 16 bits. The last instruction of your program must be a halt (an instruction that jumps or branches to itself indefinitely). Feel free to clobber the original arguments. Save the assembler source in a file "mult.s."
  • Write a program that displays the lower nine bits of the first 16-bit word of memory in octal (base 8). For example, if the first 16-bit word were 0x829f, the seven segment displays would read 0237. Again, you may clobber any memory values you like and your program must end with a halt. Save the assembler source in a file "octal.s".

We are working on providing an autograder with a strict subset of the tests that we will run on your processor. We will announce when this is available. When these are made available, passing all of our tests is not a guarantee that your processor is bug-free.


Assembler

We've provided a simple assembler to make writing your programs easier. You should try writing a few by hand before using this, mainly because it's good practice and makes you feel cooler.

The assembler can be downloaded here. This assembler is a work in progress, so please report bugs to the google group!

The assembler takes files of the following form:

lui $r0, 85 
ori $r0, $r0, 68
lui $r1, 85
ori $r1, $r1, 68  
self: beq $r0, $r1, self

Anywhere a register is required, it must be either $r0, $r1, $r2 or $r3. Commas are optional but the $ is not. # starts a comment. The assembler can be invoked with python assembler.py input.s [-o output.hex] (output file is input.hex if not explicitly set)


Miscellaneous remarks

This section contains some comments to help you get started and foster some academic debate. Feel free to skip this section for now and come back after you've gotten your hands dirty.

Questions to Think About

1) What instructions (and with what arguments) can act as nops?
2) What are the different ways of halting a program?

Logisim's Combinational Analysis Feature

Logisim offers some functionality for automating circuit implementation given a truth table, or vice versa. Though not disallowed (enforcing such a requirement is impractical), use of this feature is discouraged. Remember that you will not be allowed to have a laptop running Logisim on the final.

Key Differences From MIPS

1) The zero register isn't special. $r0 is just a regular register like $r1.
2) Memory is addressed every 16 bits, or a WORD in our 16 bit architecture. That means each location in memory holds a 16 bit value, unlike MIPS where each location holds 8 bits.
3) There are only 4 registers, instead of 32.
4) Data memory and instruction memory are separate. Remember that in MIPS, we create the illusion of separate memory with two caches, but we really have only one memory.
5) Branch delay slots are not exposed to software. You have to deal with them in hardware.
6) jr is not a r-type instruction. This is because if it was r-type, not all r-types would map directly to an ALU operation. This is purely for your benefit.

Miscellaneous Requirements

You must implement a two-cycle design, with the stages of instruction fetch and execute. You must solve control hazards by killing the instruction after the branch, and you must kill the instruction after the taken branch or jump. If the branch is not taken you must not kill it (do not kill untaken branches).

Do not gate the clock! This is very bad design practice when making real circuits, so we will discourage you from doing this by heavily penalizing your project if you gate your clock.

You may use any built-in logisim library circuit components in implementing your processor.

You must include an array of at least two seven segment display bundles in the main processor circuit (cpu.circ). Anything else can go in subcircuits. Feel free to add additional seven segment displays if it does not clutter your processor.

Use the label tool to organize your processor. In particular label the control, datapath, and display sections, but it can also be useful to label specific busses and wires. It could make debugging a lot easier!

Plan

If you are feeling overwhelmed by this spec, these steps may help you get started. These are only suggestions, and you can choose to make any adjustments you prefer. For example, some of you may want to design the control before the datapath. Just remember to follow the rule above about keeping both the data memory and the display bundles at the top level main circuit.

  1. You should familiarize yourself with how Logisim's RAM modules work. Follow the guidelines in the section "RAM Modules" above.
  2. If you haven't already, you should familiarize yourself with how Logisim's Registers work. Once you are comfortable with this, you should try to build the register file.
  3. Design your ALU.
  4. Submit your ALU and register file.
  5. Think about any other components you will need in your datapath. Build these, and then layout everything. Wire these components together so that this datapath can execute every instruction our ISA supports. Be sure to identify any control signals you will need to generate.
  6. Design your control. Remember the process we discussed in class.
  7. Connect the control to the datapath.
  8. Test instructions individually, then try whole programs.
  9. Finish writing "mult.hex" and "octal.hex".
  10. Submit your processor and programs
  11. Rejoice! and then sleep.

Extra for Experts

Currently, these extras are not for credit...

Once you've got your processor up and running, give these a try: (BACKUP your completed processor first)

  • Add more instructions, for instance the SLTI instruction.
  • Implement SIMD ADD and SIMD SUB instructions that treat registers as two 8-bit values.
  • Add more features to the assembler (and send it to us so we can use it!)

Submission

You must submit the following files for part 1:
alu.circ
Regfile.circ

From the directory containing your project files, submit using the following command:
% submit proj4-1

You must submit the following files for part 2:
cpu.circ
mult.s
octal.s

From the directory containing your project files, submit using the following command:
% submit proj4-2


You must also submit any .circ files that you use in your solution (they are not copied into your main cpu.circ file when you import them, only referenced). Make sure you submit every .circ file that is part of your project! You might want to test your .circ file on the lab machines before you submit it, to make sure you got everything.


We will be providing our own versions of the *-harness.circ files, so you do not need to submit those. In addition, please do not depend on any changes you make to those files.


Remember that if submit fails, your assignment has not been submitted. If submit does not ask you to confirm submitting a particular file, that file has not been submitted. If we cannot make sense of your submission, there is nothing we can do. Sorry.