CS61C Summer 2013
Due Sunday, August 11th, 2013 at 11:59 PM
TA in charge: Justin Fu
Based on original spec by Ben Sussman and Brian Zimmer, and modified
spec of Albert Chae, Paul Pearce, Noah Johnson, Justin Hsia, Conor
Hughes, Anirudh Todi, Ian Vonseggern, and Sung Roa Yoon.
Much thanks to Conor Hughes for an excellent assembler and autograder.
Post any questions or comments to Piazza.
This project spec is ridiculously long, but don't fret!
We've spelled out many things in excruciating detail, so if you just take things one-by-one, it won't be as bad as it looks.
We are also providing a set of abridged project notes
to look at. These will NOT substitute for reading through the actual
project specs, but can be used as a quick reference later on.
Updates |
Overview |
Deliverables |
ISA |
Logisim |
Testing |
Submission
Updates and Clarifications
- 8/8: Extra Credit: cache.circ and cache.harness had an error. Re-copy the new files if you haven't started on the cache yet. If you have, change the width of "AddressToMemory" in cache.circ to 16 (was 8)
- 8/6: Extra credit released. Changed naming of ALU arguments to reduce confusion (from $rs/$rt/$rd/imm to result/X/Y)
- 8/2: Project release
Overview
- MAKE SURE TO CHECK YOUR CIRCUITS WITH THE GIVEN HARNESSES TO SEE IF THEY FIT! YOU WILL FAIL ALL OUR TESTS IF THEY DO NOT.
(This also means that you should not be moving around given inputs and outputs in the circuits).
- This is an INDIVIDUAL project.
- A tarball of sample tests for a completed CPU has been included in
the start kit. Look at the README file for usage info. I recommend
running the sample tests locally. These tests are NOT comprehensive; you
will need to do further testing on your own.
- You are allowed to use any of Logisim's built-in blocks for all parts of this project.
- Save often. Logisim can be buggy and the last thing you want is to lose some of your hard work. Every semester, some students have to redo large chunks of their projects this way.
In this project you will be using Logisim
to create a 16-bit two-cycle processor.
It is similar to MIPS, except that both the datapath and the
instructions are 16-bits wide, it has only 4 registers, and memory
addresses represent 16-bit words instead of 8-bit bytes (word-addressed
instead of byte-addressed).
Please read this document CAREFULLY as there are key differences
between the processor we studied in class and the processor you will be
designing for this project.
Before you begin, copy the start kit to your home directory (and then possibly to your own machine):
$ cp -r ~cs61c/proj/su13_proj3 proj03
Pipelining
Your processor will have a 2-stage pipeline:
- Instruction Fetch: An instruction is fetched from the instruction memory.
- Execute: The instruction is decoded, executed, and
committed (written back). This is a combination of the remaining stages
of a normal MIPS pipeline.
You should note that data hazards do NOT pose a problem for this design,
since all accesses to all sources of data happen in a single
pipeline stage.
However, there are still control hazards to deal with.
Our ISA does not expose branch delay slots to software.
This means that the instruction immediately after a branch or jump is
not necessarily executed if the branch is taken.
This makes your task a bit more complex.
By the time you have figured out that a branch or jump is in the execute
stage, you have already accessed the instruction memory and pulled out
(possibly) the wrong instruction.
You will therefore need to "kill" instructions that are being fetched if
the instruction under execution is a jump or a taken branch.
Instruction kills for this project MUST be accomplished by MUXing a nop into the instruction stream and sending the nop into the Execute stage instead of using the fetched instruction.
Notice that 0x0000 is a nop instruction; please use this, as it will simplify grading and testing.
You should only kill if a branch is taken (do not kill otherwise), but do kill on every type of jump.
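The kill rule above can be sketched as a small behavioral model. This is only an illustration of the decision, not a circuit design: the function names are hypothetical, the opcodes are taken from the ISA table in this spec, and `rs_val`/`rt_val` stand for the register values read by the instruction in the Execute stage.

```python
NOP = 0x0000  # the nop encoding this spec asks you to use

# Opcodes from the ISA table in this spec.
BNE, BEQ, J, JR, JAL = 1, 2, 3, 4, 5

def should_kill(ex_instr, rs_val, rt_val):
    """Return True if the instruction being fetched must be killed."""
    opcode = (ex_instr >> 12) & 0xF
    if opcode in (J, JR, JAL):
        return True                  # kill on every type of jump
    if opcode == BEQ:
        return rs_val == rt_val      # kill only when the branch is taken
    if opcode == BNE:
        return rs_val != rt_val
    return False                     # everything else: never kill

def next_execute_instr(fetched_instr, ex_instr, rs_val, rt_val):
    """Model of the MUX that feeds the Execute stage on the next cycle."""
    if should_kill(ex_instr, rs_val, rt_val):
        return NOP
    return fetched_instr
```

For instance, with a `beq` (0x21FF) in Execute and equal register values, whatever was fetched is replaced by 0x0000; with unequal values the fetched instruction passes through unchanged.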
Because all of the control and execution is handled in the Execute stage, your
processor should be more or less indistinguishable from a single-cycle
implementation, barring the one-cycle startup latency and the
branch/jump delays.
However, we will be enforcing the two-stage pipeline design.
If you are unsure about pipelining, it is perfectly fine (maybe even
recommended) to first implement a single-cycle processor.
This will allow you to first verify that your instruction decoding,
control signals, arithmetic operations, and memory accesses are all
working properly.
From a single-cycle processor you can then split off the Instruction
Fetch stage with a few additions and a few logical tweaks.
Some things to consider:
- Will the IF and EX stages have the same or different PC values?
- Do you need to store the PC between the pipelining stages?
- To MUX a nop into the instruction stream, do you place it before or after the instruction register?
- What address should be requested next while the EX stage executes a nop? Is this different than normal?
You might also notice a bootstrapping problem here: during the first
cycle, the instruction register sitting between the pipeline stages
won't contain an instruction loaded from memory.
How do we deal with this?
It happens that Logisim automatically sets registers to zero on reset;
the instruction register will then contain a nop.
Remember to go to Simulate --> Reset Simulation (Ctrl+R) to reset your processor.
Deliverables
Approach this project like you would any coding assignment: construct it piece by piece and test each component early and often!
Tidiness and readability will be a large factor in grading your circuit if there are any issues, so please make it as neat as possible! If we can't comprehend your circuit, you will probably receive no partial credit.
You will design a register file to manage the four 16-bit registers in
our ISA.
After being told to write data to a particular register, you will be
able to retrieve that data by asking for the value of that register on
subsequent clock cycles.
We are NOT giving special treatment to the zero register.
That is, you are allowed to write to $r0.
You are provided with the skeleton of a register file in Regfile.circ.
The register file circuit has the following inputs:
Input Name | Bit Width | Description
CLK | 1 | Input for the clock. This can be sent into subcircuits or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not and it with anything, etc.).
RegWrite | 1 | Determines whether data is written on the next rising edge of the clock.
SetReg | 1 | Determines whether WriteData is written to all registers on the next rising edge of the clock. RegWrite must also be on for this write to happen.
Read Register 1 | 2 | Determines which register's value is sent to the Read Data 1 output, see below.
Read Register 2 | 2 | Determines which register's value is sent to the Read Data 2 output, see below.
Write Register | 2 | Determines which register to set to Write Data on the next rising edge of the clock, assuming that RegWrite is asserted.
Write Data | 16 | Determines what data to write to the register identified by the Write Register input on the next rising edge of the clock, assuming that RegWrite is asserted.
The register file also has the following six outputs:
Output Name | Bit Width | Description
Reg 0 Value | 16 | Always driven with the value of register 0. This is primarily for grading & debugging; if you were really designing a register file you would probably omit this output.
Reg 1 Value | 16 | Always driven with the value of register 1. This is primarily for grading & debugging; if you were really designing a register file you would probably omit this output.
Reg 2 Value | 16 | Always driven with the value of register 2. This is primarily for grading & debugging; if you were really designing a register file you would probably omit this output.
Reg 3 Value | 16 | Always driven with the value of register 3. This is primarily for grading & debugging; if you were really designing a register file you would probably omit this output.
Read Data 1 | 16 | Driven with the value of the register identified by the Read Register 1 input.
Read Data 2 | 16 | Driven with the value of the register identified by the Read Register 2 input.
You can make any modifications to Regfile.circ you want, but the outputs must obey the behavior specified above.
In addition, the Regfile.circ that you submit must fit into the Regfile-harness.circ
file we have provided for you.
This means that you should take care to not reorder inputs or outputs,
though you can move them around if you need more space or something.
A circuit like Regfile-harness.circ will be used to test your register file for grading.
You should download a fresh copy of Regfile-harness.circ and make sure your Regfile.circ is cleanly loaded before submitting.
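The register file behavior described in this section can be summarized with a short behavioral model. It is a sketch for checking your understanding, not a description of how the circuit should be wired; the class and method names are hypothetical, and the zero initial state assumes Logisim's reset behavior.

```python
class RegFile:
    """Behavioral model of the four 16-bit registers."""

    def __init__(self):
        self.regs = [0, 0, 0, 0]   # Logisim registers reset to zero

    def read(self, read_reg1, read_reg2):
        """Return (Read Data 1, Read Data 2) for the two selector inputs."""
        return self.regs[read_reg1 & 3], self.regs[read_reg2 & 3]

    def rising_edge(self, reg_write, set_reg, write_reg, write_data):
        """Apply one rising clock edge with the given input values."""
        if not reg_write:
            return                       # nothing is written without RegWrite
        data = write_data & 0xFFFF
        if set_reg:
            self.regs = [data] * 4       # SetReg writes every register
        else:
            self.regs[write_reg & 3] = data  # $r0 is writable like any other
```

Note that SetReg alone changes nothing: in this model, as in the table above, RegWrite must also be high.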
You will also design an ALU that your processor will use to do math.
You will tell your ALU what operation to perform and it will drive its output with the result of that operation.
You ARE allowed to use all of Logisim's built-in arithmetic blocks, including adder, subtractor, and shifter.
Alternatively, feel free to use any sub-circuit that you created previously for homework or lab.
We have provided a skeleton of an ALU for you in alu.circ. It has three inputs:
Input Name | Bit Width | Description
X | 16 | Data to use for X in the ALU operation.
Y | 16 | Data to use for Y in the ALU operation.
Switch (S) | 4 | Selects what operation the ALU should perform (see below).
The ALU also has three outputs:
Output Name | Bit Width | Description
Signed Overflow | 1 | High iff the operation was an add, sub, addh8, subh8, addp8, or subp8, and there was signed overflow.
Result | 16 | Result of the ALU operation.
Equal | 1 | High iff the two inputs X and Y are equal.
The operation you should perform is given by the following table:
switch | Instruction
0 | or: result = X | Y
1 | and: result = X & Y
2 | addp8: result = {X[15:8] + Y[15:8] , X[7:0] + Y[7:0] }
3 | subp8: result = {X[15:8] - Y[15:8] , X[7:0] - Y[7:0] }
4 | addh8: result = {X[15:8] + X[7:0] , Y[15:8] + Y[7:0] }
5 | subh8: result = {X[15:8] - X[7:0] , Y[15:8] - Y[7:0] }
6 | sllv: result = X << Y
7 | srlv: result = X >> Y (zero extend)
8 | srav: result = X >> Y (sign extend)
9 | add: result = X + Y
10 | sub: result = X - Y
11 | slt: result = (X < Y) ? 1 : 0
12 | shuff: result[15:8] = Y[1]==1? X[15:8] : X[7:0]; result[7:0] = Y[0]==1? X[15:8] : X[7:0]
Note: You can assume for shift operations that Y will be non-negative and less than 16.
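A software reference model of the Result output can be handy when writing test vectors for alu-harness.circ. The sketch below follows the operation table, with some assumptions that the table does not spell out: each 8-bit half wraps around independently (no carry between halves), `slt` compares X and Y as signed 16-bit values as in MIPS, and shift amounts are below 16. It models only Result, not the Signed Overflow or Equal outputs, and all names are hypothetical.

```python
MASK16 = 0xFFFF

def halves(v):
    """Split a 16-bit value into (upper byte, lower byte)."""
    return (v >> 8) & 0xFF, v & 0xFF

def pack(hi, lo):
    """Concatenate two 8-bit results, discarding any carry or borrow."""
    return ((hi & 0xFF) << 8) | (lo & 0xFF)

def to_signed16(v):
    """Interpret a 16-bit pattern as a two's-complement integer."""
    v &= MASK16
    return v - 0x10000 if v & 0x8000 else v

def alu(s, x, y):
    """Return the 16-bit Result for switch value s and operands x, y."""
    x &= MASK16
    y &= MASK16
    xh, xl = halves(x)
    yh, yl = halves(y)
    ops = {
        0: lambda: x | y,
        1: lambda: x & y,
        2: lambda: pack(xh + yh, xl + yl),        # addp8
        3: lambda: pack(xh - yh, xl - yl),        # subp8
        4: lambda: pack(xh + xl, yh + yl),        # addh8
        5: lambda: pack(xh - xl, yh - yl),        # subh8
        6: lambda: x << y,                        # sllv
        7: lambda: x >> y,                        # srlv (zero extend)
        8: lambda: to_signed16(x) >> y,           # srav (sign extend)
        9: lambda: x + y,
        10: lambda: x - y,
        11: lambda: int(to_signed16(x) < to_signed16(y)),  # slt, signed (assumption)
        12: lambda: pack(xh if y & 2 else xl, xh if y & 1 else xl),  # shuff
    }
    return ops[s]() & MASK16
```

For example, `alu(12, 0xABCD, 0b01)` swaps the two bytes and yields 0xCDAB, while `alu(2, 0x01FF, 0x0101)` yields 0x0200 because the lower half wraps without carrying into the upper half.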
Follow the same instructions as the register file regarding rearranging inputs and outputs of the ALU.
In particular, you should ensure that your ALU is correctly loaded by a fresh copy of alu-harness.circ before you submit.
We have provided a skeleton for your processor in cpu.circ along with a testing harness in cpu-harness.circ.
Your completed processor should implement the ISA detailed below in the section Instruction Set Architecture (ISA)
using a two-cycle pipeline.
Your processor will contain an instance of both your ALU and your
Register File.
In addition, you will be constructing the Data Memory and Outputs
(Deliverables 3a and 3b below) as well as the entire datapath and
control from scratch.
It will interact with our harness through 2 inputs and 10 outputs.
Your processor will get its program from the processor harness we have
provided.
It will send the address of instruction memory it wants to access to the
harness through an output, and accept the instruction at that address
as an input.
Inspect cpu-harness.circ to see exactly what's going on.
Your processor has 2 inputs that come from the harness.
Input Name | Bit Width | Description
From Instr Mem | 16 | Driven with the instruction at the instruction memory address identified by the "To Instr Mem" output (see below).
CLK | 1 | The input for the clock. As with the register file, this can be sent into subcircuits (e.g. the CLK input for your register file) or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not and it with anything, etc.).
Your processor must provide 10 outputs to the harness:
Output Name | Bit Width | Description
R0 | 16 | Driven with the contents of register 0.
R1 | 16 | Driven with the contents of register 1.
R2 | 16 | Driven with the contents of register 2.
R3 | 16 | Driven with the contents of register 3.
D0 | 16 | Driven with the value being displayed on the first display.
D1 | 16 | Driven with the value being displayed on the second display.
To Instr Mem | 16 | This output is used to select which instruction is presented to the processor on the "From Instr Mem" input.
Data Mem Data | 16 | Driven with the data being written to memory. When no data is being written to memory, this can be driven with whatever you want.
Data Mem Addr | 16 | Driven with the address being written in memory. When no data is being written to memory, this can be driven with whatever you want.
Data Mem Write | 1 | High when data is going to be written to memory. Low otherwise.
Follow the same instructions as the register file and ALU regarding rearranging inputs and outputs of the processor.
In particular, you should ensure that your processor is correctly loaded by a fresh copy of cpu-harness.circ before you submit.
You will build your Data Memory on your own using a RAM module.
Note that this is different than a ROM module.
Logisim RAM modules can be found in the built-in Memory library/folder.
Your Data Memory should be located in the top level of cpu.circ
and connected to the appropriate processor outputs.
It will not be tested separately in a harness file like the previous
three deliverables, but obviously correct function is required for your
processor to work.
For those unfamiliar with the RAM module, the pictures above show a good
way to wire up a circuit to use RAM.
You are not required to implement Data Memory as shown above and you can
use a memory with separate read and write ports if you should so
desire.
Here are a few things to know about the RAM module before you get started:
- "clk" provides synchronization for memory writes. Be sure to use the same clock here as you do for your Register File.
- "sel" determines whether or not the RAM module is active. We will
probably not run into any cases where we need to turn our RAM off, so
you can wire a constant 1 to this.
- "A" chooses which address will be accessed.
- "clr" will instantly set all contents of memory to 0 if high. You should wire a manual switch (Input/Output --> Button) so you can clear out memory whenever you want to restart a test.
- "ld" determines whether we are reading or writing to
memory. If "ld" is high, then "D" will be driven with the contents of
memory at address "A" (left image). If "ld" is low, then the contents of
"D" will be stored in memory at address "A" (right image).
- "D" acts as both data in and data out for this module. This means
you have to be careful not to drive this line from two conflicting
sources, which in this case are DataIn and the output of the memory. You
can solve this by using a controlled buffer (a.k.a. a tri-state buffer)
on the "D" port of the RAM module. By wiring logic to the "ld" port and
the valve
port of the controlled buffer together so that they are always
opposite values (as in the pictures above), we can prevent conflicts
between data being driven in and the contents of memory coming out.
- The "poke" tool can be used to modify the contents of the memory. You can also use right-click --> Load Image... to load an image from a file.
The best way to learn how these work is simply to play with them.
You can also refer to Logisim documentation on RAM modules here.
Remember that the five components of a computer are control, datapath, memory, input, and output devices.
Unfortunately, we won't have much input besides loading to memory.
But we will have a cool output device using hex digit displays in Logisim. Each bundle should look like this:
You should use the new hex digit displays in Logisim (this will be much easier than trying to use the 7 segment ones).
Your project must include an array of at least two of these
display "bundles" (since our data is 16 bits and a hex digit is 4
bits you will need 4 digits per bundle) for output, in the top level of cpu.circ.
You may wish to add more so you
can have more interesting output, but we will only require two.
Remember that the disp instruction takes the value in
$rs and displays it on the immth "bundle". This means
you will only care about as many immediate values as you have
display bundles to show them on. You should connect the values being
displayed on the first two display bundles to the d0 and d1 outputs of
your processor.
Your bundles must display 0000 before any
disp instructions have been executed.
They must hold their values until another disp instruction replaces
the value in that bundle.
For example:
andi $r0, $r0, 0x0000
disp $r0, 0 # After this instruction, DISP[0] should show 0000
ori $r0, $r0, 0x1 # DISP[0] should still show 0000
add $r0, $r0, $r0 # DISP[0] should still show 0000
disp $r0, 0 # DISP[0] should now show 0002
This means you
will need to add some form of state for each
display bundle. The value on the display should update at the rising
edge of the clock cycle.
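The hold-until-replaced behavior of the display bundles can be modeled in a few lines. This is a sketch with hypothetical names; ignoring immediates that have no matching bundle is an assumption for the model, not a requirement stated in this spec.

```python
class Displays:
    """Behavioral model of the display bundles (two by default)."""

    def __init__(self, count=2):
        self.values = [0] * count    # every bundle shows 0000 before any disp

    def disp(self, rs_val, imm):
        """Model a disp instruction taking effect at a rising clock edge."""
        if imm < len(self.values):   # assumption: no bundle, no effect
            self.values[imm] = rs_val & 0xFFFF

    def show(self, index):
        """Return the four hex digits a bundle would display."""
        return format(self.values[index], "04X")
```

Replaying the example above against this model gives "0000" after the first `disp`, still "0000" after the `ori` and `add`, and "0002" after the second `disp`.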
You also need to have an LED unit which lights up to signify signed overflow.
This indicator should be wired to the signed overflow port of your ALU.
This should be viewable in your main circuit.
Since you are building a processor, you can run actual programs on it!
There are more details about testing your processor and the provided assembler in the Testing section, but in particular you will be REQUIRED to write and submit the following two programs:
-
Write a program that utilizes the SIMD add instruction you created to calculate A*C and B*C where A is MEM[0],
B is MEM[1], and C is MEM[2]. Store A*C in the upper 8 bits of $r2 and B*C in the lower 8 bits of $r2.
Assume they are unsigned values and in addition do not worry about products that do
not fit in 8 bits (so this only does small calculations, multiplications that result in a number less than 256).
Your function must be labeled: "multsimd:". Feel free to clobber any memory values. At the end of the function, jump to the caller by doing jr $r3.
Save this file as multsimd.s.
- Write a program that displays the lower nine bits of the first 16-bit word of memory in octal (base 8) on Display Bundle 0.
For example, if the first 16-bit word were 0x829f, the hex digit displays would read 0237.
Your function must be labeled: "OCTAL:". Save the assembler source in a file octal.s.
Remember, at the end of each function, you must jump back to the caller by doing jr $r3
(so make sure $r3 has the correct address value before trying this line!).
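When testing the two required programs, it helps to know the value you expect to see. The helper functions below are a sketch with hypothetical names: one computes the $r2 value that multsimd should produce, the other the value that OCTAL should put on Display Bundle 0. Masking each product to 8 bits is only a guard, since the spec says products will fit.

```python
def expected_multsimd(a, b, c):
    """Expected $r2: A*C in the upper 8 bits, B*C in the lower 8 bits."""
    return (((a * c) & 0xFF) << 8) | ((b * c) & 0xFF)

def expected_octal_display(word):
    """Expected display value: one hex digit per octal digit of the low 9 bits."""
    low9 = word & 0x1FF
    high_digit = (low9 >> 6) & 7
    mid_digit = (low9 >> 3) & 7
    low_digit = low9 & 7
    return (high_digit << 8) | (mid_digit << 4) | low_digit
```

With the example from the text, `expected_octal_display(0x829F)` returns 0x0237, which the hex digit displays show as 0237.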
Write these functions as you would a normal MIPS function, remembering to stay within your processor's ISA.
You cannot assume anything about the values in the registers when your
function is called and remember that only the first three registers can be freely
changed.
It is recommended that you write your own main functions to set
up Data Memory and use the Display Bundles as desired while writing and
testing your code, though your submitted code should contain ONLY the
functions themselves.
We will test your code by appending it to our own main functions for different test cases, assembling using assembler.py, and then running it on both your submitted processor and a known working processor.
This is why it is important that you include only your function and not other testing statements.
Our main functions will all end in a halt (an instruction that jumps or branches to itself indefinitely - see Testing for details) in order to avoid executing your code an extra time.
Feel free to set something similar up when testing your own code.
We will be giving extra credit for implementing a cache. This assignment will be separate from the main project, with its own harness and test cases. We will NOT be testing the cache inside your CPU. It will be worth approximately 10 points, or around 10% of the total points for this project.
The cache has the following specs:
- Direct-mapped
- Write-through
- Write-allocate
- 4 slots/lines
- 2 words per cache line (so there are 32 bits per line)
Your cache block should be functionally equivalent to a RAM block. The "Address" port on the cache is analogous to the "Address" input port on RAM. The cache's "DataIn" port is analogous to the RAM's "Input" port (for a RAM block with separate load and store ports), and the cache's "DataOut" port is analogous to the RAM's "Output" port.
Lecture 19 (Functional Units, Finite State Machines) has a few hints in the bonus slides on how to implement a cache.
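With 2 words per line and 4 lines, a 16-bit word address splits into a 1-bit offset, a 2-bit index, and a 13-bit tag, which matches the 13-bit tag in the Valid+Tag outputs described in this section. The sketch below assumes the conventional ordering (offset in the lowest bit, then index, then tag) and assumes that the double-word memory address is the word address with the offset bit dropped; the function names are hypothetical.

```python
def split_address(addr):
    """Return (tag, index, offset) for a 16-bit word address."""
    addr &= 0xFFFF
    offset = addr & 0x1          # which of the 2 words in the line
    index = (addr >> 1) & 0x3    # which of the 4 lines
    tag = addr >> 3              # remaining 13 bits
    return tag, index, offset

def memory_address(addr):
    """Double-word address for AddressToMemory (assumption: drop the offset bit)."""
    return (addr & 0xFFFF) >> 1

def pack_line(word_offset0, word_offset1):
    """32-bit LineN value: offset-1 word in the upper half, offset-0 word in the lower."""
    return ((word_offset1 & 0xFFFF) << 16) | (word_offset0 & 0xFFFF)

def valid_tag(valid, tag):
    """14-bit Valid+TagN value: valid bit on top of the 13-bit tag."""
    return ((1 if valid else 0) << 13) | (tag & 0x1FFF)
```

For example, word address 0x000B maps to tag 1, index 1, offset 1, and lives in double-word 5 of memory.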
You can copy the skeleton and harness into your current directory with the following command:
$ cp ~cs61c/proj/su13_proj3-ec/* .
The skeleton contains the following inputs:
Input Name | Bit Width | Description
CLK | 1 | Input for the clock. This can be sent into subcircuits or attached directly to the clock inputs of memory units in Logisim, but should not otherwise be gated (i.e., do not invert it, do not and it with anything, etc.).
MemWrite | 1 | Determines whether data is written on the next rising edge of the clock.
Address | 16 | The address of data to be stored or loaded from the cache. This is analogous to the "A" port on the RAM block.
DataIn | 16 | Data value to be stored into the cache on the next rising edge of the clock.
DataFromMem | 32 | Contains 32-bit value stored in memory at the address indicated by AddressToMemory.
The following are outputs from your circuit
Output Name | Bit Width | Description
DataOut | 16 | The value of data in memory at "Address". This is the value returned to the CPU on a load. This port must work asynchronously; in other words, it updates whenever the inputs change and does not depend on the clock.
AddressToMemory | 16 | The address to load from / store to memory. Memory is double-word addressed (32-bit, which is different from the main project which is just word addressed). Ex. Mem[0] contains the first two 16-bit words, and Mem[1] contains the third and fourth 16-bit words.
DataToMemory | 32 | Data to be stored into memory. Remember that your cache is write-through, so you must store each write into memory.
Line0 | 32 | Driven by the values stored in the cache line/slot/row corresponding to set 0 (index of address = 0). The upper 16 bits hold data corresponding to an offset of 1, and the lower 16 bits hold data corresponding to an offset of 0.
Line1 | 32 | Driven by the values stored in the cache line/slot/row corresponding to set 1 (index of address = 1). The upper 16 bits hold data corresponding to an offset of 1, and the lower 16 bits hold data corresponding to an offset of 0.
Line2 | 32 | Driven by the values stored in the cache line/slot/row corresponding to set 2 (index of address = 2). The upper 16 bits hold data corresponding to an offset of 1, and the lower 16 bits hold data corresponding to an offset of 0.
Line3 | 32 | Driven by the values stored in the cache line/slot/row corresponding to set 3 (index of address = 3). The upper 16 bits hold data corresponding to an offset of 1, and the lower 16 bits hold data corresponding to an offset of 0.
Valid+Tag0 | 14 | Driven by the management bits for the cache line/slot/row corresponding to set 0. The uppermost bit contains the valid bit, and the lower 13 bits hold the tag.
Valid+Tag1 | 14 | Driven by the management bits for the cache line/slot/row corresponding to set 1. The uppermost bit contains the valid bit, and the lower 13 bits hold the tag.
Valid+Tag2 | 14 | Driven by the management bits for the cache line/slot/row corresponding to set 2. The uppermost bit contains the valid bit, and the lower 13 bits hold the tag.
Valid+Tag3 | 14 | Driven by the management bits for the cache line/slot/row corresponding to set 3. The uppermost bit contains the valid bit, and the lower 13 bits hold the tag.
Submit using the command:
$ submit proj3-ec
Instruction Set Architecture (ISA)
You will be implementing a simple 16-bit processor with four registers ($r0-$r3).
It will have separate data and instruction memory.
Because this is a 16-bit architecture, our words are 16 bits wide, unlike the 32-bit MIPS ISA we have been studying in class.
For the remainder of this document, a WORD refers to 16 bits.
Each of the four registers is big enough to hold ONE word.
IMPORTANT: Because of the limitations of Logisim (and to make things simpler), our memories will be word-addressed (16 bits), unlike MIPS, which is byte-addressed (8 bits).
The instruction encoding is given below.
Your processor will pull out a 16-bit value from instruction memory and
determine the meaning of that instruction by looking at the opcode (the top four bits, which are bits 15-12).
If the instruction is an R-type (i.e. opcode == 0), then you must also look at the funct field.
Notice how we do not use all 64 R-type instructions.
Your project only has to work on these specified instructions.
This way the project is shorter and easier.
15-12 | 11 | 10 | 9 | 8 | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0
0 | rs | rt | rd | funct | See R-type Instructions
1 | rs | rt | offset (signed) | bne: branch if not equal
2 | rs | rt | offset (signed) | beq: branch if equal
3 | target address | j: jump to target address
4 | rs | unused | jr: PC = $rs
5 | target address | jal into $r3
6 | rs | rt | immediate (signed) | addi: $rt = $rs + imm
7 | rs | rt | immediate (unsigned) | andi: $rt = $rs & imm
8 | rs | rt | immediate (unsigned) | ori: $rt = $rs | imm
9 | rs | rt | immediate (signed) | sw: MEM[$rs+imm] = $rt
10 | rs | rt | immediate (signed) | lw: $rt = MEM[$rs + imm]
11 | rs | rt | immediate (unsigned) | lui: $rt = imm << 8
12 | rs | rt | immediate (unsigned) | disp: DISP[imm] = $rs
13 | unused | unused | immediate (signed) | setreg: R[0] = R[1] = R[2] = R[3] = imm
14 | rs | rt | immediate (unsigned) | shuff: $rt[15:8] = imm[1]==1? $rs[15:8] : $rs[7:0]; $rt[7:0] = imm[0]==1? $rs[15:8] : $rs[7:0]
R-Type Instructions
funct | Instruction
0 | or: $rd = $rs | $rt
1 | and: $rd = $rs & $rt
2 | addp8: $rd = {$rs[15:8] + $rt[15:8] , $rs[7:0] + $rt[7:0] }
3 | subp8: $rd = {$rs[15:8] - $rt[15:8] , $rs[7:0] - $rt[7:0] }
4 | addh8: $rd = {$rs[15:8] + $rs[7:0] , $rt[15:8] + $rt[7:0] }
5 | subh8: $rd = {$rs[15:8] - $rs[7:0] , $rt[15:8] - $rt[7:0] }
6 | sllv: $rd = $rs << $rt
7 | srlv: $rd = $rs >> $rt (zero extend)
8 | srav: $rd = $rs >> $rt (sign extend)
9 | add: $rd = $rs + $rt
10 | sub: $rd = $rs - $rt
11 | slt: $rd = ($rs < $rt) ? 1 : 0
Some specifics on selected instructions:
Shifting
- We will not test shift amounts greater than 15; behavior in this case is undefined.
Jumping
- The argument to the jump and jal instructions is a pseudoabsolute address, similar to MIPS.
The target address is an unsigned number representing the lower 12 bits of the next instruction to be executed.
The upper four bits are taken from the current PC.
We do NOT concatenate any zeroes to the bottom of our address like we would in MIPS.
This is because our processor is word-addressed, so every possible address holds a valid 16-bit instruction:
nextPC = ("currPC" & 0xF000) | target address
Note that "currPC" is the PC of the jump instruction itself. We will be assuming small program sizes (< 2^10 instructions) and will not be strictly enforcing currPC+1 as it would be in MIPS.
Note that you should kill the next instruction after a jump or jal even if that is the instruction you are going to be jumping to.
On a jal the address of the next instruction should be written into $r3. This is what we mean by "link into $r3".
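The jump arithmetic above can be checked with a couple of one-line helpers. This is a sketch with hypothetical function names; taking the link value to be currPC+1 assumes that "the next instruction" means the one stored right after the jal.

```python
def jump_next_pc(curr_pc, target):
    """nextPC for j and jal: upper 4 bits of currPC, lower 12 bits from target."""
    return (curr_pc & 0xF000) | (target & 0x0FFF)

def jal_link_value(curr_pc):
    """Value written into $r3 by jal (assumption: the address after the jal)."""
    return (curr_pc + 1) & 0xFFFF
```

For example, a `j` at address 0x1234 with target field 0xABC goes to 0x1ABC; nothing is shifted in at the bottom because memory is word-addressed.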
Branching
- The argument to the beq and bne instructions is a signed offset relative to the next instruction to be executed if we don't take the branch, which is similar to MIPS.
Note that the address of this next instruction is PC+1 rather than PC+4 because our processor is word-addressed.
Here, currPC means the address of the branch instruction.
We can write beq as the following:
if $rs == $rt
nextPC = currPC+1 + offset
else
increment PC like normal
Think! There's a reason we write "increment PC like normal" here instead of just "currPC+1".
The bne instruction differs only by the test in the if statement: replace the == with !=.
Note that you should not kill the next instruction if the branch is not taken. If the branch is taken you should always kill.
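From the programmer's point of view, the branch rule can be written as the small function below. It is a sketch of the architectural result only, with hypothetical names: it returns the address of the next instruction to execute and deliberately says nothing about which value your PC register holds at that moment, which is the implementation question the "Think!" hint points at.

```python
def branch_next_pc(curr_pc, offset8, taken):
    """Address of the next instruction executed after a beq/bne at curr_pc."""
    offset8 &= 0xFF
    # Interpret the 8-bit field as a signed two's-complement offset.
    offset = offset8 - 0x100 if offset8 & 0x80 else offset8
    if taken:
        return (curr_pc + 1 + offset) & 0xFFFF
    return (curr_pc + 1) & 0xFFFF
```

A taken branch at address 4 with offset field 0xFF (that is, -1) lands back on address 4, which is how a branch-to-self loop works.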
Immediates
- Note that the immediate field is only 8 bits wide, so we must perform some kind of extension on it before passing it to the ALU.
If an immediate is supposed to be unsigned, be sure to zero-extend it.
If an immediate is signed, be sure to sign-extend it.
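The two kinds of extension differ only in what fills the upper 8 bits. A minimal sketch, with hypothetical function names, treating values as 16-bit patterns:

```python
def zero_extend8(imm):
    """Extend an 8-bit immediate to 16 bits with zeros (unsigned immediates)."""
    return imm & 0xFF

def sign_extend8(imm):
    """Extend an 8-bit immediate to 16 bits by copying bit 7 (signed immediates)."""
    imm &= 0xFF
    return imm | 0xFF00 if imm & 0x80 else imm
```

So the immediate 0xFF becomes 0x00FF for `ori` or `andi`, but 0xFFFF (that is, -1) for `addi`, `lw`, or `sw`.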
Functions not in MIPS Greensheet
You may have noticed that there are a few functions not defined in the Greensheet, such as addp8, subp8, and shuff. We've all seen these types of SIMD instructions in different architectures (if you haven't, I'd be curious to see just how much fun you had with proj2), and we thought it might be fun for you guys to try implementing them!
Logisim Notes
Section has been moved to Logisim Notes
Testing
Once you've implemented your processor, you can test its correctness by writing programs to run on it!
First, try this simple program as a sanity check: halt.s.
This program loads the same immediate into two different registers using lui/ori and then branches back one instruction (offset = -1) if these registers are equal.
Assembly: Binary:
======== ======
lui $r0, 0x33 B033
ori $r0, $r0, 0x44 8044
lui $r1, 0x33 B133
ori $r1, $r1, 0x44 8544
self: beq $r0, $r1, self 21FF
For practice, verify that the assembly on the left matches the
translated binary on the right.
This program effectively "halts" the processor by putting it into an
infinite loop, so you can observe the outputs as well as memory and
register state.
Of course, you could do this "halt" with only the beq line, but it is very important that you test your lui/ori or the programs we will use during grading will not work.
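One way to check the translation is to encode the instructions in a few lines of Python. The sketch below is not part of the provided assembler; `encode_i` is a hypothetical helper, and the field positions (opcode in bits 15-12, rs in 11-10, rt in 9-8, immediate in 7-0) are inferred from the encoding table and from the binaries listed above.

```python
# Opcodes from the ISA table in this spec.
OPCODES = {"beq": 2, "ori": 8, "lui": 11}

def encode_i(op, rs, rt, imm):
    """Encode an instruction with rs, rt and an 8-bit immediate or offset."""
    return (OPCODES[op] << 12) | ((rs & 3) << 10) | ((rt & 3) << 8) | (imm & 0xFF)

program = [
    encode_i("lui", 0, 0, 0x33),   # lui $r0, 0x33 (rs is not used, encoded as 0)
    encode_i("ori", 0, 0, 0x44),   # ori $r0, $r0, 0x44
    encode_i("lui", 0, 1, 0x33),   # lui $r1, 0x33
    encode_i("ori", 1, 1, 0x44),   # ori $r1, $r1, 0x44
    encode_i("beq", 0, 1, -1),     # self: beq $r0, $r1, self (offset = -1)
]

hex_words = [format(word, "04X") for word in program]
```

Here `hex_words` comes out as B033, 8044, B133, 8544, 21FF, matching the binary column above.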
To test your processor, open the cpu-harness.circ.
Find the Instruction Memory RAM and right click --> Load Image...
Select the assembled program (.hex file - see details on the Assembler below) to load it and then start clock ticks.
As described in the Deliverables, you are REQUIRED to write and submit two sample programs to test your processor (octal.s and multsimd.s), but you should also write others to test all your instructions.
Remember: Debugging Sucks. Testing Rocks.
Assembler
We've provided a basic assembler to make writing your programs easier so you can use assembly instead of machine code.
You should try writing a few by hand before using this, mainly because it's good practice and makes you feel cooler.
This assembler.py supports all of the instructions for your processor.
The assembler is included in the start kit (the one you copied using the earlier instructions) or can be downloaded from the link above.
The standard assembler is a work in progress, so please report bugs to Piazza!
The assembler takes files of the following form (this is halt.s, which is included in the start kit):
#Comments are great!
lui $r0, 0x33 #B033
ori $r0, $r0, 0x44 #8044
lui $r1, 0x33 #B133
ori $r1, $r1, 0x44 #8544
self: beq $r0, $r1, self #21FF
Anywhere a register is required, it must be either $r0, $r1, $r2, or $r3.
Commas are optional but the '$' is not.
'#' starts a comment.
The assembler can be invoked with the following command:
$ python assembler.py input.s [-o output.hex]
The output file is input.hex if not explicitly set - that is, the same name as the input file but with a .hex extension.
Use the -o option to change the output file name arbitrarily.
Submission
You must submit the following files:
Regfile.circ
alu.circ
cpu.circ
multsimd.s
octal.s
We will be using our own versions of the *-harness.circ files, so you do not need to submit those.
In addition, you should not depend on any changes you make to those files.
You must also submit any .circ files that you use in your solution (they are not copied into your .circ file when you import them, only referenced).
Make sure you submit every .circ file that is part of your project!
You might want to test your cpu.circ file on the lab machines before you submit it, to make sure you got everything.
Submit in the usual way:
$ submit proj3
If you have done the extra credit:
$ submit proj3-ec
Grading
This project will be graded in large part by an autograder. Readers will
also glance at your circuits. If some of your tests fail the readers
will look to see if there is a simple wiring problem. If they can find
one they will give you the new score from the autograder minus a
deduction based on the severity of the wiring problem. For this reason,
and because neatness is a small part of your grade, please try to make your
circuits neat and readable.