CS61C Fall 2017 Lab 7 - Pipelining and CPU Prep

Setup

Copy the starter lab files: cp -r ~cs61c/labs/07 .

Exercises

The lab this week is intended to be very short. It consists of a pipelining exercise which you will need to check off like normal. We want to give you as much time as possible to start project 2-2 when it is released and study for the midterm!

Exercise 1: Pipelining

Assume that on power-on, registers initially contain zeros.

Consider the following 2-input FSM. Its next state and output is computed by multiplying the inputs and adding it to the current state.

Say the propogation delay of a adder block is 50ns, the propogation delay of a multiplication block is 55 ns, and the clk-to-q delay of a register is 5ns. Calculate the maximum clock rate at which this circuit can operate. Assume that the register setup time is negligible, and that both inputs come from clocked registers that receive their data from an outside source.

Checkoff

  • Show your TA the calculations you performed to find the maximum clock rate (non-pipelined).

We want to improve the performance of this circuit, and let it operate at a higher clock rate. To do so, we will divide up the multiplication and addition into two different pipeline stages; in the first pipeline stage, we will perform the multiplication of the two inputs. In the second pipeline stage, we will add the product to the state.

Our definition of "correctness" will be simple: we will consider the sequence of outputs from this circuit "correct" iff it corresponds to the sequence of outputs the non-pipelined version would emit, potentially with some leading zeros. For example, if for some sequence of inputs the non-pipelined version emits [3,5,1,2,4, ...], a correct circuit might emit the sequence of outputs [0,3,5,1,2,4, ...] for that same sequence of inputs.

For your convenience and to help standardize check-offs, we are providing a starting point in the files pipeline.circ and ROMdata (use the links or pull from ~cs61c/labs/07/). In pipeline.circ, the sub-circuit Non-pipelined is set up exactly as the figure above. The main circuit is set up to produce the output sequence [3,5,1,2,4,-1,0,0,...] on the non-pipelined version of this circuit. It is also a handy example of how to use memory from a file. The ROM block should be initialized to the proper data, but if it is zero-ed out, right-click it and choose "Load image..." and select ROMdata.

Note that we need a register to hold the intermediate value of the computation between pipeline stages. This is a general theme with pipelines.

  1. Complete the sub-circuit Pipelined. You will need to add a register to divide the multiplication and addition into separate pipeline stages.
  2. Calculate the maximum clock rate for the pipelined version of the circuit.
  3. When we talked about pipelining in lecture, we discussed that if a computation depends on the output of a previous computation, it's difficult to pipeline them and we often need to insert a pipeline "bubble" (or several) to ensure that the output of the first computation is ready to be an input to the second. Explain why inserting such "bubbles" is unnecessary for this particular circuit.

Checkoff

Exercise 2: CPU Project Guide

We wanted to include this to guide you through part 2 of the CPU project when it is released. Please use this guide to your advantage for getting started with part 2.

We know that starting off part 2 with a blank slate might be intimidating, so we want to guide you through how to think about this project by implementing a simple R-type instruction, add. In addition, we provide a pre-proj exercise with questions/tricks to prepare you for the project. It's highly recommend to work through this guide and work on the pre-proj before you start part2. The pre-proj exercise is available on edx. You don't need this pre-proj to check-off lab7; it will be counted as project-checkoff credits.

Recall the five stages of the CPU pipeline:

  1. Instruction Fetch
  2. Instruction Decode
  3. Execute
  4. Memory
  5. Write Back

This guide will help you work through each of these stages, as it pertains to the add instruction. Each section will contain questions for you to think through, pointers to important details, and references to lecture material - but won't tell you exactly how to implement it. You may need to read and understand each question before going to the next one, and you can see the answers by clicking on the question. During your implementation, feel free to place things in subcircuits as you see fit.

Stage 1: Instruction Fetch

The main thing we are concerned about in this stage is: how do we get the current instruction? From lecture, we know that instructions are stored in the instruction memory, and each of these instructions can be accessed through an address.

  1. Which file in the project holds your instruction memory? How does it connect to your cpu.circ file?

    The instruction memory is the ROM module in run.circ. It provides an input into your CPU named "instruction" and takes an output named "fetch_addr".

  2. In your CPU, how would changing the address you output to fetch_address affect the instruction input? The instruction that run.circ outputs to your CPU should be the instruction at address fetch_address in instruction memory.
  3. How do you know what the fetch_address should be? (Hint: it is also known as PC) fetch_address is the address of the current instruction being executed, so it is saved in the PC register. For this project, it's fine for the PC to start at 0, and that is the default value for registers.
  4. For this project, does your PC hold an address of a byte or a word? If you look in run.circ, you will see that the address coming from your CPU will go through a splitter before entering the word-addressed memory module. This means that your PC should hold a byte address.
  5. For basic programs without any jumps or branches, how would the PC change from line to line? The PC must increment by 1 instruction in order to go to the next instruction, as the address held by the PC register represents what instruction to execute.
  6. We have provided the PC register in the cpu.circ file. Please implement the PC's behavior for simple programs - ignoring jumps and branches. You will have to add in the latter two in the project, but for now we are only concerned with being able to run strings of add instructions. Where should the output of the PC register go? Remember to connect the clock!

Stage 2: Instruction Decode

Now that we have our instruction coming from the instruction input, we have break it down in the Instruction Decode step, according to the RISCV Instruction Format you have learned.

  1. What type of instruction is add? What are the different bit fields and which bits are needed for each?
  2. In Logisim, what tool would you use to split out different groups of bits? Splitter!
  3. Please implement the instruction field decode stage using the instruction input. You should use tunnels to label and group the bits.
  4. Now we need to get the data from the corresponding registers, using the register file. Which instruction fields should be connected to the register file? Which inputs of the register file should it connect to? Instruction fields rs1 and rs2 will need to connect to read register 1 and 2.
  5. Please implement reading from the register file. You will have to bring in your register file from part 1 of the project. Remember to connect the clock!

Stage 3: Execute

The Execute stage, also known as the ALU stage, is where the computation of most instructions is performed. This is also where we will introduce the idea of using a Control Module.

  1. For the add instruction, what should be your inputs in to the ALU? Read Data 1 and 2 should go into ports A and B of the ALU.
  2. In the ALU, what is the purpose of the Switch, also called the ALU_ctr? It chooses which operation the ALU should perform.
  3. Although it is possible for now to just put a constant as the switch input, why would this be infeasible as you need to implement more instructions? With more instructions, the input to the ALU might need to change, so you would need to have some sort of circuit that changes the switch depending on the instruction being executed.
  4. Please create a new subcircuit for the Control Module. This module will need to take in as inputs the opcode and funct, using these to output a value for the ALU Switch, depending on what the current instruction is. There are a few ways of doing this. As you implement more instructions, this circuit will have to expand and become more complex.
  5. Please bring in your ALU from part 1 of the project and connect the ALU inputs correctly. Do you need to connect the clock? Why or why not?

Stage 4: Memory

The memory stage is where the memory can be written to using store instructions and read from using load instructions. Because the add instruction does not use memory, we will not spend too much time here.

  1. Please bring in the MEM module that we provided. At this point, we cannot connect most of the inputs, as we don't know where they should come from. However, you can still connect the clock.

Stage 5: Write back

The write back stage is where the results of the operation is saved back to the registers. Although not all instructions will write back to the register file (can you think of some which do not?), the add instruction does.

  1. Looking at the entire ISA, what are some of the instructions that will write back to a register? Where in the datapath would it get the values? add is an example that will take the output from the ALU and write it back. lhw will take the output from MEM and write it to a register. There are more not specified here.
  2. Let's create the write back phase so that it is able to write both ALU and MEM outputs to the Register File. Later, when you implement branching/jumping, you may need to add more to this mux. However, at the moment, we need to choose between the ALU and MEM outputs, as only one wire can end up being an input to the register file. Bring a wire from both the ALU and MEM, and connect it to a MUX.
  3. What should you use as the Select input to the MUX? What does the input depend on? This input should be able to choose between the two MUX inputs, ALU and MEM, which means that its value depends on which instruction is executing. This suggests that the input should originate from the Control Module, as the Control Module is responsible for figuring out which instruction is executing (using the opcode or funct).
  4. We now come to the second, and arguably more important, role of Control Modules - determining what values to output to the CPU, in order to control the path of execution. These values are called Control Signals.

    One example of this is the control signal that connects to the MUX from the previous step, which is commonly called WBSel. WBSel determines which value to write back to the register file.

    You can find more control signals on the lecture slides, and you'll need to define some yourself. If you find yourself needing a MUX, it's very likely that you'll need to define a control signal for it.

  5. There are a few ways of implementing the Control so that it can translate the opcode/functs to the corresponding instruction and then set the control signals correctly. One way to do so is outlined in lecture, using a ROM. The other method uses "AND" logic and "OR" logic to accomplish both tasks. If you are unfamiliar with it, review the slide now and try implementing it for the add instruction and the MemToReg control signal.
  6. Now that we have the inputs to the MUX sorted out, we need to wire the output. Where should the output connect to? Because the output is the data that you want to write into the Register File, it should connect to the Write Data input on the Register File.
  7. There are two more inputs on the Register File which are important for writing data: RegWEn and Write Register. One of these will come from the Instruction Decode stage and the other one will be a new control signal that you need to design. Please finish off the Write Back stage by these inputs on the Register File correctly.

If you have done all of the following steps correctly, you should have a processor that works for add instructions. For the rest of the project, you will be implementing more instructions in much of the same ways - connecting outputs to inputs, adding MUXes and other Logisim components, and defining new control signals. Hopefully, this will be an easier task now that you have a basic skeleton to work off of. Remember, the lecture slides have a lot of information already, and will help you to continue building on the circuits you have now. Good luck!