CS150 Fall '12 - Solutions to HW7

Albert Magyar

  1. Three-stage pipeline with register file & ALU in second stage:
    Figure 1: Modification of Figure 7.54
    1. No flushing is required. The branch delay slot allows the `and' instruction at PC = 0x24 to execute normally. Since the branch decision is made in cycle 2, when the `beq' is in the second pipeline stage, the new PC can be multiplexed into the IMEM address input before the instruction after the `and' is fetched.
    2. If instructions at 0x24 and 0x28 were also executed immediately prior to the instruction at 0x20, the following sequence of instructions would be executed:
      and $t0, $s0, $s1
      or  $t1, $s4, $s0
      beq $t1, $t2, 40
      and $t0, $s0, $s1
      slt $t3, $s2, $s3
      
      
      The value for $t1 used by the `beq' would be stale, as the contents of $t1 in the register file would not be updated until the `or' writes back on the positive edge of the clock following the cycle when the `beq' is in the execute stage, where the branch decision is made. This conflict is illustrated in the next figure.
      Figure 2: Unresolved RAW hazard
      To resolve this read-after-write hazard, forwarding multiplexers must be added at the outputs of the register file. Each multiplexer selects the writeback data instead of the register file output whenever an instruction reads a register written by the immediately preceding instruction.

      A fix for this hazard that uses forwarding to avoid stalling (in keeping with the naming of MIPS as a `Microprocessor without Interlocked Pipeline Stages') is shown in the next figure.
      Figure 3: Forwarding to address RAW hazards
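
The stale read and its fix can be sketched with a small behavioral model. This is only an illustration, not the datapath from the figures: the function names, the tuple encoding of instructions, and the initial register values are all assumptions chosen so that the branch outcome differs between the two cases.

```python
# Toy model of problem 1: the regfile is read and the ALU runs in stage 2,
# and the result is written back on the clock edge that ends stage 3.
# All names and register values here are illustrative assumptions.
ALU = {
    "and": lambda a, b: a & b,
    "or": lambda a, b: a | b,
    "slt": lambda a, b: int(a < b),
}

def read_operand(reg, regfile, in_writeback, forward):
    """Stage-2 regfile read, optionally through a forwarding mux."""
    if forward and in_writeback is not None and in_writeback[0] == reg:
        return in_writeback[1]      # select the writeback data
    return regfile[reg]             # may be stale

def simulate(program, initial_regs, forward):
    regfile = dict(initial_regs)
    in_writeback = None             # (dest, value) of the instruction in stage 3
    branch_outcomes = []
    for op, dest, src_a, src_b in program:  # one instruction enters stage 2 per cycle
        a = read_operand(src_a, regfile, in_writeback, forward)
        b = read_operand(src_b, regfile, in_writeback, forward)
        if op == "beq":
            branch_outcomes.append(a == b)
            produced = None
        else:
            produced = (dest, ALU[op](a, b))
        # Positive clock edge: stage 3 commits, stage 2 result advances.
        if in_writeback is not None:
            regfile[in_writeback[0]] = in_writeback[1]
        in_writeback = produced
    if in_writeback is not None:    # drain the last instruction
        regfile[in_writeback[0]] = in_writeback[1]
    return regfile, branch_outcomes

PROGRAM = [
    ("and", "$t0", "$s0", "$s1"),
    ("or", "$t1", "$s4", "$s0"),
    ("beq", None, "$t1", "$t2"),
]
REGS = {"$s0": 3, "$s1": 6, "$s4": 5, "$t0": 0, "$t1": 0, "$t2": 7}

stale = simulate(PROGRAM, REGS, forward=False)[1]
fixed = simulate(PROGRAM, REGS, forward=True)[1]
```

With these assumed values the `or' produces 7. Without forwarding the `beq' compares the old $t1 (0) against $t2 (7) and is not taken; with forwarding it sees 7 and is taken.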
  2. Three-stage pipeline with register file in first stage & ALU in second stage:
    Figure 4: Modification of Figure 7.54
    1. No flushing is required. Since the register file is simply moved earlier in the pipeline, the branch decision can still be calculated in the execute stage (or even in the decode stage), allowing the branch PC to arrive at the IMEM address input early enough.
    2. If instructions at 0x24 and 0x28 were also executed immediately prior to the instruction at 0x20, the following sequence of instructions would be executed:
      and $t0, $s0, $s1
      or  $t1, $s4, $s0
      beq $t1, $t2, 40
      and $t0, $s0, $s1
      slt $t3, $s2, $s3
      
      
      As before, the value of $t1 generated by the `or' is not ready when the `beq' reads $t1 from the register file. The nature of the conflict changes, however. The `or' still writes $t1 back on the positive edge of the clock following the cycle when the `beq' is in the execute stage, but the `beq' now reads $t1 in the decode stage, one cycle earlier. As a result, it could even have an unresolved RAW hazard if an instruction that starts executing two cycles earlier modifies one of its operands. This conflict is illustrated in the next figure.
      Figure 5: Unresolved RAW hazard
      This cannot be fixed by forwarding to the outputs of the regfile, as the `beq' leaves the decode stage (where the regfile read occurs) before the `or' reaches the writeback stage. However, the forwarding multiplexers can be placed in the execute stage, after the pipeline registers that hold the register addresses and data.
      Figure 6: Forwarding to address RAW hazards
      This does not resolve all the potential RAW hazards. The sequence of instructions given in the problem will now execute with no stalling, but an instruction can still read a register that was modified two instructions earlier, and that case remains an unresolved RAW hazard in this design.

      Consider the instruction sequence below:
      add $t0, $s0, $s1
      add $t1, $s1, $s2
      add $t2, $t0, $t1
      
      
      The resolved and unresolved RAW hazards are shown in the pipeline diagram below.
      Figure 7: Resolved & unresolved RAW hazards
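
The same three-instruction sequence can be traced with a small behavioral model. This is a sketch under stated assumptions rather than the circuit in the figures: the regfile is read in stage 1, the forwarding mux sits in stage 2 and sees only the instruction currently in stage 3, and the function name and initial register values are made up for illustration.

```python
# Toy model of problem 2: regfile read in stage 1, ALU plus forwarding mux
# in stage 2, writeback on the clock edge that ends stage 3.
# Names and register values are illustrative assumptions.
def simulate(program, initial_regs, forward):
    regfile = dict(initial_regs)
    in_execute = None     # (dest, src_a, val_a, src_b, val_b) held for stage 2
    in_writeback = None   # (dest, value) held for stage 3
    for instr in list(program) + [None, None]:   # two bubbles drain the pipeline
        # Stage 1: read operands from the regfile as it stands this cycle.
        fetched = None
        if instr is not None:
            dest, src_a, src_b = instr
            fetched = (dest, src_a, regfile[src_a], src_b, regfile[src_b])
        # Stage 2: forwarding mux placed after the pipeline registers.
        produced = None
        if in_execute is not None:
            dest, src_a, val_a, src_b, val_b = in_execute
            if forward and in_writeback is not None:
                fwd_reg, fwd_val = in_writeback
                if fwd_reg == src_a:
                    val_a = fwd_val
                if fwd_reg == src_b:
                    val_b = fwd_val
            produced = (dest, val_a + val_b)     # every instruction is an add
        # Positive clock edge: stage 3 commits, the other stages advance.
        if in_writeback is not None:
            regfile[in_writeback[0]] = in_writeback[1]
        in_execute, in_writeback = fetched, produced
    return regfile

# add $t0, $s0, $s1 ; add $t1, $s1, $s2 ; add $t2, $t0, $t1
PROGRAM = [("$t0", "$s0", "$s1"), ("$t1", "$s1", "$s2"), ("$t2", "$t0", "$t1")]
REGS = {"$s0": 1, "$s1": 2, "$s2": 3, "$t0": 0, "$t1": 0, "$t2": 0}

with_forwarding = simulate(PROGRAM, REGS, forward=True)
without_forwarding = simulate(PROGRAM, REGS, forward=False)
```

A hazard-free machine would finish with $t2 = 3 + 5 = 8. In this model, forwarding supplies the new $t1 (5) but the third `add' still reads the old $t0 (0), so $t2 ends up as 5; without forwarding both operands are stale and $t2 is 0.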
  3. Since neither `J' nor `JAL' instructions require register reads to compute the target address, the choice of placement of the register file read does not affect either instruction. Since both are delayed, the target address can be computed in the execute stage and fed back into the address input of the IMEM when a `J' or `JAL' is in the execute stage.

    Also, for a JAL instruction to successfully write PC+8 into the regfile, an additional writeback path must be added. This must be combined with a multiplexer that selects 5'd31 as the writeback address on a JAL. These modifications are shown outlined in red in the figure below.
    Figure 8: J & JAL
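
The extra writeback path and address multiplexer amount to a simple selection, sketched below. The function and parameter names are hypothetical and not taken from the figure. The target computation follows the standard MIPS J-type encoding: the upper four bits of PC+4 concatenated with the 26-bit index shifted left by two.

```python
# Sketch of the J/JAL additions; names are illustrative assumptions.
RA = 31  # $ra, the link register written by JAL

def jump_target(pc, instr_index):
    """J-type target: {(PC+4)[31:28], instr_index, 2'b00}. No register read needed."""
    return ((pc + 4) & 0xF0000000) | ((instr_index & 0x03FFFFFF) << 2)

def writeback_select(is_jal, pc, rd, alu_result):
    """Return (write_address, write_data) for the regfile write port."""
    if is_jal:
        return RA, pc + 8      # link address skips the delay slot
    return rd, alu_result      # ordinary ALU writeback

link = writeback_select(True, 0x20, 5, 99)
normal = writeback_select(False, 0x20, 5, 99)
```

For a JAL at PC = 0x20, this selects register 31 and writes 0x28; any other instruction passes its own destination register and ALU result through unchanged.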


