CS150 Fall '12 - Solutions to HW8

Albert Magyar

  1. Modification of Figure 7.49 for three-stage pipeline with register file & ALU in second stage:
    Figure 1: Modification of Figure 7.49
    Note: the instruction sequence below, which this problem considers, cannot actually be assembled for MIPS150, because the assembler inserts a no-op after every branch.
    mfc0 $k0, Cause
    mfc0 $k1, Status
    ...
    jr   $k0
    mtc0 $k1, Status
    
    
    1. See figure.
    2. No forwarding is needed after the add $s0, $s2, $s3 during the mfc0 instructions, since the only operands of the mfc0 instructions are Cause and Status; the add only changes $s0.
    3. No forwarding is needed after the mtc0 $k1, Status during the and or or instructions, since the only operands of those instructions are $s0, $s1, and $s4; the mtc0 only changes Status, which is never used in application code.
    4. Regardless of what instructions are executed in the application code, there will never be any data hazards due to interrupts if $k0 and $k1 are never used in the application instructions. This is helped by the fact that writes into CP0 originating from mtc0 instructions complete on the positive edge of the clock when the mtc0 is in the Execute stage.
      Since the ISR begins with two instructions that depend on no architectural registers, all instructions up to and including the last instruction before the interrupt was handled will have written back before the first instruction of the ISR which could use an architectural register (the third instruction) accesses the regfile. Furthermore, since the ISR ends with two instructions that do not modify architectural registers $0 to $31, any instruction in the ISR modifying an architectural register will have written back before any instruction after the ISR fetches from the register file.
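The timing argument in part 4 can be checked with a small model. The sketch below is Python, not anything from the course materials, and it assumes an ideal three-stage pipeline: one instruction fetched per cycle, no stalls, the register file read in stage 2 and written at the end of stage 3. The names `Instr` and `needs_forwarding` are made up for illustration. Only the entry and exit boundaries of the ISR are modelled, and the destination of the and is left unspecified because the text does not give it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str
    dest: Optional[str] = None   # general-purpose register written, if any
    srcs: Tuple[str, ...] = ()   # general-purpose registers read


def needs_forwarding(stream: List[Instr]) -> List[Tuple[str, str, str]]:
    """Return (producer, consumer, register) triples that need forwarding.

    Assumed timing: instruction i is fetched in cycle i, reads the register
    file in cycle i + 1, and writes it at the end of cycle i + 2.
    """
    hazards = []
    for i, producer in enumerate(stream):
        if producer.dest is None or producer.dest == "$0":
            continue
        write_cycle = i + 2
        for j in range(i + 1, len(stream)):
            read_cycle = j + 1
            if read_cycle > write_cycle:
                break  # this and all later instructions read after the write
            if producer.dest in stream[j].srcs:
                hazards.append((producer.text, stream[j].text, producer.dest))
    return hazards


add = Instr("add $s0, $s2, $s3", dest="$s0", srcs=("$s2", "$s3"))
mfc0_k0 = Instr("mfc0 $k0, Cause", dest="$k0")    # source is a CP0 register
mfc0_k1 = Instr("mfc0 $k1, Status", dest="$k1")   # source is a CP0 register
jr = Instr("jr $k0", srcs=("$k0",))
mtc0 = Instr("mtc0 $k1, Status", srcs=("$k1",))   # writes CP0 only
and_ = Instr("and <dest>, $s0, $s1", srcs=("$s0", "$s1"))  # dest not given

isr_entry = [add, mfc0_k0, mfc0_k1]
isr_exit = [jr, mtc0, and_]
back_to_back = [add, and_]   # contrast case with no ISR in between

print(needs_forwarding(isr_entry))
print(needs_forwarding(isr_exit))
print(needs_forwarding(back_to_back))
```

Under these assumptions, the two ISR boundary sequences report no forwarding requirement, while the back-to-back add/and pair does, which is the ordinary hazard the interrupt happens to hide.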
  2. Modification of Figure 7.51 for three-stage pipeline with register file & ALU in second stage:
    Figure 2: Modification of Figure 7.51
    1. See figure.
    2. No forwarding is needed after the lw $s0, 40($0) during the mfc0 instructions, since the only operands of the mfc0 instructions are Cause and Status; the lw only changes $s0.
    3. No forwarding is needed after the mtc0 $k1, Status during the and or or instructions, since the only operands of those instructions are $s0, $s1, and $s4; the mtc0 only changes Status, which is never used in application code.
    4. Regardless of what instructions are executed in the application code, there will never be any data hazards due to interrupts if $k0 and $k1 are never used in the application instructions. This is helped by the fact that writes into CP0 originating from mtc0 instructions complete on the positive edge of the clock when the mtc0 is in the Execute stage.
      Since the ISR begins with two instructions that depend on no architectural registers, all instructions up to and including the last instruction before the interrupt was handled will have written back before the first instruction of the ISR which could use an architectural register (the third instruction) accesses the regfile. Furthermore, since the ISR ends with two instructions that do not modify architectural registers $0 to $31, any instruction in the ISR modifying an architectural register will have written back before any instruction after the ISR fetches from the register file.
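The remark about CP0 write timing can be illustrated with a toy model. The Python sketch below reads the statement as follows: an mtc0 presents its value while in Execute, and the CP0 register captures it on the positive clock edge that ends that cycle. That reading, the class `CP0Model`, and its method names are assumptions for illustration, not part of the MIPS150 design.

```python
class CP0Model:
    """Toy CP0 with edge-triggered writes (hypothetical names throughout)."""

    def __init__(self):
        self.regs = {"Status": 0, "Cause": 0}
        self._next = None  # (name, value) driven by an mtc0 in Execute

    def mtc0_in_execute(self, name, value):
        # The write data is presented while the mtc0 sits in Execute.
        self._next = (name, value)

    def posedge(self):
        # The register captures the presented value on the clock edge.
        if self._next is not None:
            name, value = self._next
            self.regs[name] = value
            self._next = None

    def mfc0_in_execute(self, name):
        # A read sees whatever the register holds during the current cycle.
        return self.regs[name]


cp0 = CP0Model()
cp0.mtc0_in_execute("Status", 1)             # cycle t: mtc0 is in Execute
before_edge = cp0.mfc0_in_execute("Status")  # still the old value in cycle t
cp0.posedge()                                # edge between cycle t and t + 1
after_edge = cp0.mfc0_in_execute("Status")   # cycle t + 1: next instruction in Execute
print(before_edge, after_edge)
```

In this model the instruction directly behind the mtc0 reaches Execute one cycle later and already reads the updated register, so no CP0 forwarding path is needed.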
  3. Modification of Figure 7.54 for three-stage pipeline with register file & ALU in second stage:
    Figure 3: Part (a): modification of Figure 7.54 - branch taken
    Figure 4: Part (b): modification of Figure 7.54 - branch not taken
    1. See figure. There are many ways to do this entire problem - answers involving only a one-cycle delay on handling an interrupt may still be correct.
    2. There are multiple ways to handle an interrupt raised while a branch is in the pipeline. The implementation depicted in the figures does not assert InterruptHandled if there is a branch in the Instruction Fetch or Execute pipeline stages; otherwise, the pipeline would need fairly complicated modifications. Delaying the handling of the interrupt by up to two cycles in this case reduces the problem to the cases already covered in (1) and (2).

      Case A (branch taken): If there were no delay on the exception, what value would be stored into the EPC? The branch decision is still unknown when the beq is in the Instruction Fetch stage; however, the choice would still be difficult even if the decision were known then. Storing the branch target would be wrong, as the instruction in the branch delay slot would never be executed. Storing the PC of the instruction in the branch delay slot, on the other hand, would cause the branch not to be taken. Therefore, delaying InterruptHandled is preferable.

      Case B (branch not taken): For this case, the fact that the branch decision has not been made is the only reason to prevent taking the interrupt immediately.

      A more complicated strategy could avoid taking interrupts only when there is an unresolved branch - i.e. one that is in the Instruction Fetch stage. This would cause the first mfc0 instruction of the ISR to be executed immediately after the and in the branch delay slot, and would require EPC to be forwarded from the result of the branch in the Execute stage.
    3. No forwarding is needed after the mtc0 $k1, Status during the sub instruction (if the branch isn't taken), since the only operands of that instruction are $s0 and $s5; the mtc0 only changes Status, which is never used in application code.
    4. In addition to the lack of data hazards on architectural registers due to interrupts, there will never be a data hazard on the value of EPC if we choose to delay handling interrupts until there are no branches in the Instruction Fetch or Execute pipeline stages (or delay it one instruction and properly forward EPC from a branch in the execute stage).
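The policy of holding off InterruptHandled while a branch is in flight can be written as a small predicate. This Python sketch is only a behavioural model of the simpler strategy described above; the function and argument names are assumptions, and the actual control logic is the one shown in the figures.

```python
def interrupt_handled(irq_pending, fetch_is_branch, execute_is_branch):
    """Assert InterruptHandled only when no branch is in Fetch or Execute."""
    return irq_pending and not (fetch_is_branch or execute_is_branch)


def cycles_until_handled(fetch_stream, execute_starts_with_branch=False):
    """Count how many cycles a pending interrupt waits before being handled.

    fetch_stream[i] is True when the instruction in Instruction Fetch during
    cycle i is a branch. The interrupt is pending from cycle 0 onward, and
    the instruction in Execute during cycle i is the one fetched in cycle
    i - 1 (no stalls are modelled).
    """
    execute_is_branch = execute_starts_with_branch
    for cycle, fetch_is_branch in enumerate(fetch_stream):
        if interrupt_handled(True, fetch_is_branch, execute_is_branch):
            return cycle
        execute_is_branch = fetch_is_branch  # branch moves from Fetch to Execute
    return None  # not handled within the modelled window


# No branch in flight: handled immediately.
print(cycles_until_handled([False, False]))
# Interrupt raised while a branch is in Instruction Fetch: two-cycle delay.
print(cycles_until_handled([True, False, False]))
# Interrupt raised while a branch is already in Execute: one-cycle delay.
print(cycles_until_handled([False, False], execute_starts_with_branch=True))
```

The worst case in this model is the two-cycle delay mentioned in part 2, which occurs when the interrupt arrives in the same cycle that a branch is fetched.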
  4. See figure for block diagram.
    Figure 5: Modification of Figure 7.63 - three cycle processor
  5. See figures for block diagrams.
    Figure 6: Modification of Figure 7.63 - three cycle processor
    Figure 7: Logic to generate UART_IRQ - expansion of 'UART_IRQ' block in CPU block diagram



File translated from TEX by TTH, version 4.03.
On 02 Dec 2012, 07:10.