CMSC 415,  MSCS 521  Computer Architecture

 

Homework Assignment # 6

Due March 4, 2007

Read Patterson & Hennessy Chapter 5

 

1.      We wish to add the instruction lui (load upper immediate) to the multi-cycle datapath.  This instruction is described on page 95 of Chapter 2 in the text.  Add any necessary datapaths and control signals to the datapath of Figure 5.28 on page 323, and show any necessary modifications to the finite state machine of Figure 5.37 on page 338.  You may find it helpful to examine the execution steps shown on pages 325 - 329 and consider the steps that will be needed to include this new instruction.  You may photocopy existing figures to make it easier to show your modifications.  Try to find the solution that minimizes the number of clock cycles for the new instruction.  Please state explicitly the number of cycles required to execute the new instruction on your modified datapath and finite state machine.
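As a reminder of what the new instruction must compute, here is a minimal Python sketch of the lui result: the 16-bit immediate is placed in the upper half of the rt register and the lower half is filled with zeros.  The function name and the range check are illustrative only; the sketch says nothing about how the datapath should produce this value.

```python
def lui_result(imm16: int) -> int:
    """Return the 32-bit value that lui writes into register rt.

    imm16 is the 16-bit immediate field of the instruction.
    (Function name and argument check are illustrative assumptions.)
    """
    if not 0 <= imm16 <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    # Upper 16 bits come from the immediate, lower 16 bits are zero.
    return (imm16 << 16) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(lui_result(0x1234)))
```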

 

2.      This question is similar to number 1 except we want to add an instruction wai (where am I) to the datapath.  This instruction puts the instruction's location (the value of the PC when the instruction was fetched) into the register indicated by the instruction's rt field.  Assume that the datapath hasn't changed and that, as usual, the clock cycle is too short to allow an ALU operation and a register file access in a single clock cycle if one of them is dependent upon the other.

 

3.      This question is similar to number 1 except we want to add a new instruction jm (jump memory) to the multi-cycle datapath.  Its instruction format is similar to that of a load word except that the rt field is not used because the data loaded from memory is put into the PC instead of the target register.

 

4.      For this problem use the SPEC2000 Integer benchmark data in Figure 3.26 on page 228.  Assume there are three machines:

a)      M1: The multicycle datapath of Chapter 5 with a 500 MHz clock.

b)      M2: A machine like the multicycle datapath of Chapter 5, except the register updates are done in the same clock cycle as a memory read or ALU operation.  Thus in Figure 5.37 on page 338, states 6 and 7 and states 3 and 4 are combined.  This machine has a 400 MHz clock, since the register update increases the length of the critical path.

c)      M3: A machine like M2 except that effective address calculations are done in the same clock cycle as a memory access.  Thus states 2, 3, and 4 can be combined, as can 2 and 5, as well as 6 and 7.  This machine has a 250 MHz clock because of the long cycle created by combining address calculation and memory access.

            Find out which machine is fastest.  Are there instruction mixes that would make another machine faster, and, if so, what are they?
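One way to organize the comparison is sketched below in Python.  The per-class cycle counts are one reading of the state combinations described for M1, M2, and M3 and should be checked against Figure 5.37.  The instruction mix is a made-up placeholder, NOT the Figure 3.26 data; substitute the real frequencies before drawing any conclusion.  All names in the sketch are illustrative.

```python
# Cycles per instruction class for each machine (assumed reading of the
# state combinations above; verify against Figure 5.37).
CYCLES = {
    "M1": {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3},
    "M2": {"load": 4, "store": 4, "alu": 3, "branch": 3, "jump": 3},
    "M3": {"load": 3, "store": 3, "alu": 3, "branch": 3, "jump": 3},
}

# Clock rates in MHz, as given in the problem statement.
CLOCK_MHZ = {"M1": 500, "M2": 400, "M3": 250}

# Placeholder mix (fractions sum to 1). NOT the SPEC2000 data.
PLACEHOLDER_MIX = {
    "load": 0.25, "store": 0.10, "alu": 0.50, "branch": 0.10, "jump": 0.05,
}


def cpi(machine: str, mix: dict) -> float:
    """Average cycles per instruction: sum of frequency times cycle count."""
    return sum(freq * CYCLES[machine][cls] for cls, freq in mix.items())


def mips(machine: str, mix: dict) -> float:
    """Millions of instructions per second: clock rate (MHz) divided by CPI."""
    return CLOCK_MHZ[machine] / cpi(machine, mix)


if __name__ == "__main__":
    for name in ("M1", "M2", "M3"):
        print(name, round(cpi(name, PLACEHOLDER_MIX), 3),
              round(mips(name, PLACEHOLDER_MIX), 1))
```

Because all three machines run the same instruction count, the machine with the highest MIPS rating for a given mix is the fastest for that mix.  Trying mixes dominated by a single instruction class is a quick way to explore the second half of the question.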

 

1.      Show how the jump register instruction can be implemented simply by making changes to the finite state machine of Figure 5.37 on page 338.  (It may help you to remember that $0 = $zero = 0).

 

2.      Consider a change to the multi-cycle implementation that alters the register file so that it has only one read port.  Describe (via a diagram) any additional changes that will need to be made to the datapath in order to support this modification.  Modify the finite state machine to indicate how the instructions will work, given your new datapath.

 

3.      We wish to add the instruction rfe (return from exception) to the multi-cycle datapath.  A primary task of the rfe is to copy the contents of EPC to the PC.  (The exception mechanism requires several additional capabilities that we discuss in Chapter 7).  Add any necessary datapaths and control signals to the multi-cycle datapath of Figure 5.39 on page 344 and show any necessary modifications to the finite state machine of Figure 5.40 on page 345.

 

4.      Your friends at C3 (Creative Computer Corporation) have determined that the critical path that sets the clock cycle length of the multicycle datapath is memory access for loads and stores (not for instructions).  This has caused their newest implementation of the MIPS 3000 to run at a clock rate of 4.8 GHz rather than the target clock rate of 5.6 GHz.  However, Clara at C3 has a solution.  If all the cycles that access memory are broken into two clock cycles, then the machine can run at its target clock rate.  Using the SPEC CPUint 2000 mixes shown in Chapter 3 (Figure 3.26 on page 228), determine how much faster the machine with the two-cycle memory accesses is compared with the 4.8 GHz machine with single-cycle memory access.  Assume that all jumps and branches take the same number of cycles and that the set instructions and arithmetic-immediate instructions are implemented as R-type instructions.  Would you consider the further step of splitting instruction fetch into two cycles if it would raise the clock rate up to 6.4 GHz?  Why?
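The comparison can be set up as in the Python sketch below.  The baseline cycle counts are the usual ones for the Chapter 5 multicycle datapath, with jumps and branches both taken as 3 cycles per the assumption stated in the problem; check them against the text.  Splitting data memory access is modeled as one extra cycle for loads and stores, and splitting instruction fetch as one extra cycle for every instruction.  The function names and the mix passed in are illustrative assumptions; use the Figure 3.26 frequencies for the actual answer.

```python
# Baseline cycles per instruction class (assumed; verify against Chapter 5).
BASE_CYCLES = {"load": 5, "store": 4, "alu": 4, "branch": 3, "jump": 3}


def cycles_with_split(data_split: bool, fetch_split: bool) -> dict:
    """Return per-class cycle counts after splitting memory-access cycles."""
    cycles = dict(BASE_CYCLES)
    if data_split:
        # Loads and stores each make one data memory access.
        cycles["load"] += 1
        cycles["store"] += 1
    if fetch_split:
        # Every instruction is fetched once.
        for cls in cycles:
            cycles[cls] += 1
    return cycles


def ns_per_instruction(mix: dict, cycles: dict, clock_ghz: float) -> float:
    """Average time per instruction in ns: CPI divided by clock rate in GHz."""
    cpi = sum(freq * cycles[cls] for cls, freq in mix.items())
    return cpi / clock_ghz


def speedup(mix: dict, old: tuple, new: tuple) -> float:
    """Speedup of design `new` over `old`; each is (cycles dict, clock in GHz)."""
    return ns_per_instruction(mix, *old) / ns_per_instruction(mix, *new)
```

For example, `speedup(mix, (cycles_with_split(False, False), 4.8), (cycles_with_split(True, False), 5.6))` compares the two machines described above for a given `mix`, and passing `(cycles_with_split(True, True), 6.4)` as the new design covers the final question.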