Computer Architecture Midterm Solutions: Instruction Set Architecture and Registers Sets, Exams of Computer Architecture and Organization

Solutions to midterm exam questions on computer architecture, specifically instruction set architecture and register sets. Includes explanations and calculations for various instruction formats and their consequences, as well as example instruction sequences that replace old formats.

Typology: Exams

2012/2013

Uploaded on 04/02/2013

shashikanth_0p3

University of California at Berkeley

College of Engineering

Computer Science Division - EECS

CS 152 D. Patterson & R. Yung Fall 1995

Computer Architecture and Engineering

Midterm I Solutions

Question #1: Technology and performance [20 pts]

a) Calculate the average execution time for each instruction with an infinitely fast memory. Which is faster and by what factor? Show your work. [6 pts]

InstructionTime GaAs = CPI / Clock rate = 2.5 / 1000 MHz = 2.5 nanoseconds

InstructionTime CMOS = CPI / Clock rate = 0.75 / 200 MHz = 3.75 nanoseconds

InstructionTime CMOS / InstructionTime GaAs = 3.75 / 2.5 = 1.5

So, the GaAs microprocessor is 1.5 times faster than the CMOS microprocessor.

Grading: 4 points for showing your work. 1 point for having correct instruction execution times. 1 point for having correct performance factor.

b) How many seconds will each CPU take to execute a 1 billion instruction program? [3 pts]

Execution Time of Program = (number of instructions) × (avg inst exec. time)

Execution Time of Program on GaAs = (1 × 10^9) × (2.5 × 10^-9) = 2.5 seconds

Execution Time of Program on CMOS = (1 × 10^9) × (3.75 × 10^-9) = 3.75 seconds

Grading: 1 point for showing your work. 2 points for having correct program execution times.
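
The arithmetic in parts (a) and (b) can be cross-checked with a few lines of Python. This is only a sketch: the function and variable names are illustrative, and the CPI and clock-rate values are the ones used in the worked solution above.

```python
def instruction_time_ns(cpi: float, clock_mhz: float) -> float:
    """Average time per instruction in nanoseconds: CPI / clock rate."""
    # One cycle at 1 MHz lasts 1000 ns, so cycles / MHz * 1000 gives ns.
    return cpi / clock_mhz * 1000.0


# Values taken from the solution to part (a).
gaas_ns = instruction_time_ns(cpi=2.5, clock_mhz=1000.0)
cmos_ns = instruction_time_ns(cpi=0.75, clock_mhz=200.0)
speedup = cmos_ns / gaas_ns  # how many times faster GaAs is than CMOS

# Part (b): a program of one billion instructions.
instructions = 1_000_000_000
gaas_seconds = instructions * gaas_ns * 1e-9
cmos_seconds = instructions * cmos_ns * 1e-9

print(f"GaAs: {gaas_ns:.2f} ns per instruction, {gaas_seconds:.2f} s per program")
print(f"CMOS: {cmos_ns:.2f} ns per instruction, {cmos_seconds:.2f} s per program")
print(f"GaAs speedup over CMOS: {speedup:.2f}")
```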

c) What is the cost of an untested GaAs die for this CPU? Repeat the calculation for a CMOS die. Show your work. [7 pts]

dies/wafer = floor( π × (wafer diameter / 2)^2 / die area - π × wafer diameter / sqrt(2 × die area) - test dies per wafer )

die yield = wafer yield × (1 + (defects per unit area × die area) / α)^(-α)

cost of die = cost of wafer / floor(dies per wafer × die yield)

dies/wafer GaAs = floor( π × (10/2)^2 / 1 - π × 10 / sqrt(2 × 1) - 4 ) = 52

die yield GaAs = 0.8 × (1 + (4 × 1) / 2)^(-2) = 0.088

cost of die GaAs = $2000 / floor(52 × 0.088) = $500.00

dies/wafer CMOS = floor( π × (20/2)^2 / 2 - π × 20 / sqrt(2 × 2) - 4 ) = 121

die yield CMOS = 0.9 × (1 + (1 × 2) / 2)^(-2) = 0.225

cost of die CMOS = $1000 / floor(121 × 0.225) = $37.04

Grading: 1 point for correctly using floor functions in formulas. 1 point deducted if formulas used are good, but computed values are wrong. Rest of the points on showing your work.
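
The die-cost calculation in part (c) can be reproduced as a short Python sketch. The input values (wafer diameters of 10 and 20, die areas of 1 and 2, 4 test dies, wafer yields of 0.8 and 0.9, defect densities of 4 and 1, wafer costs of $2000 and $1000) are read off the worked numbers in the solution. The centimetre units, the function names, and the exponent parameter `alpha = 2` are assumptions made for illustration.

```python
import math


def dies_per_wafer(wafer_diameter: float, die_area: float, test_dies: int) -> int:
    """Whole dies per wafer: gross area term minus edge loss minus test dies."""
    gross = math.pi * (wafer_diameter / 2) ** 2 / die_area
    edge_loss = math.pi * wafer_diameter / math.sqrt(2 * die_area)
    return math.floor(gross - edge_loss - test_dies)


def die_yield(wafer_yield: float, defects_per_area: float, die_area: float,
              alpha: float = 2.0) -> float:
    """Fraction of good dies; alpha = 2 matches the worked numbers (assumption)."""
    return wafer_yield * (1 + defects_per_area * die_area / alpha) ** (-alpha)


def die_cost(wafer_cost: float, dies: int, yield_fraction: float) -> float:
    """Wafer cost divided by the whole number of good dies."""
    return wafer_cost / math.floor(dies * yield_fraction)


gaas_dies = dies_per_wafer(wafer_diameter=10, die_area=1, test_dies=4)
gaas_yield = die_yield(wafer_yield=0.8, defects_per_area=4, die_area=1)
gaas_cost = die_cost(2000, gaas_dies, gaas_yield)

cmos_dies = dies_per_wafer(wafer_diameter=20, die_area=2, test_dies=4)
cmos_yield = die_yield(wafer_yield=0.9, defects_per_area=1, die_area=2)
cmos_cost = die_cost(1000, cmos_dies, cmos_yield)

print(f"GaAs: {gaas_dies} dies, yield {gaas_yield:.3f}, cost ${gaas_cost:.2f}")
print(f"CMOS: {cmos_dies} dies, yield {cmos_yield:.3f}, cost ${cmos_cost:.2f}")
```

The floor in `die_cost` matters: 52 × 0.088 is about 4.6 good dies, but only 4 whole dies can be sold, which is why the GaAs die comes out at $500 rather than roughly $433.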

d) What is the ratio of the cost of the GaAs die to the cost of the CMOS die? [1 pt]

Cost of GaAs die / Cost of CMOS die = $500.00 / $37.04 = 13.5

Grading: 1 point for using an equation which sets up the ratio correctly.

e) Based on the costs and performance ratios of the CPU calculated above, what is the ratio of cost/performance of the CMOS CPU to the GaAs CPU? [3 pts]

performance = 1 / execution time

(cost CMOS / perf. CMOS) / (cost GaAs / perf. GaAs) = (cost CMOS × exec time CMOS) / (cost GaAs × exec time GaAs) = ($37.04 × 3.75 ns) / ($500.00 × 2.5 ns) = 0.111

Grading: 2 points for using the correct formula. 1 point for correct value.
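
As a quick check of parts (d) and (e), the two ratios can be computed directly from the die costs and instruction times quoted in the solution. This is a minimal sketch; the helper name `cost_per_performance` is illustrative only.

```python
def cost_per_performance(cost_dollars: float, exec_time_ns: float) -> float:
    """cost / performance, where performance = 1 / execution time."""
    return cost_dollars * exec_time_ns


# Die costs from part (c) and instruction times from part (a).
GAAS_COST, GAAS_TIME_NS = 500.00, 2.5
CMOS_COST, CMOS_TIME_NS = 37.04, 3.75

cost_ratio = GAAS_COST / CMOS_COST  # part (d): GaAs die cost over CMOS die cost
cost_perf_ratio = (cost_per_performance(CMOS_COST, CMOS_TIME_NS)
                   / cost_per_performance(GAAS_COST, GAAS_TIME_NS))  # part (e)

print(f"cost ratio GaAs/CMOS: {cost_ratio:.1f}")
print(f"cost/performance ratio CMOS/GaAs: {cost_perf_ratio:.3f}")
```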

Grading: -1 for non-reasonable consequences, i.e. it is just shorter. -2 for a decent attempt.

d) Imagine we are translating machine code to use the larger register set. Give an example of an instruction that used to fit into the old format, but is impossible to translate directly into a single instruction in the new format. Write a short sequence of instructions that could replace it. [5 pts]

One simple example is the "sll" instruction. Old instruction that fits in 16-bit immediate field: sll $4,$4, Now becomes: sll $4,$4, sll $4,$4, sll $4,$4,

Grading: -1 for mistakes of the form: addi $4,$0,32768 as an example for the old format. 16-bit values are signed for arithmetic operations, therefore the range is from -32768 to +32767. -2 for a decent attempt. -4 for writing anything that resembles an example.

Question #3: Left Shift vs. Multiply [20 pts]

Design a 16-bit left shifter that shifts 0 to 15 bits using only 4:1 multiplexors.

a) How many levels of 4:1 multiplexors are needed? Show your work. [4 pts]

This question only makes sense if you read it as asking for the minimum number of multiplexors needed for the operation. It can clearly be done with arbitrarily more. The minimum number of levels is log_4 16 = 2.

Grading: For an answer of 2 levels: 2 pts. For "showing your work," either by writing the formula or answering #3 correctly: 2 pts. An answer of 3 levels, if supported by reasonable logic here or in #3: 2 pts.

b) If the delay per multiplexor is 2ns, what is the speed of this shifter? (Assume zero delay for the wires.) [3 pts]

2 levels × 2 ns/level = 4 ns.

Grading: 1 pt if you can multiply correctly. 2 pts for the correct answer.
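
The level count and delay from parts (a) and (b) follow from the fact that each level of 4:1 multiplexors multiplies the number of selectable shift amounts by four. A small sketch, with illustrative names and an integer loop in place of a floating-point logarithm:

```python
def mux_levels(shift_amounts: int, mux_inputs: int = 4) -> int:
    """Smallest number of levels with mux_inputs ** levels >= shift_amounts."""
    levels, reachable = 0, 1
    while reachable < shift_amounts:
        reachable *= mux_inputs  # each extra level multiplies the choices
        levels += 1
    return levels


MUX_DELAY_NS = 2                 # per-multiplexor delay given in part (b)
levels = mux_levels(16)          # shift amounts 0..15
delay_ns = levels * MUX_DELAY_NS

print(f"{levels} levels, {delay_ns} ns")
```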

c) Draw the four leftmost bits of the multiplexors with the proper connections. (You might want to practice drawing it on the back of a page, and then transfer the final version here.) [6 pts]

The simplest solution here was to draw each output O15 ... O12 separately, each using the shift amount bit pattern as the control, i.e.:

i0 ---|\                      0  ---|\
i1 ---| |                     i0 ---| |
i2 ---| |---.                 i1 ---| |---.
i3 ---|/    |                 i2 ---|/    |
            |                             |
i4 ---|\    |                 i3 ---|\    |
i5 ---| |   `-|\              i4 ---| |   `-|\
i6 ---| |--.  | |             i5 ---| |--.  | |
i7 ---|/   `--| |-- o15       i6 ---|/   `--| |-- o14
          ,---| |                       ,---| |
i8 ---|\  |   | |             i7 ---|\  |   | |
i9 ---| | | ,-|/              i8 ---| | | ,-|/
i10 --| |-' |                 i9 ---| |-' |
i11 --|/    |                 i10 --|/    |
            |                             |
i12 --|\    |                 i11 --|\    |
i13 --| |   |                 i12 --| |   |
i14 --| |---'                 i13 --| |---'
i15 --|/                      i14 --|/
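
The two-level structure drawn above can be modelled in Python to confirm that it behaves as a left shifter. This is a sketch under one assumption: the low two bits of the shift amount drive the first level (which chooses among four adjacent inputs) and the high two bits drive the second level. The function names are illustrative.

```python
def mux4(inputs, select):
    """4:1 multiplexor: pass through inputs[select], with select in 0..3."""
    return inputs[select]


def shift_left_16(value: int, shamt: int) -> int:
    """16-bit left shift built only from two levels of 4:1 multiplexors."""
    bits = [(value >> i) & 1 for i in range(16)]  # bits[k] is input i_k

    def source(k: int) -> int:
        # Input bit i_k, or the constant 0 that is shifted in below bit 0.
        return bits[k] if k >= 0 else 0

    low, high = shamt & 3, shamt >> 2
    result = 0
    for out in range(16):
        # First level: four muxes, each covering four adjacent shift amounts.
        first_level = [
            mux4([source(out - (4 * group + j)) for j in range(4)], low)
            for group in range(4)
        ]
        # Second level: one mux picks among the four first-level outputs.
        result |= mux4(first_level, high) << out
    return result


print(hex(shift_left_16(0x0001, 15)))
print(hex(shift_left_16(0x1234, 4)))
```

Each output bit uses five multiplexors (four in the first level, one in the second), which matches the drawing for o15.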

extra credit) In class, we've seen three versions of the multiply operation (ignoring Booth encoding). Approximately how many clock cycles does a multiply take using the fastest algorithm? Approximately how much faster are multiplies that use the shifting technique (assuming the multiplier is a power of two)? [+ pts]

Original multiply takes from 1 × 32 = 32 to 2 × 32 = 64 clock cycles (just answering one or the other was accepted). Most common mistake here was forgetting that it is an algorithm in hardware, not a MIPS program doing the same thing. The shifting technique takes a variable amount of time, based on the location of the 1 in the multiplier. Total clock cycles also depends on the number of instructions in the loop. For the code above, number of clock cycles = 3 + 4 × (size of multiplier). Assuming the multiplier is 2^i and i ranges from 0 to 15, and that i is randomly distributed (not necessarily true!), the minimum, average, and maximum time to execute is 3 + 4 = 7, 3 + 4 × 16 = 67, and 3 + 4 × 32 = 131 clock cycles. Thus, the shifting algorithm is faster when the size of the multiplier is less than or equal to 15 bits (32768) for the 64 clock original multiply, or less than or equal to 7 bits (128) in the 32 clock cycle original.

Grading: +1 pt for the original multiply cycle time. +1 pt for the shift method's cycle time. +2 pts for realizing the performance is variable based on the multiplier, and analyzing the performance in accordance. +3 pt max for an otherwise correct solution with a minor error.
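
The break-even points quoted in the extra-credit answer follow from the cycle formula 3 + 4 × (size of multiplier). The sketch below evaluates that formula and finds the largest multiplier size for which the shift loop is no slower than the hardware multiply. The function names are illustrative, and the formula is taken as given from the solution.

```python
def shift_multiply_cycles(multiplier_bits: int) -> int:
    """Cycle count of the shift loop, per the solution: 3 + 4 * size."""
    return 3 + 4 * multiplier_bits


def break_even_bits(hardware_cycles: int) -> int:
    """Largest multiplier size whose shift loop takes at most hardware_cycles."""
    return (hardware_cycles - 3) // 4


for size in (1, 16, 32):
    print(f"size {size:2d} bits: {shift_multiply_cycles(size)} cycles")

for hardware in (64, 32):
    bits = break_even_bits(hardware)
    print(f"{hardware}-cycle multiply: shift loop wins up to {bits} bits (2^{bits} = {2 ** bits})")
```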

Question #4: Enhancing the Single Cycle Datapath [30 pts]

Recall the 32-bit single-cycle control and datapath from class.

a) A MIPS instruction that would be useful to have for writing loops is Decrement and Branch if Not Zero (DBNZ).

What changes and additions would be needed to the single cycle data path to support DBNZ? Modify the datapath below to show that support. You do not need to write the control signals.

The only change to the datapath needed is to add a hardwired "1" to the ALU MUX. To execute the DBNZ instruction, the register file supplies the value of register Rd to the ALU on bus A, and the ALU MUX supplies "1" to the ALU. The ALU performs a subtraction, the output is written back into Rd, and the "Zero" output is used to select PC + 4 or PC+IMM as the next instruction to fetch. All of the other components needed to execute DBNZ are already in the datapath.

This instruction is a good example of execution concurrency. Every other instruction we've studied uses either the ALU data output or the ALU "Zero" output in its execution, but never both. DBNZ uses both.

Grading: Most people lost points for adding unnecessary or redundant components. Adding another ALU: -5 points. Adding unnecessary MUXes: -3 points. Unclear descriptions or annotations: -1 to -5 points.
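
The behaviour described for DBNZ can be written as a few lines of Python to make the use of both ALU outputs explicit. This is a behavioural sketch, not a model of the actual datapath: the function name, the register-list representation, and treating the taken target as PC + IMM (the solution's wording, with the offset encoding left unspecified) are assumptions.

```python
MASK32 = 0xFFFFFFFF


def dbnz(regs: list, pc: int, rd: int, imm: int) -> int:
    """Decrement regs[rd]; return the next PC (branch taken if result != 0)."""
    result = (regs[rd] - 1) & MASK32  # ALU subtracts the hardwired 1 from bus A
    zero = result == 0                # ALU "Zero" output
    regs[rd] = result                 # ALU data output is written back to Rd
    return pc + 4 if zero else pc + imm


# A loop counter of 2 at address 100 that branches back 8 bytes while nonzero.
regs = [0] * 32
regs[5] = 2
first_next_pc = dbnz(regs, pc=100, rd=5, imm=-8)   # counter becomes 1, branch taken
second_next_pc = dbnz(regs, pc=100, rd=5, imm=-8)  # counter becomes 0, fall through
print(first_next_pc, second_next_pc, regs[5])
```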

b) The basic datapath supports only 32-bit loads. Imagine we wanted to augment the instruction set with new I-type load instructions Load from Lower to Upper Halfword (LLUH) and Load from Upper to Lower Halfword (LULH).

LLUH: Rd ← M[offset + base]_15..0 || 0^16

LULH: Rd ← 0^16 || M[offset + base]_31..16

What datapath changes must be made to support this style of 16-bit loads? Modify the datapath reproduced below to show that support.

There are two good solutions to this problem:

  1. Insert two MUXes in busW, one to select the correct value for the upper halfword, the other to select the correct value for the lower halfword.
  2. Add a shifter that can shift the value on busW 16 bits left or right. Or, add two shifters, one to shift 16 bits left, the other to shift 16 bits right. A MUX selects whether the output of the left shifter, the right shifter, or the unshifted value gets written into the register file.

Grading: Most people either got this completely correct or made fundamental mistakes. Incorrect logic: - to -10 points. Overlooking the fact that for all instructions other than LLUH and LULH the data won't be shifted: -5 points. Unclear descriptions or annotations: -1 to -5 points.
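
The register-transfer definitions of LLUH and LULH, together with the second solution (shift left, shift right, or pass through, chosen by a MUX), can be sketched in Python as follows. The function name, the `mode` strings, and the example word are illustrative assumptions.

```python
MASK32 = 0xFFFFFFFF


def write_back_value(mem_word: int, mode: str) -> int:
    """Value placed on busW for a load, selected the way the final MUX would."""
    shifted_left = (mem_word << 16) & MASK32   # lower halfword moved up, zeros below
    shifted_right = (mem_word >> 16) & 0xFFFF  # upper halfword moved down, zeros above
    unshifted = mem_word & MASK32              # ordinary 32-bit load
    return {"LLUH": shifted_left, "LULH": shifted_right, "LW": unshifted}[mode]


word = 0x12345678
print(hex(write_back_value(word, "LLUH")))
print(hex(write_back_value(word, "LULH")))
print(hex(write_back_value(word, "LW")))
```

The pass-through case is the point of the -5 deduction in the grading note: every instruction other than LLUH and LULH must still see the unshifted data.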