Pipelining and Vector Processing - Computer Architecture - Lecture Slides

Pipelining, Parallel Processing, Vector Processing, Arithmetic Pipeline, Array Processors, RISC Pipeline, and Instruction Pipeline are the topics the professor discussed in class.

PIPELINING AND VECTOR PROCESSING
Parallel Processing
Pipelining
Arithmetic Pipeline
Instruction Pipeline
RISC Pipeline
Vector Processing
Array Processors

PARALLEL PROCESSING

Execution of concurrent events in the computing process to achieve faster computational speed.

Levels of Parallel Processing
 - Job or Program level
 - Task or Procedure level
 - Inter-Instruction level
 - Intra-Instruction level

COMPUTER ARCHITECTURES FOR PARALLEL PROCESSING

 - Von Neumann based
   - SISD: Superscalar processors, Superpipelined processors, VLIW
   - MISD: Nonexistent (no practical implementation)
   - SIMD: Array processors, Systolic arrays, Associative processors
   - MIMD
     - Shared-memory multiprocessors: Bus based, Crossbar switch based, Multistage IN based
     - Message-passing multicomputers: Hypercube, Mesh, Reconfigurable
 - Dataflow
 - Reduction

SISD COMPUTER SYSTEMS

Characteristics
 - Standard von Neumann machine
 - Instructions and data are stored in memory
 - One operation at a time

[Figure: Control Unit -> instruction stream -> Processor Unit -> data stream -> Memory]

Limitations
 - Von Neumann bottleneck: the maximum speed of the system is limited by the memory bandwidth (bits/sec or bytes/sec)
 - Limitation on memory bandwidth
 - Memory is shared by CPU and I/O

MISD COMPUTER SYSTEMS

[Figure: several control units (CU) drive their own processor units (P), all operating on a single data stream from memory modules (M)]

Characteristics
 - There is no computer at present that can be classified as MISD

SIMD COMPUTER SYSTEMS

[Figure: one Control Unit broadcasts a single instruction stream to multiple processor units (P), which access memory modules (M) through an alignment network and a common data bus]

Characteristics
 - Only one copy of the program exists
 - A single controller executes one instruction at a time

MIMD COMPUTER SYSTEMS

[Figure: processors (P) with local memories (M) connected through an Interconnection Network to a Shared Memory]

Characteristics
 - Multiple processing units
 - Execution of multiple instructions on multiple data

Types of MIMD computer systems
 - Shared memory multiprocessors
 - Message-passing multicomputers

SHARED MEMORY MULTIPROCESSORS

Characteristics
 - All processors have equally direct access to one large memory address space

[Figure: processors (P) connected to memory modules (M) through an Interconnection Network (IN): buses, multistage IN, or crossbar switch]

Example systems
 - Bus and cache-based systems: Sequent Balance, Encore Multimax
 - Multistage IN-based systems: Ultracomputer, Butterfly, RP3, HEP
 - Crossbar switch-based systems: C.mmp, Alliant FX/8

Limitations
 - Memory access latency
 - Hot spot problem

PIPELINING

A technique of decomposing a sequential process into suboperations, with each subprocess being executed in a special dedicated segment that operates concurrently with all other segments.

Example: Ai * Bi + Ci for i = 1, 2, 3, ..., 7

Segment 1: Load Ai and Bi
 R1 <- Ai, R2 <- Bi
Segment 2: Multiply and load Ci
 R3 <- R1 * R2, R4 <- Ci
Segment 3: Add
 R5 <- R3 + R4

[Figure: Memory supplies Ai, Bi, Ci; Segment 1 holds R1 and R2; Segment 2 holds the Multiplier with R3 and R4; Segment 3 holds the Adder with R5]

OPERATIONS IN EACH PIPELINE STAGE

Clock Pulse |   Segment 1   |    Segment 2     |  Segment 3
  Number    |  R1      R2   |   R3       R4    |     R5
     1      |  A1      B1   |                  |
     2      |  A2      B2   |  A1*B1     C1    |
     3      |  A3      B3   |  A2*B2     C2    |  A1*B1 + C1
     4      |  A4      B4   |  A3*B3     C3    |  A2*B2 + C2
     5      |  A5      B5   |  A4*B4     C4    |  A3*B3 + C3
     6      |  A6      B6   |  A5*B5     C5    |  A4*B4 + C4
     7      |  A7      B7   |  A6*B6     C6    |  A5*B5 + C5
     8      |               |  A7*B7     C7    |  A6*B6 + C6
     9      |               |                  |  A7*B7 + C7
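
The clock-by-clock behaviour in the table above can be reproduced with a short simulation. The following is a minimal sketch, not part of the original slides; the function name pipeline_demo and the use of Python tuples for the segment registers are illustrative choices.

def pipeline_demo(A, B, C):
    """Simulate the 3-segment pipeline: R1,R2 -> R3,R4 -> R5, one clock at a time."""
    n = len(A)
    seg1 = None            # contents latched by Segment 1: (task index, R1, R2)
    seg2 = None            # contents latched by Segment 2: (task index, R3, R4)
    results = [None] * n   # R5 values, one per task
    clock = 0
    next_task = 0
    while any(r is None for r in results):
        clock += 1
        # Segment 3: R5 <- R3 + R4 (uses the values latched on the previous clock)
        if seg2 is not None:
            i, r3, r4 = seg2
            results[i] = r3 + r4
        # Segment 2: R3 <- R1 * R2, R4 <- Ci
        seg2 = (seg1[0], seg1[1] * seg1[2], C[seg1[0]]) if seg1 is not None else None
        # Segment 1: R1 <- Ai, R2 <- Bi
        if next_task < n:
            seg1 = (next_task, A[next_task], B[next_task])
            next_task += 1
        else:
            seg1 = None
    return clock, results

clock, out = pipeline_demo([1] * 7, [2] * 7, [3] * 7)
print(clock, out)   # 9 [5, 5, 5, 5, 5, 5, 5]

Seven tasks complete after k + n - 1 = 3 + 7 - 1 = 9 clock pulses, matching the last row of the table.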

PIPELINE SPEEDUP

n: Number of tasks to be performed

Conventional Machine (Non-Pipelined)
 tn: Clock cycle (time to complete each task)
 t1: Time required to complete the n tasks
 t1 = n * tn

Pipelined Machine (k stages)
 tp: Clock cycle (time to complete each suboperation)
 tk: Time required to complete the n tasks
 tk = (k + n - 1) * tp

Speedup
 Sk = n * tn / [(k + n - 1) * tp]
 lim (n -> infinity) Sk = tn / tp   ( = k, if tn = k * tp )

PIPELINE AND MULTIPLE FUNCTION UNITS

[Figure: a k-stage pipeline compared with multiple functional units P1..P4 operating in parallel on instructions Ii, Ii+1, Ii+2, Ii+3]

Example
 - 4-stage pipeline (k = 4)
 - Suboperation in each stage: tp = 20 ns
 - 100 tasks to be executed (n = 100)
 - 1 task in the non-pipelined system: k * tp = 4 * 20 = 80 ns

Non-Pipelined System
 n * k * tp = 100 * 80 = 8000 ns

Pipelined System
 (k + n - 1) * tp = (4 + 99) * 20 = 2060 ns

Speedup
 Sk = 8000 / 2060 = 3.88

A 4-stage pipeline is basically identical to a system with 4 identical function units.
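
A small helper function makes the speedup formula concrete. This is an illustrative sketch rather than anything from the slides; the name pipeline_speedup is assumed here.

def pipeline_speedup(n, k, tn, tp):
    """Sk = (n * tn) / ((k + n - 1) * tp)."""
    return (n * tn) / ((k + n - 1) * tp)

# The 100-task example above: k = 4, tp = 20 ns, tn = k * tp = 80 ns
print(round(pipeline_speedup(100, 4, 80, 20), 2))        # 3.88  (8000 ns / 2060 ns)

# As n grows, Sk approaches tn / tp, which equals k = 4 when tn = k * tp
for n in (10, 1000, 100000):
    print(n, round(pipeline_speedup(n, 4, 80, 20), 3))   # 3.077, 3.988, 4.0 (approx.)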

4-STAGE FLOATING POINT ADDER

A = a x 2^p, B = b x 2^q  (a, b: fractions; p, q: exponents)

Stages:
 S1: Compare the exponents by subtraction (exponent subtractor): r = max(p, q), t = |p - q|
 S2: Align the fractions (fraction selector, right shifter): right-shift the fraction with the smaller exponent, min(p, q), by t positions
 S3: Add the fractions (fraction adder): C = A + B = c x 2^r
 S4: Normalize the result (leading zero counter, left shifter, exponent adder): adjust the fraction and exponent so that C = d x 2^r with 0.5 <= d < 1
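
The four stages can be sketched in a few lines of Python. This is a simplified illustration with positive fractions only; the function name fp_add and the use of Python floats for the fractions are assumptions made here, not the hardware organization on the slide.

def fp_add(a, p, b, q):
    """Add A = a * 2**p and B = b * 2**q, with fractions a, b in [0.5, 1)."""
    # S1: compare the exponents by subtraction
    r = max(p, q)
    t = abs(p - q)
    # S2: align the fractions -- right-shift the fraction with the smaller exponent
    if p < q:
        a = a / (2 ** t)
    else:
        b = b / (2 ** t)
    # S3: add the fractions
    d = a + b
    # S4: normalize the result so that 0.5 <= d < 1, adjusting the exponent
    while d >= 1.0:
        d, r = d / 2.0, r + 1
    while 0.0 < d < 0.5:
        d, r = d * 2.0, r - 1
    return d, r

print(fp_add(0.5, 3, 0.75, 2))   # (0.875, 3), i.e. 4.0 + 3.0 = 7.0 = 0.875 * 2**3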

INSTRUCTION CYCLE

Six Phases* in an Instruction Cycle
 [1] Fetch an instruction from memory
 [2] Decode the instruction
 [3] Calculate the effective address of the operand
 [4] Fetch the operands from memory
 [5] Execute the operation
 [6] Store the result in the proper place

 * Some instructions skip some phases
 * Effective address calculation can be done as part of the decoding phase
 * Storage of the operation result into a register is done automatically in the execution phase

==> 4-Stage Pipeline
 [1] FI: Fetch an instruction from memory
 [2] DA: Decode the instruction and calculate the effective address of the operand
 [3] FO: Fetch the operand
 [4] EX: Execute the operation
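
The overlap of the four stages is easy to see in a space-time diagram. The sketch below is illustrative only (the function name space_time_diagram is assumed) and ignores branch and memory-access conflicts.

def space_time_diagram(n, stages=("FI", "DA", "FO", "EX")):
    """Print which stage each of n instructions occupies in every clock cycle."""
    k = len(stages)
    total = n + k - 1                       # cycles needed: k + n - 1
    for i in range(1, n + 1):
        cells = []
        for cycle in range(1, total + 1):
            s = cycle - i                   # stage index of instruction i in this cycle
            cells.append(stages[s] if 0 <= s < k else "--")
        print(f"I{i}: " + " ".join(cells))

space_time_diagram(4)
# I1: FI DA FO EX -- -- --
# I2: -- FI DA FO EX -- --
# I3: -- -- FI DA FO EX --
# I4: -- -- -- FI DA FO EX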
