Prepare for your exams
Get points
Guidelines and tips

Sell on Docsity

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search Store documents

The best documents sold by students who completed their studies

Search through all study resources

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

University Rankings

Discover the best universities in your country according to Docsity users

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

From our blog

Exams and Study

Go to the blog

Pipelining and Vector Processing - Computer Architecture - Lecture Slides, Slides of Computer Architecture and Organization

Himgiri Zee University Computer Architecture and Organization

Pipelining, Parallel Processing, Vector Processing, Arithmetic Pipeline, Array Processors, RISC Pipeline, Instruction Pipeline are the topics professor discussed in class.

Typology: Slides

2011/2012

Uploaded on 11/03/2012

dharmaraaj 🇮🇳

4.4

(65)

153 documents

1 / 37

This page cannot be seen from the preview

Don't miss anything!

PIPELINING AND VECTOR PROCESSING

• Parallel Processing

• Pipelining

• Arithmetic Pipeline

• Instruction Pipeline

• RISC Pipeline

• Vector Processing

• Array Processors

Docsity.com

Partial preview of the text

Download Pipelining and Vector Processing - Computer Architecture - Lecture Slides and more Slides Computer Architecture and Organization in PDF only on Docsity!

PIPELINING AND VECTOR PROCESSING • • • • Arithmetic PipelineParallel ProcessingPipeliningInstruction Pipeline

• • • Array ProcessorsRISC PipelineVector Processing Docsity.com

Levels of Parallel Processing - Job or Program levelPARALLEL PROCESSING

Execution of process to achieve faster- Task or Procedure level- Inter-Instruction level- Intra-Instruction level Concurrent Events Computational Speed in the computing

Parallel Processing

COMPUTER ARCHITECTURES FOR PARALLEL Von-Neuman based PROCESSING

Dataflow Reduction^ SISDMISD^ SIMDMIMD^ Superscalar processors^ Superpipelined processors^ VLIW^ Nonexistence^ Array processors^ Systolic arrays Associative processors Shared-memory multiprocessors Message-passing multicomputers Bus based Crossbar switch based Multistage IN based Hypercube Mesh Reconfigurable

Parallel Processing

Characteristics - Standard von Neumann machine - Instructions and data are stored in memory - One operation at a time^ ControlSISD COMPUTER SYSTEMS^ Unit^ Instruction stream^ ProcessorUnit^ Data stream Memory

Limitations Von Neumann bottleneck Maximum speed of the system is limited by the Memory Bandwidth - Limitation on - Memory is shared by CPU and I/O Memory Bandwidth (bits/sec or bytes/sec)

Parallel Processing

M M M Instruction stream • • • MISD COMPUTER SYSTEMSCUCUCU • • • PPP Memory Data stream

Characteristics - There is no computer at present that can beclassified as MISD

Parallel Processing

SIMD COMPUTER SYSTEMS P Alignment networkP Control Unit Memory • • • P

Characteristics - Only one copy of the program exists- A single controller executes one instruction at a time^ Data bus^ M^ M^ Data stream • • •^ Instruction streamM^ Processor units^ Memory modules

Parallel Processing

CharacteristicsMIMD COMPUTER SYSTEMS^ P^ M^ P^ Interconnection Network^ Shared MemoryM^ • • • P^ M

Types of MIMD computer systems^ - Multiple processing units- Execution of multiple instructions on multiple data - Shared memory multiprocessors - Message-passing multicomputers

Parallel Processing

CharacteristicsSHARED MEMORY MULTIPROCESSORS All processors have equally direct access toone large memory address space

Example systems Limitations Bus and cache-based systemsMultistage IN-based systemsCrossbar switch-based systems Memory access latencyHot spot problemP^ M^ - Sequent Balance, Encore Multimax- Ultracomputer, Butterfly, RP3, HEP- C.mmp, Alliant FX/8^ Interconnection Network(IN)MP^ • • •^ • • • MP^ Buses,Multistage IN,Crossbar Switch

Parallel Processing

PIPELINING

R1 R3 R5  AR1 * R2, R4R3 + R4i, R2  Bi  Ci AddLoad Ai and Bi Multiply and load Ci

A technique of decomposing a sequential processinto suboperations, with each subprocess beingexecuted in a partial dedicated segment thatoperates concurrently with all other segments. A R1 Ai i* B Multiplieri R3 + C i R2for i = 1, 2, 3, ... , 7R

Adder R5^ Memory Segment 1^ Bi^ Ci^ Pipelining Segment 2 Segment 3 Docsity.com

OPERATIONS IN EACH PIPELINE STAGE Number ClockPulse 123456789 Segment 1R1A2A3A4A5A6A7A1 R2B2B3B4B5B6B7B1 A7 * B7A1 * B1A2 * B2A3 * B3A4 * B4A5 * B5A6 * B6R3Segment 2 C1C2C3C4C5C6C7R4 A1 * B1 + C1A2 * B2 + C2A3 * B3 + C3A4 * B4 + C4A5 * B5 + C5Segment 3A6 * B6 + C6A7 * B7 + C7R5^ Pipelining

Conventional Machine (Non-Pipelined) Pipelined Machine (k stages)^ n: t t tt tnp 11 k: :::^ = n * t Number of tasks to be performedClock cycle (time to complete each suboperation)Time required to complete the n tasksClock cycleTime required to complete the n tasksn PIPELINE SPEEDUP

Speedup^ tSSlim nkkk^ : = (k + n - 1) * t= nt SpeedupSkn / (k + n - 1)t= t tnp p ( = k, if tp n = k * tp )

Pipelining

PIPELINE AND MULTIPLE FUNCTION UNITS

Multiple Functional Units^ IP^ i 1 I^ Pi+1 2 I^ P^ i+2 3 I^ P^ i+3 4

Example- 4-stage pipeline- subopertion in each stage; t- 100 tasks to be executed - 1 task in non-pipelined system; 204 = 80nS Pipelined SystemNon-Pipelined SystemSpeedupnktS (k + n - 1)tp = 100 * 80 = 8000nSp = (4 + 99) * 20 = 2060nSp = 20nS

4-Stage Pipeline is basically identical to the system with 4 identical function unitsk^ = 8000 / 2060 = 3.

Pipelining

4-STAGE FLOATING POINT ADDER r = max(p,q) subtractor p Exponent A = a x 2 pa Fraction adder t = |p - q| fraction Other Right shifterqB = b x 2 Fraction selector Fraction with min(p,q)qb

Stages: S1 S2 S3 S4 r r Exponent (^) adder C = A + B = c x 2 = d x 2 (r = max (p,q), 0.5 s Leading zero countercr (^)  d < 1) Left shifters cd d Arithmetic Pipeline

Six Phases* in an Instruction Cycle [1] Fetch an instruction from memory [2] Decode the instruction [3] Calculate the effective address of the operand [4] Fetch the operands from memory [5] Execute the operation [6] Store the result in the proper place * Some instructions skip some phases * Effective address calculation can be done in the part of the decoding phaseINSTRUCTION CYCLE

* Storage of the operation result into a register ==> 4-Stage Pipeline [1] FI: [2] DA: Decode the instruction and calculate [3] FO: Fetch the operand [4] EX: Execute the operation is done automatically in the execution phase Fetch an instruction from memory the effective address of the operand

Instruction Pipeline

Pipelining and Vector Processing - Computer Architecture - Lecture Slides, Slides of Computer Architecture and Organization

Related documents

Partial preview of the text

Download Pipelining and Vector Processing - Computer Architecture - Lecture Slides and more Slides Computer Architecture and Organization in PDF only on Docsity!

PIPELINING AND VECTOR PROCESSING • • • • Arithmetic PipelineParallel ProcessingPipeliningInstruction Pipeline

• • • Array ProcessorsRISC PipelineVector Processing Docsity.com

Levels of Parallel Processing - Job or Program levelPARALLEL PROCESSING

Execution of process to achieve faster- Task or Procedure level- Inter-Instruction level- Intra-Instruction level Concurrent Events Computational Speed in the computing

COMPUTER ARCHITECTURES FOR PARALLEL Von-Neuman based PROCESSING

Characteristics - Standard von Neumann machine - Instructions and data are stored in memory - One operation at a time^ ControlSISD COMPUTER SYSTEMS^ Unit^ Instruction stream^ ProcessorUnit^ Data stream Memory

Limitations Von Neumann bottleneck Maximum speed of the system is limited by the Memory Bandwidth - Limitation on - Memory is shared by CPU and I/O Memory Bandwidth (bits/sec or bytes/sec)

M M M Instruction stream • • • MISD COMPUTER SYSTEMSCUCUCU • • • PPP Memory Data stream

Characteristics - There is no computer at present that can beclassified as MISD

SIMD COMPUTER SYSTEMS P Alignment networkP Control Unit Memory • • • P

Characteristics - Only one copy of the program exists- A single controller executes one instruction at a time^ Data bus^ M^ M^ Data stream • • •^ Instruction streamM^ Processor units^ Memory modules

CharacteristicsMIMD COMPUTER SYSTEMS^ P^ M^ P^ Interconnection Network^ Shared MemoryM^ • • • P^ M

Types of MIMD computer systems^ - Multiple processing units- Execution of multiple instructions on multiple data - Shared memory multiprocessors - Message-passing multicomputers

CharacteristicsSHARED MEMORY MULTIPROCESSORS All processors have equally direct access toone large memory address space

PIPELINING

R1 R3 R5  AR1 * R2, R4R3 + R4i, R2  Bi  Ci AddLoad Ai and Bi Multiply and load Ci

A technique of decomposing a sequential processinto suboperations, with each subprocess beingexecuted in a partial dedicated segment thatoperates concurrently with all other segments. A R1 Ai i* B Multiplieri R3 + C i R2for i = 1, 2, 3, ... , 7R

OPERATIONS IN EACH PIPELINE STAGE Number ClockPulse 123456789 Segment 1R1A2A3A4A5A6A7A1 R2B2B3B4B5B6B7B1 A7 * B7A1 * B1A2 * B2A3 * B3A4 * B4A5 * B5A6 * B6R3Segment 2 C1C2C3C4C5C6C7R4 A1 * B1 + C1A2 * B2 + C2A3 * B3 + C3A4 * B4 + C4A5 * B5 + C5Segment 3A6 * B6 + C6A7 * B7 + C7R5^ Pipelining

Conventional Machine (Non-Pipelined) Pipelined Machine (k stages)^ n: t t tt tnp 11 k: :::^ = n * t Number of tasks to be performedClock cycle (time to complete each suboperation)Time required to complete the n tasksClock cycleTime required to complete the n tasksn PIPELINE SPEEDUP

Speedup^ tSSlim nkkk^ : = (k + n - 1) * t= nt SpeedupSkn / (k + n - 1)t= t tnp p ( = k, if tp n = k * tp )

PIPELINE AND MULTIPLE FUNCTION UNITS

Multiple Functional Units^ IP^ i 1 I^ Pi+1 2 I^ P^ i+2 3 I^ P^ i+3 4

Example- 4-stage pipeline- subopertion in each stage; t- 100 tasks to be executed - 1 task in non-pipelined system; 204 = 80nS Pipelined SystemNon-Pipelined SystemSpeedupnktS (k + n - 1)tp = 100 * 80 = 8000nSp = (4 + 99) * 20 = 2060nSp = 20nS

4-Stage Pipeline is basically identical to the system with 4 identical function unitsk^ = 8000 / 2060 = 3.

4-STAGE FLOATING POINT ADDER r = max(p,q) subtractor p Exponent A = a x 2 pa Fraction adder t = |p - q| fraction Other Right shifterqB = b x 2 Fraction selector Fraction with min(p,q)qb

* Storage of the operation result into a register ==> 4-Stage Pipeline [1] FI: [2] DA: Decode the instruction and calculate [3] FO: Fetch the operand [4] EX: Execute the operation is done automatically in the execution phase Fetch an instruction from memory the effective address of the operand