Computer Systems Performance Evaluation: Workloads and Tools
The issue around workloads is this: You want to be able to represent the reality of some environment. To do this, you must build a model or representation of that environment. This is a workload.
Tools are simply the component pieces of a workload; they may represent the driving part of that workload, or they may form the measurement component.
Reality ===> Model or Abstraction ===> Workload
Reality is what is - it's essentially unmeasurable because it's too complex, too remote, too secret, and so on.
A Model is the thinking that goes into abstracting reality.
A Workload is an attempt to approximate the model.
A Benchmark is a stylized workload, usually very portable, used to compare various systems.
Computation Benchmarks – they depend mostly on the speed of the hardware and the efficiency of the compiler. Useful for hardware comparisons.
Sieve of Eratosthenes – Determines prime numbers. Has a series of loops. (A minimal sketch appears after this list.)
Whetstone – A synthetic benchmark designed to measure the behavior of scientific programs.
Dhrystone – Claims to represent system programming environments. Generally integer rather than floating-point arithmetic.
SPEC – Benchmarks developed by the System Performance Evaluation Cooperative. Widely used on UNIX systems, where the code is extremely portable. Contains the kinds of activities commonly found in engineering and scientific environments (compiles, matrix inversions, etc.)
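As a concrete illustration of how small these computation kernels are, here is a minimal sketch of a Sieve-style kernel in C. The problem size N and repetition count REPS are arbitrary choices for this sketch, not values taken from any published benchmark.

/* sieve.c - minimal Sieve of Eratosthenes kernel, timed over many
 * repetitions.  N and REPS are arbitrary for this sketch; published
 * benchmarks fix such values so results are comparable. */
#include <stdio.h>
#include <string.h>
#include <time.h>

#define N    8192
#define REPS 1000

int main(void)
{
    static char composite[N + 1];
    int count = 0;
    clock_t start = clock();

    for (int r = 0; r < REPS; r++) {
        memset(composite, 0, sizeof composite);
        count = 0;
        for (int i = 2; i <= N; i++) {
            if (!composite[i]) {
                count++;                         /* i is prime */
                for (int j = 2 * i; j <= N; j += i)
                    composite[j] = 1;            /* mark multiples of i */
            }
        }
    }

    double secs = (double)(clock() - start) / CLOCKS_PER_SEC;
    printf("%d primes up to %d; %d repetitions in %.3f s\n", count, N, REPS, secs);
    return 0;
}

Note that such a kernel exercises integer arithmetic, branching, and memory access, so its timing reflects both the hardware and what the compiler does with the loops.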
Application/System Benchmarks – they depend mostly on the efficiency of the OS and application. Useful for software comparisons.
Webstone – Measures the number of accesses that can be made remotely on a target machine. Measures network message handling, web server behavior, and file lookup.
WebBench – Measures how many accesses to a webserver can be accomplished in a given time.
TPC – A series of Transaction Processing Performance Council benchmarks. They are generally database oriented. A typical "transaction" involves querying several data items and then updating those items. (A toy sketch follows this list.)
AIM – A series of operating system actions (scheduling, page faults, disk writes, IPC, etc.). Each action is relatively atomic and can be run either in standalone/separate mode or as a bundle of tests.
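To make the shape of such a transaction concrete, here is a toy sketch in C of a query-then-update loop over an in-memory table. The table size, items per transaction, and transaction count are invented for illustration; real TPC benchmarks specify the schema, transaction mix, and scaling rules precisely.

/* tpc_toy.c - toy "query several items, then update them" transaction
 * loop.  All sizes here are invented for illustration; real TPC
 * workloads pin these down exactly. */
#include <stdio.h>
#include <stdlib.h>

#define RECORDS 100000
#define ITEMS   4          /* data items touched per transaction */
#define TXNS    1000000

static long table[RECORDS];

int main(void)
{
    srand(42);
    for (long t = 0; t < TXNS; t++) {
        long sum = 0;
        int idx[ITEMS];
        for (int i = 0; i < ITEMS; i++) {      /* "query" phase */
            idx[i] = rand() % RECORDS;
            sum += table[idx[i]];
        }
        for (int i = 0; i < ITEMS; i++)        /* "update" phase */
            table[idx[i]] = sum / ITEMS;
    }
    printf("ran %d transactions\n", TXNS);
    return 0;
}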
So: What is the relative importance of these characteristics in a typical Development Environment?
Example: List some benchmarks/tools that could be used by development groups. How do they fit these characteristics?
This involves figuring out what behavior to approximate and then what workload to produce in order to duplicate that behavior. Of the many possible behaviors on a system, which one do we want to single out?
What are the job parameters - or, what behavior do we focus on?
Each of these raw numbers involves means, distributions, etc., interpreted in several ways. For example, disk accesses can be represented as a simple count, as an average rate, as a mean plus a measure of spread, or as a full distribution.
Example: In a “real” environment, there are 100 people entering data at any one time. The average person completes 20 fields a minute, but there is a typical variation of +/- 5 – some people type 15 fields/minute and some get as high as 25 fields/minute.
How would you represent the input from these 100 people?
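One plausible representation, sketched in C below: give each simulated user a typing rate drawn from a normal distribution with mean 20 fields/minute, treating the "+/- 5 typical variation" as one standard deviation. Both the choice of a normal distribution and that reading of "+/- 5" are assumptions; the description above doesn't name a distribution.

/* typists.c - sketch of a driver that assigns each of 100 simulated
 * users a typing rate drawn from N(20, 5^2) fields/minute.  Reading
 * the "+/- 5" above as one standard deviation of a normal distribution
 * is an assumption, not something the example states.
 * Compile with: cc typists.c -lm */
#include <math.h>
#include <stdio.h>
#include <stdlib.h>

#define USERS 100

static const double PI = 3.141592653589793;

/* Box-Muller transform: one standard-normal sample per call */
static double std_normal(void)
{
    double u1 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);  /* in (0,1) */
    double u2 = (rand() + 1.0) / ((double)RAND_MAX + 2.0);
    return sqrt(-2.0 * log(u1)) * cos(2.0 * PI * u2);
}

int main(void)
{
    srand(1);
    double total = 0.0;
    for (int u = 0; u < USERS; u++) {
        double rate = 20.0 + 5.0 * std_normal();   /* fields/minute */
        if (rate < 0.0) rate = 0.0;
        total += rate;
        /* a real driver would now emit one field every 60.0/rate
         * seconds on behalf of user u */
    }
    printf("aggregate offered load: %.1f fields/minute\n", total);
    return 0;
}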
Example: A Whetstone program is designed to use the machine instructions found in typical computationally intensive FORTRAN programs.
There are numerous ways to express a component of system behavior.
Example:
Suppose a large number of processes are using the CPU. We can say either of the
following:
a) There are 1000 process schedules in a second. The CPU is 55% busy; therefore each of the processes requires 0.55 milliseconds of CPU each time it asks for processing (550 ms of busy time per second / 1000 schedules per second = 0.55 ms per schedule). This averaging, expressed more formally, is simply

$X_{ave} = \frac{1}{n} \sum_{i=1}^{n} X_i$

b) There are 1000 process schedules in a second. The CPU is 55% busy. But there's a wide variation in the processor demand, based on the kind of processes or just simple randomness (a particular process needs different amounts of CPU depending on where it is in its transaction). Then we'd like to be able to express the CPU required as a mean (as in a)) and also a standard deviation given by

$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - X_{ave})^2$
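A minimal sketch of these two formulas in C (the sample values are invented for illustration):

/* stats.c - sample mean and sample standard deviation, following the
 * formulas above.  The data values are invented for illustration.
 * Compile with: cc stats.c -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double x[] = { 0.31, 0.72, 0.55, 0.48, 0.69 };  /* ms of CPU per schedule */
    int n = sizeof x / sizeof x[0];

    double sum = 0.0;
    for (int i = 0; i < n; i++)
        sum += x[i];
    double x_ave = sum / n;                /* X_ave = (1/n) * sum of X_i */

    double ss = 0.0;
    for (int i = 0; i < n; i++)
        ss += (x[i] - x_ave) * (x[i] - x_ave);
    double s = sqrt(ss / (n - 1));         /* s^2 = sum of (X_i - X_ave)^2 / (n-1) */

    printf("mean = %.3f ms, std dev = %.3f ms\n", x_ave, s);
    return 0;
}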
How easy is it to find a workload that is representative? Not very easy! Issues include:
Workload parameters interact with one another, sometimes in ways so complex that they must be ignored.
Example:
Increasing the level of multiprogramming increases memory usage which increases
paging and CPU-per-process usage.
The parameters by which we model a workload should NOT depend on the type of
system, on its configuration or on its software.
Example:
For instance, suppose we partially characterize a workload based on the number of paging requests made. Then increasing memory will cause fewer page faults, which may or may not affect user-visible performance.
Vendors suggest benchmarks that will be advantageous to their company - they
LOOK for system dependence.
There are ways of being system independent; in fact, that's what open systems are
all about.
Characterize logically rather than physically. If you define a test in terms of “lines of
C”, it’s much more portable than “lines of assembler”.
Natural workloads:
Samples of the production workload that the system processes at the time of the experiment. Such a sample is generally shorter than a real load.
Modeling in this case means choosing the times of data collection.
We need both:
An accurate characterization - we know what parameters to use in describing our load, and what values they should be.
An accurate implementation - we can find a workload which matches our characterization.
Pros and cons of natural workloads:
They may be very representative, especially if the natural load is relatively stable.
System independence is low.
Not very controllable (only times and durations can be determined). This means poor flexibility and reproducibility.
Cost to produce is relatively low.
Usage cost is high because they aren't compact - having a long run time and a great deal of data.
Artificial Workloads:
Programs that aren't derived from the production load.
We can describe these workloads in terms of their level of parameterization; we can build models to match a real load at any of these levels.
Pros and cons of artificial workloads:
Example: Characterize workloads with which you are familiar in terms of level of parameterization, and most exact/least exact.
Example: Pat is designing a communications server that receives requests from "higher level" routines. The requests are collected by a Request Handler that does nothing but put them into buffers. The Request Processor removes these requests from the buffers on a first-come, first-served basis.
requests -> Request Handler -> Buffers[n] -> Request Processor ->
This product will be used in a wide range of applications; the "higher level" routines typically send packets of 1348 bytes, but other sizes are also possible. In addition, the applications will be placing variable load on the system; loads might range from "very light" to "extremely heavy". Pat wishes to describe a benchmark (or tool) that can be used to test this product. (The specification of this benchmark is necessary since the Functional Spec requires a description of how the product will perform.)
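One way Pat's benchmark could be parameterized is sketched in C below: a driver whose two knobs are the request size (defaulting to 1348 bytes) and the number of requests offered. The type request_t and the function handle_request() are invented stand-ins for the product's real interfaces; a real version would also pace the requests to model loads from "very light" to "extremely heavy" and would record queueing and latency.

/* drive.c - sketch of a load driver for the communications server.
 * Knobs: request size in bytes (default 1348) and total requests
 * offered.  request_t and handle_request() are invented stand-ins
 * for the product's real interfaces. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int  size;
    char payload[4096];
} request_t;

/* stand-in for the Request Handler entry point under test */
static void handle_request(const request_t *r)
{
    (void)r;   /* the real handler would copy r into a buffer */
}

int main(int argc, char **argv)
{
    request_t req;
    int  size = (argc > 1) ? atoi(argv[1]) : 1348;    /* bytes/request */
    long nreq = (argc > 2) ? atol(argv[2]) : 100000L; /* offered load  */

    if (size < 1) size = 1;
    if (size > (int)sizeof req.payload) size = (int)sizeof req.payload;

    req.size = size;
    memset(req.payload, 'x', (size_t)size);

    for (long i = 0; i < nreq; i++)
        handle_request(&req);  /* a real driver would pace these and
                                * time each request here */

    printf("offered %ld requests of %d bytes\n", nreq, size);
    return 0;
}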