
UNIT 5 CODE GENERATION & OPTIMIZATION

Syllabus: Intermediate code generation: Intermediate languages - Declaration - Assignment Statement - Boolean expression - Procedure calls. Code optimization: Introduction - Sources of optimization - Introduction to data flow analysis. Code generator: Issues in the design of a code generator - The target machine - A simple code generator - Design aspects of Code Optimizer.

CODE GENERATION:

Code generation is the final phase of compilation. Optimization can be applied after code generation, but that post-code-generation pass can also be seen as part of the code generation phase itself. The code generated by the compiler is object code in some lower-level programming language, for example assembly language. The source code written in a higher-level language is thus transformed into lower-level object code, which should have the following minimum properties:

• It should carry the exact meaning of the source code.

• It should be efficient in terms of CPU usage and memory management.

We will now see how the intermediate code is transformed into target object code (assembly code, in this case).

Directed Acyclic Graph

A Directed Acyclic Graph (DAG) is a tool that depicts the structure of a basic block, shows how values flow among its statements, and supports optimization. A DAG makes it easy to apply transformations to a basic block. It can be understood as follows:

• Leaf nodes represent identifiers, names or constants.

• Interior nodes represent operators.

• Interior nodes also represent the results of expressions or the identifiers/names where the values are to be stored or assigned.

Example:

t0 = a + b
t1 = t0 + c
d = t0 + t1

(DAG for the block: leaves a, b, c; interior nodes labelled t0 = a + b, t1 = t0 + c, and d = t0 + t1, with the node for t0 shared.)
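A minimal sketch of how such a DAG could be represented in C is shown below; the node layout and field names are illustrative assumptions, not part of the notes.

#include <stdio.h>

/* Each node carries a label: an identifier/constant for leaves, an operator for interior nodes. */
struct dag_node {
    const char *label;
    struct dag_node *left;      /* NULL for leaf nodes */
    struct dag_node *right;     /* NULL for leaf nodes */
};

int main(void)
{
    struct dag_node a  = { "a", NULL, NULL };
    struct dag_node b  = { "b", NULL, NULL };
    struct dag_node c  = { "c", NULL, NULL };
    struct dag_node t0 = { "+", &a,  &b  };   /* t0 = a + b                         */
    struct dag_node t1 = { "+", &t0, &c  };   /* t1 = t0 + c                        */
    struct dag_node d  = { "+", &t0, &t1 };   /* d = t0 + t1; the t0 node is shared */
    /* Both t1 and d point at the same t0 node instead of recomputing a + b. */
    printf("d reuses the node for t0: %p == %p\n", (void *)d.left, (void *)t1.left);
    return 0;
}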

Peephole Optimization

This optimization technique works locally on the given code to transform it into optimized code. By locally, we mean a small window of instructions (the "peephole") at a time. The method can be applied to intermediate code as well as to target code. A small group of statements is analyzed and checked for the following possible optimizations:

Redundant instruction elimination

At the source-code level, the following can be done by the user:

int add_ten(int x) { int y, z;  y = 10;  z = x + y;  return z; }   /* original version                   */
int add_ten(int x) { int y;     y = 10;  return x + y; }           /* temporary z eliminated             */
int add_ten(int x) { int y = 10;  return x + y; }                  /* assignment folded into declaration */
int add_ten(int x) { return x + 10; }                              /* y eliminated as well               */

Each version computes the same result; the redundant variables y and z are removed step by step.

Jumps whose only effect is to reach another jump can also be eliminated. Consider the following code:

...
MOV R1, R2
GOTO L1
...
L1 : GOTO L2
L2 : INC R1

In this code, label L1 can be removed as it merely passes control to L2. So instead of jumping to L1 and then to L2, the control can reach L2 directly, as shown below:

...
MOV R1, R2
GOTO L2
...
L2 : INC R1

Algebraic expression simplification

There are occasions where algebraic expressions can be made simpler. For example, the statement a = a + 0 can be removed altogether, and the statement a = a + 1 can simply be replaced by INC a.
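For illustration (these particular statements are not taken from the notes above), a peephole pass over three-address statements might apply rewrites such as:

a = a + 0        ; useless assignment, the statement can be deleted
b = b * 1        ; useless assignment, the statement can be deleted
a = a + 1        ; can be replaced by INC a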

Strength reduction

There are operations that consume more time and space. Their ā€˜strength’ can be reduced by replacing them with other operations that consume less time and space, but produce the same result.

For example, x * 2 can be replaced by x << 1, which involves only one left shift. Though a * a and a^2 produce the same output, a * a is much more efficient to implement than evaluating a^2 with an exponentiation routine.

Accessing machine instructions

The target machine can deploy more sophisticated instructions, which can have the capability to perform specific operations much more efficiently. If the target code can accommodate those instructions directly, that will not only improve the quality of the code but also yield more efficient results.

Issues of Code Generator

A code generator is expected to have an understanding of the target machine’s runtime environment and its instruction set. The code generator should take the following things into consideration to generate the code:

• Target language: The code generator has to be aware of the nature of the target language into which the code is to be transformed. That language may provide machine-specific instructions that help the compiler generate the code in a more convenient way. The target machine can have either a CISC or a RISC processor architecture.

• IR type: Intermediate representation has various forms. It can be Abstract Syntax Tree (AST) structure, Reverse Polish Notation, or 3-address code.

• Selection of instructions: The code generator takes the intermediate representation as input and converts (maps) it into the target machine's instruction set. One representation can be converted in many ways (instruction sequences), so it is the responsibility of the code generator to choose the appropriate instructions wisely.

• Register allocation: A program has a number of values to be maintained during execution. The target machine's architecture may not allow all of them to be kept in CPU registers. The code generator decides which values to keep in registers, and which registers to use for them.

• Ordering of instructions: Finally, the code generator decides the order in which the instructions will be executed, creating a schedule for them.

Descriptors

The code generator has to track both the registers (for availability) and addresses (location of values) while generating the code. For both of them, the following two descriptors are used:

• Register descriptor: The register descriptor informs the code generator about the availability of registers. It keeps track of the value stored in each register. Whenever a new register is required during code generation, this descriptor is consulted for register availability.

• Address descriptor: The values of the names (identifiers) used in the program might be stored at different locations during execution. Address descriptors keep track of the locations where the current value of each identifier can be found; these locations may include registers, the stack, and main memory. A sketch of both descriptors as data structures is given after this list.
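A rough sketch of the two descriptors as C data structures is shown below; the field names and fixed sizes are illustrative assumptions, not taken from the notes.

#define NUM_REGS  8
#define MAX_NAMES 64

/* Register descriptor: records which name's value (if any) each register currently holds. */
struct register_descriptor {
    const char *held_name[NUM_REGS];   /* NULL means the register is free */
};

/* Address descriptor: records where the current value of a name can be found. */
struct address_descriptor {
    const char *name;                  /* identifier used in the program                */
    int         reg;                   /* register holding the value, or -1 if none     */
    int         in_memory;             /* nonzero if the memory location is up to date  */
};

struct register_descriptor regs;
struct address_descriptor  addr_table[MAX_NAMES];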

For a three-address statement of the form x = y OP z, earlier steps choose a location L for the result and move the current value of y into L (generating MOV y', L). The remaining steps, illustrated by the sketch after this list, are:

• Determine the present location of z, using the same method used in step 2 for y, and generate the following instruction:

OP z', L

where z' represents the current copy of the value of z.

• Now L contains the value of y OP z, which is intended to be assigned to x. So, if L is a register, update its register descriptor to indicate that it contains the value of x, and update the address descriptor of x to indicate that it is stored at location L.

• If y and z have no further use, the locations holding them can be given back to the system.
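Putting the steps together for a statement x = y + z, and assuming the earlier steps chose register R0 as the location L, the generated target code might look as follows (the mnemonics follow the style of the earlier examples and are illustrative only):

MOV y, R0        ; load the current value of y into L = R0 (omitted if y is already in R0)
ADD z, R0        ; R0 now holds y + z
                 ; descriptors are updated: R0 holds x, and the value of x is in R0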

Other code constructs, such as loops and conditional statements, are transformed into assembly language in the usual way, using labels and jump instructions.

CODE OPTIMIZATION:

Optimization is a program transformation technique which tries to improve the code by making it consume fewer resources (i.e., CPU, memory) and run faster.

In optimization, high-level general programming constructs are replaced by very efficient low-level programming code. A code-optimizing process must follow the three rules given below:

• The output code must not, in any way, change the meaning of the program.

• Optimization should increase the speed of the program and, if possible, the program should demand fewer resources.

• Optimization should itself be fast and should not delay the overall compiling process.

Efforts to produce optimized code can be made at various levels of the compilation process:

• At the beginning, users can change or rearrange the code, or use better algorithms to write the code.

• After generating intermediate code, the compiler can modify the intermediate code by improving address calculations and loops.

• While producing the target machine code, the compiler can make use of the memory hierarchy and CPU registers.

Optimization can be categorized broadly into two types:

• Machine-independent optimization
• Machine-dependent optimization

Machine-independent Optimization

In this optimization, the compiler takes in the intermediate code and transforms a part of the code that does not involve any CPU registers and/or absolute memory locations. For example:

do
{
    item = 10;
    value = value + item;
} while (value < 100);

This code repeatedly assigns the value 10 to the identifier item on every iteration. If we instead write it this way:

item = 10;
do
{
    value = value + item;
} while (value < 100);

we not only save CPU cycles, but the transformation can also be applied on any processor, since it does not depend on machine details.

Machine-dependent Optimization

Machine-dependent optimization is done after the target code has been generated and when the code is transformed according to the target machine architecture. It involves CPU registers and may use absolute memory references rather than relative references.

Basic blocks play an important role in identifying variables that are used more than once within a single basic block. If a variable is used more than once, the register allocated to that variable need not be emptied until the block finishes execution.
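For instance (an illustrative block, not taken from the notes), in the basic block below the temporary t is used twice, so the register holding t should be retained until the end of the block:

t = a + b
x = t * c
y = t - d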

Control Flow Graph

Basic blocks in a program can be represented by means of control flow graphs. A control flow graph depicts how program control is passed among the blocks. It is a useful tool that helps in optimization by locating any unwanted loops in the program.
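For example, the do-while loop used earlier under machine-independent optimization breaks into basic blocks whose control flow graph can be listed as follows (the block names are illustrative):

B1:  item = 10                   ; entry block, loop set-up
B2:  value = value + item        ; loop body
     if value < 100 goto B2      ; back edge B2 -> B2 forms the loop
B3:  ...                         ; code following the loop

Edges: B1 -> B2, B2 -> B2, B2 -> B3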

Loop Optimization

Most of a program's running time is spent inside loops, so it becomes necessary to optimize loops in order to save CPU cycles and memory. Loops can be optimized using the following techniques:

• Invariant code: A fragment of code that resides in the loop and computes the same value at each iteration is called loop-invariant code. This code can be moved out of the loop so that it is computed only once rather than at each iteration (see the sketch after this list).

• Induction analysis: A variable is called an induction variable if its value is altered within the loop by a loop-invariant value.

• Strength reduction: There are expressions that consume more CPU cycles, time, and memory. These expressions should be replaced with cheaper expressions without compromising the output of the expression. For example, multiplication (x * 2) is more expensive in terms of CPU cycles than (x << 1), which yields the same result.
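A minimal C sketch of invariant code motion and strength reduction, using illustrative function and variable names that are not taken from the notes:

/* Before: limit * 2 is loop-invariant, yet it is recomputed on every iteration,
   and it uses a multiplication. i is an induction variable (it changes by the
   loop-invariant amount 1 each time around). */
void fill_before(int *a, const int *b, int n, int limit)
{
    for (int i = 0; i < n; i++)
        a[i] = b[i] + limit * 2;
}

/* After: code motion hoists the invariant computation out of the loop, and
   strength reduction replaces the multiplication with a single left shift. */
void fill_after(int *a, const int *b, int n, int limit)
{
    int t = limit << 1;            /* computed once, before the loop */
    for (int i = 0; i < n; i++)
        a[i] = b[i] + t;
}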

Dead-code Elimination

Dead code consists of one or more statements that are either never executed (unreachable) or whose computed results are never used anywhere in the program; such statements can be removed without affecting the meaning of the program. Likewise, if the condition of a conditional statement is always false, the code written for the true case can never be executed and hence can be removed.
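A small illustrative C fragment (not from the notes) showing both kinds of dead code:

int example(int x)
{
    int unused = x * 3;   /* dead: the computed value is never used           */
    if (0)                /* the condition is always false, so the true case  */
    {                     /* is unreachable and can be removed                */
        x = x + 100;
    }
    return x + 1;
}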

Partial Redundancy

Redundant expressions are computed more than once, with no change in their operands, along every parallel path, whereas partially redundant expressions are computed more than once along only some of the paths, again with no change in operands. For example:

(Figures: a partially redundant expression and a redundant expression.)

Loop-invariant code is partially redundant and can be eliminated by using a code-motion technique.

Another example of partially redundant code is:

if (condition)
{
    a = y OP z;
}
else
{
    ...
}

c = y OP z;

We assume that the values of the operands (y and z) are not changed between the assignment to variable a and the assignment to variable c. Here, if the condition is true, then y OP z is computed twice; otherwise, once. Code motion can be used to eliminate this redundancy, as shown below:

if (condition)
{
    ...
    tmp = y OP z;
    a = tmp;
    ...
}

else

{

...