
UNIT 4 COMPILER DESIGN, Study notes of Compiler Design

DETAILED STUDY NOTES OF UNIT 4 OF COMPILER DESIGN

Typology: Study notes

2021/2022

Uploaded on 02/20/2023

riya-parnami

Dr. Neeraj Dahiya, Asst. Prof., CSE
UNIT 4 SEMANTIC ANALYSIS - TRANSLATION & RUNTIME STORAGE 9
Syntax-directed translation: Syntax-directed definitions - S-attributed definition - L-attributed definition - Top-down and bottom-up translation - Type checking - Type systems - Specification of a type checker; Run time environment - Source language issues - Storage organization - Storage allocation strategies - Access to non-local names - Parameter passing - Symbol tables - Design aspects of Syntax Directed Translation.
SEMANTIC ANALYSIS
Semantic Analysis computes additional information related to the meaning of the
program once its syntactic structure is known.
In typed languages such as C, semantic analysis involves adding information to the symbol
table and performing type checking.
The information to be computed is beyond the capabilities of standard parsing
techniques; therefore, it is not regarded as syntax.
As with lexical and syntax analysis, semantic analysis needs both a
representation formalism and an implementation mechanism.
As the representation formalism, this lecture presents Syntax Directed
Translations.
SYNTAX DIRECTED TRANSLATION
The Principle of Syntax Directed Translation states that the meaning of an input
sentence is related to its syntactic structure, i.e., to its parse tree.
Syntax Directed Translations are formalisms for specifying
translations of programming-language constructs guided by context-free grammars.
o Attributes are associated with the grammar symbols representing the language
constructs.
o Attribute values are computed by Semantic Rules associated with
grammar productions.
Evaluation of Semantic Rules may:
o Generate code;
o Insert information into the Symbol Table;
o Perform semantic checks;
o Issue error messages;
o etc.
There are two notations for attaching semantic rules:
1. Syntax Directed Definitions. A high-level specification hiding many implementation details
(also called Attribute Grammars).


  2. Translation Schemes. More implementation-oriented: they indicate the order in which semantic rules are to be evaluated.

Syntax Directed Definitions
  • Syntax Directed Definitions are a generalization of context-free grammars in which:
  1. Grammar symbols have an associated set of Attributes;
  2. Productions are associated with Semantic Rules for computing the values of attributes.
  o Such a formalism generates Annotated Parse Trees, where each node of the tree is a record with a field for each attribute (e.g., X.a indicates the attribute a of the grammar symbol X).
  o The value of an attribute of a grammar symbol at a given parse-tree node is defined by a semantic rule associated with the production used at that node.
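A minimal C sketch of an annotated parse tree, assuming the usual expression grammar E → E1 + T, T → T1 * F, F → digit with a single synthesized attribute val. The grammar, the type Node and the function annotate are illustrative assumptions, not part of these notes. Each node is a record with a field for the attribute, and annotate applies the semantic rule of the production used at that node once its children have been annotated.

```c
#include <stddef.h>

/* Production used at a node. Unit productions (E -> T, T -> F) are
   collapsed, since their rule only copies val upward unchanged. */
typedef enum { PROD_PLUS, PROD_TIMES, PROD_DIGIT } Production;

/* A parse-tree node is a record with one field per attribute;
   here the only attribute is the synthesized attribute val. */
typedef struct Node {
    Production prod;
    struct Node *left, *right; /* children, used by PROD_PLUS / PROD_TIMES */
    int lexval;                /* digit value, used by PROD_DIGIT */
    int val;                   /* attribute X.val, filled in by annotate() */
} Node;

/* Apply each node's semantic rule after annotating its children. */
void annotate(Node *n)
{
    switch (n->prod) {
    case PROD_DIGIT:               /* F -> digit   { F.val = digit.lexval } */
        n->val = n->lexval;
        break;
    case PROD_PLUS:                /* E -> E1 + T  { E.val = E1.val + T.val } */
        annotate(n->left);
        annotate(n->right);
        n->val = n->left->val + n->right->val;
        break;
    case PROD_TIMES:               /* T -> T1 * F  { T.val = T1.val * F.val } */
        annotate(n->left);
        annotate(n->right);
        n->val = n->left->val * n->right->val;
        break;
    }
}
```

For the tree of 3 * 5 + 4, annotate sets the * node's val to 15 and the root's val to 19.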

S-attributed SDT

• If an SDT uses only synthesized attributes, it is called an S-attributed SDT.

• In S-attributed SDTs, the semantic actions are written at the end of the production, after the right-hand side.

• Attributes in S-attributed SDTs are evaluated during bottom-up parsing, since the values of the parent nodes depend upon the values of the child nodes.
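Since each rule of an S-attributed definition only combines the values of the children, the actions can be run at reduction time on a value stack kept next to the parser stack. The C sketch below assumes the illustrative grammar E → E + T | T, T → T * F | F, F → digit; the hand-coded shift-reduce loop stands in for the tables an LR parser generator would build, and the names Sym and eval_bottom_up are hypothetical.

```c
#include <ctype.h>

typedef enum { SYM_E, SYM_T, SYM_F, SYM_PLUS, SYM_TIMES } Sym;

/* Parser stack: sym[] holds grammar symbols, val[] their synthesized
   attribute (the "value stack"). No overflow check: sketch only. */
#define MAX 64
static Sym sym[MAX];
static int val[MAX];
static int top;

static void push(Sym s, int v) { sym[top] = s; val[top] = v; top++; }

/* Each reduction pops the right-hand side and pushes the left-hand
   side, with val computed by the production's postfix action.
   Returns the value of the input, or -1 on a syntax error. */
int eval_bottom_up(const char *s)
{
    top = 0;
    for (;;) {
        char la = *s;                                 /* lookahead, '\0' = end */
        if (top > 0 && sym[top-1] == SYM_F) {
            if (top >= 3 && sym[top-2] == SYM_TIMES && sym[top-3] == SYM_T) {
                int v = val[top-3] * val[top-1];      /* T -> T1 * F { T.val = T1.val * F.val } */
                top -= 3; push(SYM_T, v);
            } else {
                int v = val[top-1];                   /* T -> F { T.val = F.val } */
                top -= 1; push(SYM_T, v);
            }
        } else if (top > 0 && sym[top-1] == SYM_T && la != '*') {
            if (top >= 3 && sym[top-2] == SYM_PLUS && sym[top-3] == SYM_E) {
                int v = val[top-3] + val[top-1];      /* E -> E1 + T { E.val = E1.val + T.val } */
                top -= 3; push(SYM_E, v);
            } else {
                int v = val[top-1];                   /* E -> T { E.val = T.val } */
                top -= 1; push(SYM_E, v);
            }
        } else if (top == 1 && sym[0] == SYM_E && la == '\0') {
            return val[0];                            /* accept */
        } else if (isdigit((unsigned char)la)) {
            push(SYM_F, la - '0');                    /* shift digit, F.val = digit.lexval */
            s++;
        } else if (la == '+') {
            push(SYM_PLUS, 0); s++;
        } else if (la == '*') {
            push(SYM_TIMES, 0); s++;
        } else {
            return -1;                                /* syntax error */
        }
    }
}
```

No tree is built: each attribute value lives on the stack only until its parent's reduction consumes it.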

L-attributed SDT

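• This form of SDT uses both synthesized and inherited attributes, with the restriction that an attribute may not take values from right siblings.

• In L-attributed SDTs, a non-terminal can get values from its parent, its children, and its left siblings. For example, in a production S → ABC, B can take values from S and A, and C can take values from S, A and B, but A cannot take values from B or C.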
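A standard illustration of an L-attributed definition is type propagation in a declaration such as real id1, id2, id3, using D → T L { L.in := T.type }: the inherited attribute L.in flows from the left sibling T into L, and every identifier in L receives it. This C sketch assumes a recursive-descent parser in which inherited attributes become function parameters; the helpers addtype and type_of and the tiny fixed-size table are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical mini symbol table filled in by addtype(). */
#define MAX_IDS 16
static char names[MAX_IDS][16];
static char types[MAX_IDS][8];
static int n_ids;

static void addtype(const char *name, size_t len, const char *type)
{
    if (n_ids == MAX_IDS || len == 0 || len >= sizeof names[0])
        return;                                   /* sketch: ignore overflow */
    memcpy(names[n_ids], name, len);
    names[n_ids][len] = '\0';
    strncpy(types[n_ids], type, sizeof types[0] - 1);
    types[n_ids][sizeof types[0] - 1] = '\0';
    n_ids++;
}

static const char *p;                             /* input cursor */
static void skip_spaces(void) { while (*p == ' ') p++; }

/* L -> id { addtype(id, L.in) } ( , id { addtype(id, L.in) } )*
   The inherited attribute L.in arrives as the parameter `in`. */
static void parse_L(const char *in)
{
    for (;;) {
        skip_spaces();
        const char *start = p;
        while (isalnum((unsigned char)*p)) p++;
        addtype(start, (size_t)(p - start), in);
        skip_spaces();
        if (*p != ',') return;
        p++;
    }
}

/* D -> T L  { L.in := T.type }, with T -> int | real */
void parse_D(const char *decl)
{
    p = decl;
    n_ids = 0;
    skip_spaces();
    const char *start = p;
    while (isalpha((unsigned char)*p)) p++;
    char t_type[8] = {0};                         /* T.type (synthesized) */
    size_t len = (size_t)(p - start);
    if (len >= sizeof t_type) len = sizeof t_type - 1;
    memcpy(t_type, start, len);
    parse_L(t_type);                              /* L.in = T.type (inherited) */
}

/* Return the recorded type of name, or NULL if it was not declared. */
const char *type_of(const char *name)
{
    for (int i = 0; i < n_ids; i++)
        if (strcmp(names[i], name) == 0) return types[i];
    return NULL;
}
```

Passing the attribute as an argument works because it only ever comes from the parent or a left sibling, which the parser has already processed.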

Some examples of static checks:

Type checks

• A compiler should report an error if an operator is applied to an incompatible operand; for example, if an array variable and a function variable are added together.

Flow-of-control checks
• Statements that cause flow of control to leave a construct must have some place to which to transfer the flow of control. Example: a break statement in C leaves the smallest enclosing while, for, do or switch statement; an error occurs if no such enclosing statement exists.
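A minimal C sketch of this check, assuming statements are already in a small tree (the types Stmt and StmtKind and the function count_bad_breaks are hypothetical). The checker counts the breakable statements enclosing each node and reports a break found at depth zero; for brevity only while and switch are modelled.

```c
#include <stddef.h>

typedef enum { ST_BREAK, ST_WHILE, ST_SWITCH, ST_BLOCK, ST_OTHER } StmtKind;

typedef struct Stmt {
    StmtKind kind;
    struct Stmt **body;   /* NULL-terminated child list, or NULL */
} Stmt;

/* Count break statements with no enclosing while/switch.
   `depth` is the number of enclosing breakable statements. */
int count_bad_breaks(const Stmt *s, int depth)
{
    int errors = 0;
    if (s->kind == ST_BREAK && depth == 0)
        errors++;                          /* nowhere to transfer control */
    if (s->kind == ST_WHILE || s->kind == ST_SWITCH)
        depth++;                           /* a break inside is now legal */
    if (s->body)
        for (Stmt **c = s->body; *c; c++)
            errors += count_bad_breaks(*c, depth);
    return errors;
}
```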

Position of type checker: token stream → parser → syntax tree → type checker → syntax tree → intermediate code generator → intermediate representation

• A type checker verifies that the type of a construct matches that expected by its context. For example, the arithmetic operator mod in Pascal requires integer operands, so a type checker verifies that the operands of mod have type integer.

• Type information gathered by a type checker may be needed when code is generated.

TYPE SYSTEMS

• The design of a type checker for a language is based on information about the syntactic constructs in the language, the notion of types, and the rules for assigning types to language constructs.

• For example: “if both operands of the arithmetic operators +, - and * are of type integer, then the result is of type integer.”
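The two rules just quoted (integer operands for +, - and *, and integer-only operands for mod) can be sketched as small C checking functions. The type names are hypothetical, and the real op real case and the absence of any coercion are assumptions added for illustration, not rules from the notes.

```c
typedef enum { TY_INTEGER, TY_REAL, TY_ERROR } Type;

/* +, -, *: integer op integer gives integer (the rule above).
   Assumption for this sketch: real op real gives real, and mixed
   operands are a type error because no coercions are modelled. */
Type check_arith(Type t1, Type t2)
{
    if (t1 == TY_INTEGER && t2 == TY_INTEGER) return TY_INTEGER;
    if (t1 == TY_REAL && t2 == TY_REAL) return TY_REAL;
    return TY_ERROR;
}

/* mod (as in Pascal) accepts integer operands only. */
Type check_mod(Type t1, Type t2)
{
    return (t1 == TY_INTEGER && t2 == TY_INTEGER) ? TY_INTEGER : TY_ERROR;
}
```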

Run-Time Environment:

• A program as source code is merely a collection of text (code, statements, etc.); to make it alive, actions must be performed on the target machine.

• A program needs memory resources to execute instructions. A program contains names for procedures, identifiers, etc., that must be mapped to actual memory locations at runtime.

• By runtime, we mean a program in execution. The runtime environment is a state of the target machine, which may include software libraries, environment variables, etc., to provide services to the processes running in the system.

• The runtime support system is a package, mostly generated with the executable program itself, that facilitates communication between the process and the runtime environment.

• It takes care of memory allocation and de-allocation while the program is being executed.

Activation Trees

• A program is a sequence of instructions combined into a number of procedures. Instructions in a procedure are executed sequentially.

• A procedure has a start and an end delimiter, and everything inside it is called the body of the procedure.

• The procedure identifier together with the finite sequence of instructions inside it makes up the procedure.

• The execution of a procedure is called its activation. An activation record contains all the necessary information required to call a procedure.

• An activation record may contain the following units (depending upon the source language used).

Temporaries: Stores temporary and intermediate values of an expression.

scanf("%s", username);
show_data(username);
printf("Press any key to continue...");
...
int show_data(char *user)
{
    printf("Your name is %s", user);
    return 0;
}
...

Below is the activation tree of the code given.

• Since procedures are executed in a depth-first manner, stack allocation is the most suitable form of storage for procedure activations.
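The link between the activation tree and the control stack can be simulated in C. This sketch assumes the fragment's caller is main and that it makes only the calls visible above; the stand-in functions act_scanf, act_printf and act_show_data are hypothetical and only record activations. Each activation pushes a record on entry and pops it on exit, and the trace writes the tree as name(children), produced in depth-first order.

```c
#include <string.h>

#define MAX_DEPTH 16
static const char *control_stack[MAX_DEPTH]; /* names of live activations */
static int sp;                               /* number of live activations */
static int max_depth;                        /* deepest the stack has grown */
static char trace[256];                      /* activation tree as name(children) */

static void enter(const char *name)
{
    control_stack[sp++] = name;              /* push an activation record */
    if (sp > max_depth) max_depth = sp;
    strcat(trace, name);
    strcat(trace, "(");
}

static void leave(void)
{
    sp--;                                    /* pop it: last in, first out */
    strcat(trace, ")");
}

/* Stand-ins for the procedures in the fragment. */
static void act_scanf(void)     { enter("scanf"); leave(); }
static void act_printf(void)    { enter("printf"); leave(); }
static void act_show_data(void) { enter("show_data"); act_printf(); leave(); }

/* Runs the assumed caller and returns the recorded activation tree. */
const char *run_fragment(void)
{
    trace[0] = '\0';
    sp = 0;
    max_depth = 0;
    enter("main");
    act_scanf();
    act_show_data();
    act_printf();
    leave();
    return trace;
}
```

The result is main(scanf()show_data(printf())printf()); the stack never holds more than the activations on one root-to-leaf path of the tree (three here).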

Type systems

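A type system is a collection of rules for assigning type expressions to the various parts of a program; a type checker implements a type system. The type systems considered here are specified in a syntax-directed manner.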

Static and Dynamic Checking of Types

Checking done by a compiler is said to be static, while checking done when the target program runs is termed dynamic. Any check can be done dynamically if the target code carries the type of an element along with the value of that element.
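A minimal C sketch of dynamic checking, in which each run-time value carries a type tag alongside its payload, as described above. The names Value, Tag and dyn_add are hypothetical. The addition inspects the tags at run time and reports failure instead of computing a meaningless result.

```c
#include <stdbool.h>

typedef enum { TAG_INT, TAG_BOOL } Tag;

/* A run-time value: its type travels with it. */
typedef struct {
    Tag tag;
    union { int i; bool b; } as;
} Value;

/* Dynamic check: verify both tags before adding. */
bool dyn_add(Value x, Value y, Value *out)
{
    if (x.tag != TAG_INT || y.tag != TAG_INT)
        return false;                 /* run-time type error */
    out->tag = TAG_INT;
    out->as.i = x.as.i + y.as.i;
    return true;
}
```

A compiler that checks statically would reject the ill-typed addition before the program runs, so it could omit both the tags and this test.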

Sound type system
• A sound type system eliminates the need for dynamic checking for type errors, because it allows us to determine statically that these errors cannot occur when the target program runs.
• That is, if a sound type system assigns a type other than type_error to a program part, then type errors cannot occur when the target code for the program part is run.

Strongly typed language
• A language is strongly typed if its compiler can guarantee that the programs it accepts will execute without type errors.

Error Recovery

• Since type checking has the potential for catching errors in programs, it is desirable for a type checker to recover from errors so that it can check the rest of the input.

• Error handling has to be designed into the type system right from the start; the type checking rules must be prepared to cope with errors.

SPECIFICATION OF A SIMPLE TYPE CHECKER:

else type_error }

4. Sequence of statements: S → S1 ; S2 { S.type := if S1.type = void and S2.type = void then void else type_error }
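The sequence rule translates directly into a C checking function. StmtType and check_seq are hypothetical names; only the two statement types the rule mentions are modelled.

```c
typedef enum { T_VOID, T_TYPE_ERROR } StmtType;

/* S -> S1 ; S2 { S.type := if S1.type = void and S2.type = void
                               then void else type_error }      */
StmtType check_seq(StmtType s1, StmtType s2)
{
    return (s1 == T_VOID && s2 == T_VOID) ? T_VOID : T_TYPE_ERROR;
}
```

An error in either statement makes the whole sequence type_error, so errors propagate up the tree.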

Type checking of functions

The rule for checking the type of a function application is: E → E1 ( E2 ) { E.type := if E2.type = s and E1.type = s → t then t else type_error }
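A C sketch of this rule, assuming type expressions are stored as small trees (either a basic type or a function type s → t) and compared by structural equivalence. The notes do not specify the equivalence test, so that choice is an assumption, and the names TypeExpr, type_equal and check_apply are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { K_INTEGER, K_CHAR, K_FUNC, K_ERROR } Kind;

/* A type expression: a basic type, or a function type dom -> ran. */
typedef struct TypeExpr {
    Kind kind;
    const struct TypeExpr *dom, *ran;   /* used only when kind == K_FUNC */
} TypeExpr;

static const TypeExpr type_error = { K_ERROR, NULL, NULL };

/* Structural equivalence of two type expressions. */
bool type_equal(const TypeExpr *a, const TypeExpr *b)
{
    if (a->kind != b->kind) return false;
    if (a->kind != K_FUNC) return true;
    return type_equal(a->dom, b->dom) && type_equal(a->ran, b->ran);
}

/* E -> E1 ( E2 ) { E.type := if E2.type = s and E1.type = s -> t
                               then t else type_error }          */
const TypeExpr *check_apply(const TypeExpr *e1, const TypeExpr *e2)
{
    if (e1->kind == K_FUNC && type_equal(e1->dom, e2))
        return e1->ran;                 /* the result type t */
    return &type_error;
}
```

For example, applying a function of type char → integer to a char argument yields integer, while applying it to an integer, or applying a non-function, yields type_error.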

SOURCE LANGUAGE ISSUES

Procedures: A procedure definition is a declaration that associates an identifier with a statement. The identifier is the procedure name, and the statement is the procedure body. For example, the following is the definition of a procedure named readarray:

procedure readarray;
var i : integer;
begin
    for i := 1 to 9 do read(a[i])
end;

When a procedure name appears within an executable statement, the procedure is said to be called at that point.

Storage Allocation

The runtime environment manages runtime memory requirements for the following entities:

Code: Known as the text part of a program, it does not change at runtime. Its memory requirements are known at compile time.

Procedures: Their text part is static, but they are called in an order that is not known in advance. That is why stack storage is used to manage procedure calls and activations.

Variables: Variables are known at runtime only, unless they are global or constant. The heap memory allocation scheme is used for managing allocation and de-allocation of memory for variables at runtime.

Static Allocation

• In this allocation scheme, names are bound to fixed storage locations at compile time, and these bindings do not change when the program executes.

• As the memory requirements and storage locations are known in advance, no runtime support package for memory allocation and de-allocation is required.
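In C, a local variable declared static gets statically allocated storage, which makes the scheme easy to observe. In this small sketch (next_ticket is a hypothetical name), count is bound to one fixed location for the whole run, so it keeps its value from one activation to the next.

```c
/* count lives in statically allocated storage: its location is fixed
   before the program runs, so it survives between activations. */
int next_ticket(void)
{
    static int count = 0;   /* initialized once, not on every call */
    return ++count;
}
```

Successive calls return 1, 2, 3, ...; an ordinary (automatic) local would be reinitialized in every new activation record instead.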

Stack Allocation

• Procedure calls and their activations are managed by means of stack memory allocation.

• It works in a last-in-first-out (LIFO) manner, and this allocation strategy is very useful for recursive procedure calls.
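Recursion shows why a stack fits: every call needs its own activation record, holding its own copy of n, and the records are released in the reverse order of creation. In this C sketch the counters depth and max_depth are hypothetical instrumentation that only model the pushes and pops; the real records are managed by the compiler.

```c
static int depth, max_depth;   /* instrumentation: live records, peak */

long factorial(int n)
{
    long result;
    depth++;                       /* a new activation record is pushed */
    if (depth > max_depth) max_depth = depth;
    result = (n <= 1) ? 1 : n * factorial(n - 1);
    depth--;                       /* and popped when the call returns */
    return result;
}
```

Computing factorial(5) gives 120, with at most five activations of factorial live at once and none left afterwards.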

Heap Allocation

• Variables local to a procedure are allocated and de-allocated only at runtime.

• Heap allocation is used to dynamically allocate memory to variables and reclaim it when the variables are no longer required.

• Except for the statically allocated memory area, both stack and heap memory can grow and shrink dynamically and unexpectedly.

• Therefore, they cannot be provided with a fixed amount of memory in the system.
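A short C sketch of heap allocation using the standard malloc and free (make_squares is a hypothetical name). The array outlives the activation of the function that created it, which stack allocation could not provide, and its storage is reclaimed only when the caller frees it.

```c
#include <stdlib.h>

/* Returns a heap-allocated array of n squares, or NULL if the
   allocation fails. The caller must free() the result. */
int *make_squares(size_t n)
{
    int *a = malloc(n * sizeof *a);
    if (a == NULL)
        return NULL;               /* allocation can fail at run time */
    for (size_t i = 0; i < n; i++)
        a[i] = (int)(i * i);
    return a;                      /* still valid after this call returns */
}
```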

Parameter Passing

• The communication medium among procedures is known as parameter passing. The values of the variables from a calling procedure are transferred to the called procedure by some mechanism.

• Before going further, consider some basic terminology pertaining to the values in a program.

r-value

• The value of an expression is called its r-value. The value contained in a single variable also becomes an r-value if it appears on the right-hand side of the assignment operator.

• r-values can always be assigned to some other variable.

l-value

• The location of memory (address) where an expression is stored is known as the l-value of that expression.

• An expression must have an l-value to appear on the left-hand side of an assignment operator.

For example:

day = 1 ;

week = day * 7 ;

month = 1 ;

year = month * 12 ;

• From this example, we understand that constant values like 1, 7, 12, and variables like day, week, month and year, all have r-values.

• Only variables have l-values, as they also represent the memory locations assigned to them.

For example:

7 = x + y;

is an l-value error, as the constant 7 does not represent any memory location.

Formal Parameters

• Variables that take the information passed by the caller procedure are called formal parameters.

• These variables are declared in the definition of the called function.

Actual Parameters

• Variables whose values or addresses are being passed to the called procedure are called actual parameters.

• These variables are specified in the function call as arguments.

Example:

#include <stdio.h>

void fun_two(int formal_parameter)
{
    printf("%d\n", formal_parameter);
}

void fun_one(void)
{
    int actual_parameter = 10;
    fun_two(actual_parameter);
}

• Formal parameters hold the information of the actual parameter, depending upon the parameter passing technique used.

• It may be a value or an address.

Pass by Value

• In the pass-by-value mechanism, the calling procedure passes the r-values of the actual parameters, and the compiler puts them into the called procedure's activation record.

• The formal parameters then hold the values passed by the calling procedure.
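C passes every argument by value, so a swap function shows the consequence directly. In this sketch (swap_by_value and swap_by_address are hypothetical names), the first function swaps only the copies in its own activation record; the second receives the l-values (addresses) of the actuals, which is how C simulates call by reference.

```c
/* a and b are copies of the actual parameters' r-values. */
void swap_by_value(int a, int b)
{
    int tmp = a;
    a = b;
    b = tmp;           /* only the local copies are swapped */
    (void)a;
    (void)b;           /* silence "set but not used" warnings */
}

/* The addresses of the actuals are passed, so the callee can store
   through them and the caller sees the change. */
void swap_by_address(int *a, int *b)
{
    int tmp = *a;
    *a = *b;
    *b = tmp;
}
```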

y = 0 ; // y is now 0

}

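• When this function ends, the final value of the formal parameter x is copied back into the location (l-value) of the actual parameter y.

• Even if y is changed inside the procedure before it ends, the value of x is copied into y's location on return, overwriting that change. Copy-restore therefore gives the same result as call by reference in most cases, but not when the procedure also accesses the actual parameter directly (an alias), as it accesses y here.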
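The aliasing case can be demonstrated in C, which has neither mechanism built in: call by reference is simulated with a pointer, and copy-restore by copying in and out by hand. The global y, the value 99 and the function names are hypothetical; both procedures assign to the formal parameter and then to y directly.

```c
int y;   /* hypothetical global, also passed as the actual parameter */

/* Call by reference: *x is an alias of y. */
void by_reference(int *x)
{
    *x = 99;          /* writes y itself */
    y = 0;            /* overwrites it: y is now 0 */
}

/* Copy-restore, simulated by hand. */
void by_copy_restore(int *actual)
{
    int x = *actual;  /* copy-in: the actual's r-value goes into x */
    x = 99;           /* y is unaffected */
    y = 0;            /* y is now 0 */
    *actual = x;      /* copy-out: x is stored at the actual's l-value */
}
```

Starting from y = 10, by_reference(&y) leaves y at 0, while by_copy_restore(&y) leaves y at 99 because the copy-out happens last.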

Pass by Name

• Languages like Algol provide a new kind of parameter passing mechanism that works like the preprocessor in the C language.

• In the pass-by-name mechanism, the name of the procedure being called is replaced by its actual body.

• Pass-by-name textually substitutes the argument expressions in a procedure call for the corresponding parameters in the body of the procedure, so that it can now work on actual parameters, much like pass-by-reference.
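Compilers usually implement pass by name with thunks: small routines that re-evaluate the argument expression in the caller's environment every time the formal parameter is used. This C sketch of the well-known Jensen's device uses a function pointer plus an environment pointer as the thunk; the names Thunk, sum_by_name, ArrayIndexEnv and eval_a_of_i are hypothetical, and the index variable is simply passed by address.

```c
/* A thunk: code that re-evaluates the argument expression. */
typedef struct {
    int (*eval)(void *env);
    void *env;
} Thunk;

/* expr is "passed by name": it is re-evaluated on every iteration,
   seeing the current value of *i (Jensen's device). */
int sum_by_name(int *i, int lo, int hi, Thunk expr)
{
    int total = 0;
    for (*i = lo; *i <= hi; (*i)++)
        total += expr.eval(expr.env);
    return total;
}

/* Environment and thunk body for the argument expression a[i]. */
typedef struct { const int *a; const int *i; } ArrayIndexEnv;

int eval_a_of_i(void *env)
{
    ArrayIndexEnv *e = env;
    return e->a[*e->i];
}
```

Passing a[i] this way, with i running from 0 to 3 over the array {1, 2, 3, 4}, sums the elements to 10; under pass by value a[i] would be evaluated once, before the call.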

Symbol Table:

• A symbol table is an important data structure created and maintained by compilers in order to store information about the occurrence of various entities such as variable names, function names, objects, classes, interfaces, etc. The symbol table is used by both the analysis and the synthesis parts of a compiler.

• A symbol table may serve the following purposes, depending upon the language in hand:

  o To store the names of all entities in a structured form at one place.
  o To verify if a variable has been declared.
  o To implement type checking, by verifying that assignments and expressions in the source code are semantically correct.
  o To determine the scope of a name (scope resolution).

A symbol table is simply a table which can be either linear or a hash table. It maintains an entry for each name in the following format:

<symbol name, type, attribute>

For example, if a symbol table has to store information about the following variable declaration:

static int interest;

then it should store an entry such as:

<interest, int, static>

The attribute clause contains the entries related to the name.

Implementation

• If a compiler is to handle a small amount of data, then the symbol table can be implemented as an unordered list, which is easy to code but is suitable only for small tables. A symbol table can be implemented in one of the following ways:

  o Linear (sorted or unsorted) list
  o Binary search tree
  o Hash table

• Among these, symbol tables are mostly implemented as hash tables, where the source code symbol itself is treated as the key for the hash function and the return value is the information about the symbol.
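A minimal C sketch of a hash-table symbol table storing <symbol name, type, attribute> entries, with the symbol name as the key. The djb2 string hash, the bucket count of 211, chaining on collision and the names st_insert and st_lookup are choices made for this sketch, not details from the notes; cleanup and full error handling are omitted.

```c
#include <stdlib.h>
#include <string.h>

#define BUCKETS 211   /* a prime bucket count (arbitrary choice) */

/* One <symbol name, type, attribute> entry; colliding entries chain. */
typedef struct Entry {
    char *name, *type, *attribute;
    struct Entry *next;
} Entry;

typedef struct { Entry *bucket[BUCKETS]; } SymbolTable;

/* djb2 string hash: the symbol name itself is the key. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % BUCKETS;
}

static char *copy_str(const char *s)
{
    char *d = malloc(strlen(s) + 1);
    if (d) strcpy(d, s);
    return d;
}

/* lookup(): return the entry for name, or NULL if it was never inserted. */
Entry *st_lookup(const SymbolTable *t, const char *name)
{
    for (Entry *e = t->bucket[hash_name(name)]; e; e = e->next)
        if (strcmp(e->name, name) == 0)
            return e;
    return NULL;
}

/* insert(): add a name unless it is already present; returns its entry. */
Entry *st_insert(SymbolTable *t, const char *name, const char *type, const char *attr)
{
    Entry *e = st_lookup(t, name);
    if (e) return e;                  /* name already in the table */
    e = malloc(sizeof *e);
    if (!e) return NULL;
    e->name = copy_str(name);
    e->type = copy_str(type);
    e->attribute = copy_str(attr);
    unsigned long h = hash_name(name);
    e->next = t->bucket[h];           /* push onto the bucket's chain */
    t->bucket[h] = e;
    return e;
}
```

For the declaration static int interest;, calling st_insert(&t, "interest", "int", "static") stores the entry <interest, int, static>, and a later st_lookup on "interest" finds it without scanning the whole table.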

Operations

• A symbol table, either linear or hash, should provide the following operations.

insert()

• This operation is more frequently used by the analysis phase, i.e., the first half of the compiler, where tokens are identified and names are stored in the table.

• This operation is used to add information to the symbol table about unique names occurring in the source code. The format or structure in which the names are stored depends upon the compiler in hand.

• To determine the scope of a name, symbol tables are arranged in a hierarchical structure, with one table per scope, each linked to the table of its enclosing scope.

• In the example hierarchy, the global symbol table contains names for one global variable (int value) and two procedure names, which should be available to all the child tables.

• The names mentioned in the pro_one symbol table (and all its child tables) are not available to pro_two symbols and its child tables.

• This symbol table data structure hierarchy is stored in the semantic analyzer, and whenever a name needs to be searched in a symbol table, it is searched using the following algorithm:

  o First, the symbol is searched in the current scope, i.e., the current symbol table.
  o If the name is found, the search is complete; otherwise, it is searched in the parent symbol table, and so on up the hierarchy, until either the name is found or the global symbol table has been searched for the name.
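The search algorithm above can be sketched in C with one small table per scope and a parent link. The fixed-size arrays and the names Scope, declare and lookup are hypothetical simplifications (a real compiler would typically use a hash table per scope); the loop stops as soon as the name is found, or after the global table (whose parent is NULL) has been searched.

```c
#include <stddef.h>
#include <string.h>

#define MAX_NAMES 8

/* One symbol table per scope, linked to its enclosing scope. */
typedef struct Scope {
    const char *names[MAX_NAMES];
    const char *types[MAX_NAMES];
    int count;
    const struct Scope *parent;   /* NULL for the global table */
} Scope;

void declare(Scope *s, const char *name, const char *type)
{
    if (s->count < MAX_NAMES) {
        s->names[s->count] = name;
        s->types[s->count] = type;
        s->count++;
    }
}

/* Search the current scope, then each enclosing scope in turn. */
const char *lookup(const Scope *s, const char *name)
{
    for (; s != NULL; s = s->parent)
        for (int i = 0; i < s->count; i++)
            if (strcmp(s->names[i], name) == 0)
                return s->types[i];
    return NULL;                  /* not declared in any enclosing scope */
}
```

With a global table holding value, pro_one and pro_two, and separate child tables for the two procedures, a name declared in pro_one's table is not visible from pro_two's table, while value is visible from both unless a child table declares its own value, which then shadows the global one.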