Compiler Design Unit 4 By Dr. Choudhary Ravi Singh
In computer science, a symbol table is a data structure used by a language translator such as a compiler or
interpreter, where each identifier in a program's source code is associated with information relating to its
declaration or appearance in the source, such as its type, scope level and sometimes its location.
“Symbol table is a data structure used by a compiler to keep information about names which are used in the
source program to identify the various program elements like variables, constants, procedures, keywords
etc.” Each entry in the symbol table is a pair of the form (name, information).
name information
x real
y int
When a name is encountered, the symbol table is searched to see whether that name has been seen
previously. If the name is new, it is entered into the table. The information collected in the symbol table
is used during several stages of the compilation process.
There are a number of issues associated with the construction of the symbol table:
1. The symbol table manager should be able to enter entries in the table and return the address of
(a pointer to) each entry.
2. It should also be able to check whether a particular entry already exists in the table. If it exists,
it should be able to return the address of that entry.
3. The search time must be as short as possible.
4. The symbol table manager should be able to delete arbitrary elements and groups of elements from the
symbol table.
5. The symbol table should be able to grow as entries are added to it.
6. It must support duplicate entries. The scoping rules for such entries determine which of these entries
are active at a given time.
name information
n m
Here n characters are always used to store the name in the symbol table and m characters are used to
store information about names.
2. Variable length entry:
Length for entries (names) may be different.
name information
1 to n m
There are two main operations in construction of symbol table namely: inserting values and
accessing stored information.
Information Catching:
The first step in designing the symbol table is to decide what information needs to be recorded. Generally
we only need to record information about identifiers. Most tokens in the program, such as keywords,
operators and special symbols, have a fixed meaning, so we need not record information about them.
Identifiers, on the other hand, do not have a fixed meaning. They can represent different constructs or have
different semantics in different files and even within the same file. To determine what an identifier means
in a given context, we must record information about it so that its meaning can be determined elsewhere in
the compilation process.
There are two major activities in symbol table organization, inserting values and accessing stored
information.
The value insertion operation includes creating a record for a new symbol, assigning values to the fields of
the record and making adjustments to the existing symbol table.
The value retrieval operation searches the available symbol table to find a name, examines the formal
parameters of a procedure to match them with the actual parameters, and examines the values in a found
record by comparing them with the actual values found in the code.
We use the following data structures for organizing the symbol table:
1. Linear list
2. Hash table
3. Search tree
4. Self-organizing list
Linear list
The simplest way to implement a symbol table is as a linear list of records, where each record describes one
entity/name of the symbol table. We can implement the linear list as a stack, as shown in the figure.
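Figure: linear list implemented as a stack (slot index shown on the right)
                         |               |  n-1
                         |       .       |
top/available pointer -> |       .       |
                         | Information 2 |  3
                         | Name 2        |  2
                         | Information 1 |  1
                         | Name 1        |  0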
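The linear list organization can be sketched in a few lines of Python. This is only an illustration: the class
name LinearSymbolTable and its methods insert and lookup are hypothetical names chosen here, and a Python list
stands in for the stack of records shown in the figure.

```python
class LinearSymbolTable:
    """Symbol table kept as a linear list of (name, info) records."""

    def __init__(self):
        self.records = []  # new records are appended at the top of the list

    def lookup(self, name):
        # Search from the most recent record backwards, so the latest
        # entry for a name is the one that is found.
        for rec_name, info in reversed(self.records):
            if rec_name == name:
                return info
        return None

    def insert(self, name, info):
        # Enter the name only if it has not been seen before.
        if self.lookup(name) is None:
            self.records.append((name, info))
        return self.lookup(name)


table = LinearSymbolTable()
table.insert("x", "real")
table.insert("y", "int")
print(table.lookup("x"))  # real
print(table.lookup("z"))  # None
```

Every lookup scans the list, so the search time grows linearly with the number of entries, which is why the
other organizations in this section are usually preferred for large programs.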
Search Tree
A search tree is a more efficient approach to symbol table organization. In a search tree we add two links,
left and right, to each record; they point to other records in the search tree.
Each node of the tree has the following fields:
1. Left // pointer to the left record
2. Name // name of the record
3. Info // information about the name
4. Right // pointer to the right record
Whenever a name is to be added, the tree is first searched for that name. If the name does not exist, a record
for the new name is created and added at the proper position in the search tree. The search tree always
satisfies two properties
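A minimal sketch of a search-tree symbol table follows. It assumes the usual binary search tree ordering
(names in the left subtree compare smaller than the node's name, names in the right subtree compare larger);
the class names Node and TreeSymbolTable are hypothetical and chosen only for this illustration.

```python
class Node:
    """One record of the search tree: left link, name, info, right link."""

    def __init__(self, name, info):
        self.left = None
        self.name = name
        self.info = info
        self.right = None


class TreeSymbolTable:
    def __init__(self):
        self.root = None

    def lookup(self, name):
        node = self.root
        while node is not None:
            if name == node.name:
                return node
            # Smaller names live in the left subtree, larger in the right.
            node = node.left if name < node.name else node.right
        return None

    def insert(self, name, info):
        if self.root is None:
            self.root = Node(name, info)
            return self.root
        node = self.root
        while True:
            if name == node.name:
                return node  # already present, nothing is added
            if name < node.name:
                if node.left is None:
                    node.left = Node(name, info)
                    return node.left
                node = node.left
            else:
                if node.right is None:
                    node.right = Node(name, info)
                    return node.right
                node = node.right


tree = TreeSymbolTable()
for name, info in [("m", "int"), ("c", "real"), ("x", "real")]:
    tree.insert(name, info)
print(tree.lookup("x").info)  # real
```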
Hash Table
A hash is a function that maps data of different lengths to data of a fixed length.
A hash table is a table of k pointers, numbered from 0 to k-1. To enter a name into the symbol table we
compute the hash value of the name by applying a suitable hash function, which maps the name to an integer
between 0 and k-1.
We use this hash value as an index into the hash table and search the list of symbol table records built on
that index. If the name is not present in the list, we create a record for the name and insert it at the
head of the list. To retrieve information about a name, the hash value of the name is first obtained from the
hash function, and then the list constructed on that hash value is searched for information about the
name.
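Figure: hash table of k pointers (index values 0 to k-1); each used index heads a linked list of records
index   list of records
0       [prev | name1 | info | next] <-> [prev | name2 | info | next]
1
2       [prev | name1 | info | next] <-> [prev | name2 | info | next] <-> [prev | name3 | info | NULL]
.
.
k-1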
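The hashing scheme described above can be sketched as follows. The hash function used here (sum of character
codes modulo k) and the names hash_name, Record and HashSymbolTable are assumptions made for the illustration;
the sketch also uses singly linked lists rather than the prev/next links drawn in the figure.

```python
K = 8  # number of pointers in the hash table (assumed value)


def hash_name(name, k=K):
    # Assumed hash function: sum of character codes modulo k.
    return sum(ord(c) for c in name) % k


class Record:
    def __init__(self, name, info, next_record=None):
        self.name = name
        self.info = info
        self.next = next_record


class HashSymbolTable:
    def __init__(self, k=K):
        self.k = k
        self.buckets = [None] * k  # indexes 0 to k-1

    def lookup(self, name):
        rec = self.buckets[hash_name(name, self.k)]
        while rec is not None:  # search the list built on this index
            if rec.name == name:
                return rec
            rec = rec.next
        return None

    def insert(self, name, info):
        rec = self.lookup(name)
        if rec is None:
            index = hash_name(name, self.k)
            # A new record is inserted at the head of the list.
            rec = Record(name, info, self.buckets[index])
            self.buckets[index] = rec
        return rec


symtab = HashSymbolTable()
symtab.insert("ab", "int")
symtab.insert("ba", "real")  # same hash value as "ab", so same list
print(symtab.lookup("ab").info)  # int
```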
It refers to the process of allocating memory at compile time before the associated program is executed. For
this purpose we use stack, array, heap etc.
Stack Allocation
A stack in a computing architecture is a region of memory where data is added or removed in last-in,
first-out fashion. New entities are added by the push operation and existing entities are removed by the pop
operation.
Drawbacks of Stack memory allocation
1. Memory is allocated before execution of the program begins.
2. No memory allocation or de-allocation action can be performed during execution of a program.
The activation record is a block of memory used for managing the information needed by a single execution of
a procedure. An activation record is allocated when a procedure is entered and de-allocated when the
procedure is exited.
The information needed for each invocation of a procedure is kept in a runtime data structure called an
activation record (AR) or frame. The frames are kept in a stack called the control stack.
Note: this is memory used by the compiled program, not by the compiler. The compiler's job is to generate
code that obtains the needed memory.
Temporary Values: temporary variables are needed during the evaluation of expressions. Such variables are
stored in the temporaries field.
Local Variables: local data is data that is local to the execution of the procedure. The local data field
holds this data.
Saved Machine Status: this field holds information about the status of the machine just before the
procedure is called. It contains the machine registers and the program counter.
Access Link: this field is optional. The access link refers to non-local data that is held in other
activation records. It is also called the static link.
Control Link: this field is also optional. It points to the activation record of the caller. It is also
called the dynamic link.
Return Value: this field is used to store the result of the function call.
Actual Parameters: this field holds information about the actual parameters. The calling procedure uses it
to supply parameters to the called procedure.
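The fields listed above can be pictured with a small Python model of the control stack. This is a sketch for
illustration only: real activation records are laid out in memory by compiler-generated code, and the names
ActivationRecord, call and ret, as well as the "return_address" entry, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ActivationRecord:
    procedure: str
    actual_parameters: list
    return_value: Any = None
    control_link: Optional["ActivationRecord"] = None  # caller's record (dynamic link)
    access_link: Optional["ActivationRecord"] = None   # record holding non-local data (static link)
    saved_machine_status: dict = field(default_factory=dict)
    local_data: dict = field(default_factory=dict)
    temporaries: list = field(default_factory=list)


control_stack = []


def call(procedure, args, return_address=0):
    # Allocate a record when the procedure is entered.
    caller = control_stack[-1] if control_stack else None
    frame = ActivationRecord(
        procedure,
        list(args),
        control_link=caller,
        saved_machine_status={"return_address": return_address},
    )
    control_stack.append(frame)
    return frame


def ret(value=None):
    # De-allocate the record when the procedure is exited.
    frame = control_stack.pop()
    frame.return_value = value
    return frame


main_frame = call("main", [])
fact_frame = call("fact", [3], return_address=42)
finished = ret(6)
print(finished.procedure, finished.return_value)  # fact 6
```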
Each table is a list of names and their associated attributes, and the tables are organized into a stack.
Whenever a new block is entered, a new empty table is pushed onto the stack to hold the names that are
declared local to this block. When a declaration is compiled, the table on top of the stack is searched for
the name. If the name is not found, the new name is inserted. When a reference to a name is translated,
each table is searched, starting from the top table on the stack, which ensures compliance with static scope
rules. For example, consider the following program structure. The symbol table organization will be as shown
in Figure 1.
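The stack-of-tables organization can be sketched as below. The class ScopedSymbolTable and its method names
are hypothetical, and the declarations used in the example (x and a as boolean, then x as real in an inner
block) are illustrative values rather than a reproduction of the program structure in the notes.

```python
class ScopedSymbolTable:
    """A stack of tables, one table per open block."""

    def __init__(self):
        self.stack = [{}]  # table for the outermost block

    def enter_block(self):
        self.stack.append({})  # push a new empty table

    def exit_block(self):
        self.stack.pop()  # names local to the block disappear

    def declare(self, name, info):
        top = self.stack[-1]  # only the top table is searched
        if name in top:
            raise KeyError(f"{name} is already declared in this block")
        top[name] = info

    def lookup(self, name):
        # Search every table, starting from the top of the stack.
        for table in reversed(self.stack):
            if name in table:
                return table[name]
        return None


scopes = ScopedSymbolTable()
scopes.declare("x", "boolean")
scopes.declare("a", "boolean")
scopes.enter_block()
scopes.declare("x", "real")
inner_x = scopes.lookup("x")  # the innermost declaration wins
inner_a = scopes.lookup("a")  # found in the enclosing block
scopes.exit_block()
outer_x = scopes.lookup("x")
print(inner_x, inner_a, outer_x)  # real boolean boolean
```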
Another technique can be used to represent scope information in the symbol table. We store the nesting
depth of each procedure block in the symbol table and use the [procedure name, nesting depth] pair as the
key for accessing information in the table. The nesting depth of a procedure is a number obtained by
starting with a value of one for the main program and adding one every time we go from an enclosing to an
enclosed procedure. This number is essentially a count of how many procedures are in the referencing
environment of the procedure. For example, refer to the program code structure above. The symbol table
contents using the nesting depth approach are shown in the following figure:
name    type      nesting depth
P       -         2
x       boolean   2
a       boolean   2
q       -         3
x       real      3
y       real      3
z       real      3
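A lookup keyed on the [name, nesting depth] pair might be sketched like this, using the entries of the table
above. The function names enter and lookup are hypothetical, and the sketch makes a simplifying assumption:
it handles a single chain of nested procedures, so two sibling procedures at the same depth would need an
extra component in the key.

```python
depth_table = {}


def enter(name, depth, info):
    depth_table[(name, depth)] = info


def lookup(name, depth):
    # Try the current depth first, then each enclosing depth down to 1.
    for d in range(depth, 0, -1):
        if (name, d) in depth_table:
            return d, depth_table[(name, d)]
    return None


for name, info, depth in [
    ("P", "-", 2), ("x", "boolean", 2), ("a", "boolean", 2),
    ("q", "-", 3), ("x", "real", 3), ("y", "real", 3), ("z", "real", 3),
]:
    enter(name, depth, info)

print(lookup("x", 3))  # (3, 'real')
print(lookup("a", 3))  # (2, 'boolean')
print(lookup("x", 2))  # (2, 'boolean')
```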
1. Detection of errors.
2. Recovery from errors.
Compilation Error
A compilation error is a state in which a compiler fails to compile a piece of program source code, either
because of errors in the code or, more unusually, because of errors in the compiler itself. A compilation
error message often helps programmers debug the source code. One of the important tasks that a compiler is
required to perform is to detect errors and recover from them.
Classification Of Errors
Lexical phase errors: Lexical errors occur due to spelling errors, illegal characters in the source code, or
identifiers or numeric constants that exceed the permitted length.
Syntactic phase errors: Syntactic errors occur due to missing operators, unbalanced parentheses,
errors in structure, or missing keywords.
Semantic phase errors: Semantic errors occur due to undeclared variables, a mismatch between actual
parameters and formal parameters, or incompatible operand types.
Runtime errors: Runtime errors are also termed exceptions. A runtime error is a software or hardware
problem that prevents a program from working correctly. Runtime errors might cause you to lose
information in the file you are working on, corrupt the file so that you can no longer work with it,
or prevent you from using a feature. Runtime errors can occur if you are running two software programs that
are not compatible, if your computer has memory problems, or if the computer has been infected with
malicious software.
So to create a correct program we must overcome lexical errors, syntax errors, semantic errors and runtime
errors. After detecting an error, the first thing a compiler is supposed to do is report it by
producing a suitable message. An error message should have a number of properties:
Sources Of Errors
1. The design specifications for the program may be inconsistent or faulty.
2. The algorithms used to meet the design may be incorrect.
3. The programmer may introduce errors in implementing the algorithms, either by introducing logical
errors or coding errors.
4. There may be errors in the compiler itself.
5. Transcription errors can occur when the program is typed into a file.
Error Handler
It is a software program that is responsible for the following tasks:
1. Detection of errors.
2. Reporting of errors.
3. Recovery from errors.
4. While doing error handling, it should not slow the processing of correct program.
Error Handling=Detection + Reporting + Recovery
Error Recovery
Error recovery is the process of adjusting the input stream so that parsing may resume after a syntax error
has been reported. It involves the following three major tasks.
The lexical analyzer detects an error when it discovers that no prefix of the input fits the specification
of any token class. The simplest possible error recovery is to skip erroneous characters until the lexical
analyzer can find another token. If an unwanted character occurs, it is deleted to recover from the error.
We can delete successive characters from the remaining input until the lexical analyzer can find a
well-formed token. This kind of error recovery strategy is known as panic mode recovery.
Other techniques, such as inserting a missing character, replacing a character with another, or transposing
two adjacent characters, can also be used.
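Panic mode recovery in a lexical analyzer can be sketched as follows. The token classes in TOKEN_SPEC (NUM, ID,
OP) and the function name tokenize are assumptions made for this example; the recovery step simply deletes
characters until some token class matches again and records what was skipped.

```python
import re

# Assumed token classes for the illustration.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            # Panic mode: delete successive characters until a
            # well-formed token can start at the current position.
            start = pos
            while pos < len(text) and MASTER.match(text, pos) is None:
                pos += 1
            errors.append((start, text[start:pos]))
            continue
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens, errors


tokens, errors = tokenize("x = 3 @# + y")
print(tokens)
print(errors)  # [(6, '@#')]
```

The scanner reports the skipped characters and continues, so later errors in the same input can still be
found in a single pass.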
Once syntax errors are detected and reported, the parser must be able to recover from them. The
basic error recovery methods are:
Panic Mode Recovery: This method determines the context at the point of error and discards tokens from the
input stream until a matching token is found. The disadvantage of this method is that it may skip a
considerable amount of input without checking it for further errors.
Phrase-Level Recovery
In this approach the parser may perform a local correction on the remaining input when an error is
encountered. The basic local corrections used to recover from an error are
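Error Production
In this method the grammar is augmented with productions that generate common erroneous constructs, so that
the parser can give an appropriate error message to the programmer when such an error occurs. The programmer
then takes appropriate action to remove the error.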
Global Correction
It is also known as minimum distance recovery. It adjusts the input before the point where the error was
detected. In a given error situation several recovery possibilities may exist. We should choose the one that
involves the smallest number of insertions, deletions and replacements in processing the incorrect input
string.
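The "minimum distance" idea can be illustrated with the classic edit distance between token sequences. This is
only a sketch: the function names edit_distance and closest are hypothetical, and the list of candidate
corrections is supplied by hand here, whereas a real global correction algorithm would search over the strings
of the language, which is costly.

```python
def edit_distance(source, target):
    """Minimum number of insertions, deletions and replacements."""
    prev = list(range(len(target) + 1))
    for i, s in enumerate(source, 1):
        cur = [i]
        for j, t in enumerate(target, 1):
            cur.append(min(
                prev[j] + 1,             # delete s
                cur[j - 1] + 1,          # insert t
                prev[j - 1] + (s != t),  # replace s by t, or keep it
            ))
        prev = cur
    return prev[-1]


def closest(tokens, candidates):
    # Choose the candidate that needs the fewest changes.
    return min(candidates, key=lambda c: edit_distance(tokens, c))


erroneous = ["id", "+", "*", "id"]
candidates = [
    ["id", "+", "id"],
    ["id"],
    ["id", "*", "id", "+", "id"],
]
print(closest(erroneous, candidates))  # ['id', '+', 'id']
```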
1. Intermediate errors
2. Delay errors
An intermediate error is one that can be detected at compile time, during the compilation of an erroneous
instruction.
A delay error is one that can be detected only at run time (execution time). Example: division by zero.
In panic mode recovery, we skip input symbols until a synchronizing token is found.
a. All the empty entries are marked as synch to indicate that the parser will skip input symbols
until a symbol in the follow set of the nonterminal A (the symbol on top of the stack) is found.
b. The parser then pops the nonterminal A from the stack, and parsing continues from that state.
c. To handle an unmatched terminal symbol, the parser pops the unmatched terminal from the
stack and issues an error message saying that the unmatched terminal was inserted.
Let us see the following example to understand panic mode recovery.
T: Terminals
NT: Non Terminals
Consider input string w=aab
1. Each empty entry in the action table is marked with a specific error routine.
2. An error routine may insert symbols into the stack or the input, delete symbols from the
stack or the input, or perform both insertion and deletion.
a. Missing operands.
b. Unbalanced parenthesis.
for the grammar
E→E+E
E→E*E
E→id
the items for the initial state and the goto transitions are defined as

I0 state (initial):
E→.E+E
E→.E*E
E→.id

goto(I0,E) = I1 state:
E→E.+E
E→E.*E

goto(I0,id) = I2 state:
E→id.

goto(I1,+) = I3 state:
E→E+.E
E→.E+E
E→.E*E
E→.id

goto(I1,*) = I4 state:
E→E*.E
E→.E+E
E→.E*E
E→.id

goto(I3,E) = I5 state:
E→E+E.
E→E.+E
E→E.*E

goto(I3,id) = I2 state:
E→id.

goto(I4,E) = I6 state:
E→E*E.
E→E.+E
E→E.*E

goto(I4,id) = I2 state
goto(I5,+) = I3 state
goto(I5,*) = I4 state
goto(I6,+) = I3 state
goto(I6,*) = I4 state
        action                              goto
state   id      +       *       $           E
I0      s2      e1      e1      e1          1
I1              s3      s4      accept
I2              r3      r3      r3
I3      s2                                  5
I4      s2                                  6
I5              s3/r1   s4/r1   r1
I6              s3/r2   s4/r2   r2
The shift-reduce conflicts in states I5 and I6 can be resolved by giving * higher precedence than + and by
treating both operators as left associative.
We now define the LR(0) parsing table using left associativity and the higher precedence of * as follows.
        action                              goto
state   id      +       *       $           E
I0      s2      e1      e1      e1          1
I1              s3      s4      accept
I2              r3      r3      r3
I3      s2                                  5
I4      s2                                  6
I5              r1      s4      r1
I6              r2      r2      r2
Now in the above parsing table we insert the error routines for the empty entries in the action table.
        action                              goto
state   id      +       *       $           E
I0      s2      e1      e1      e1          1
I1      e2      s3      s4      accept
I2      e4      r3      r3      r3
I3      s2      e1      e1      e1          5
I4      s2      e1      e1      e1          6
I5      e3      r1      s4      r1
I6      e3      r2      r2      r2
Routine e1: is called from states I0, I3 and I4; it pushes an imaginary id onto the parsing stack and covers
it with state I2.
Routine e2: is called from state I1; it pushes + onto the parsing stack and covers it with state I3.
Routine e3: is called from states I5 and I6.
Now let us trace the behavior of the parser on the string w=id+*id
stack                    string     action
$I0                      id+*id$    s2
$I0idI2                  +*id$      r3
$I0EI1                   +*id$      s3
$I0EI1+I3                *id$       e1 (error: call routine e1, which pushes id as imaginary input)
$I0EI1+I3idI2            *id$       r3
$I0EI1+I3EI5             *id$       s4
$I0EI1+I3EI5*I4          id$        s2
$I0EI1+I3EI5*I4idI2      $          r3
$I0EI1+I3EI5*I4EI6       $          r2
$I0EI1+I3EI5             $          r1
$I0EI1                   $          accept
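The trace above can be reproduced with a small table-driven parser. The sketch below encodes the action and
goto tables with error routines as Python dictionaries and keeps only the states on the stack. Routines e1
and e2 follow the descriptions given in the notes; since the behavior of e3 and e4 is not described, the
sketch simply stops with an error for them, which is an assumption of this example.

```python
ACTION = {
    0: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    1: {"id": "e2", "+": "s3", "*": "s4", "$": "acc"},
    2: {"id": "e4", "+": "r3", "*": "r3", "$": "r3"},
    3: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    4: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    5: {"id": "e3", "+": "r1", "*": "s4", "$": "r1"},
    6: {"id": "e3", "+": "r2", "*": "r2", "$": "r2"},
}
GOTO = {0: 1, 3: 5, 4: 6}  # goto on E
# Production number -> number of symbols on its right-hand side.
RHS_LENGTH = {1: 3, 2: 3, 3: 1}  # E->E+E, E->E*E, E->id


def parse(tokens):
    stack = [0]  # state stack; grammar symbols are left implicit
    tokens = tokens + ["$"]
    pos = 0
    trace = []
    while True:
        action = ACTION[stack[-1]][tokens[pos]]
        trace.append(action)
        if action == "acc":
            return trace
        if action[0] == "s":  # shift: push the state, consume the token
            stack.append(int(action[1:]))
            pos += 1
        elif action[0] == "r":  # reduce: pop the handle, push goto on E
            del stack[-RHS_LENGTH[int(action[1:])]:]
            stack.append(GOTO[stack[-1]])
        elif action == "e1":  # push an imaginary id, covered by I2
            stack.append(2)
        elif action == "e2":  # push an imaginary +, covered by I3
            stack.append(3)
        else:  # e3, e4: not described in the notes
            raise SyntaxError(f"unrecoverable error at token {pos}")


trace = parse(["id", "+", "*", "id"])
print(" ".join(trace))
```

Each entry of the returned list corresponds to one row of the action column in the trace table.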
This method of error recovery attempts to eliminate the phrase containing the syntactic error. The parser
determines that a string derivable from A contains an error. Part of that string has already been processed,
and the result of this processing is a sequence of states on top of the stack. The remainder of the string is still
in the input, and the parser attempts to skip over the remainder of this string by looking for a symbol on the
input that can legitimately follow A. By removing states from the stack, skipping over the input, and pushing
GOTO(s, A) on the stack, the parser pretends that it has found an instance of A and resumes normal parsing.
The algorithm for the above process is defined as:
1. Scan down the stack until a state s with a goto on a particular nonterminal A is found.
2. Discard zero or more input symbols until a symbol ‘a’ that belongs to follow(A) is found.
3. The parser pushes the state goto[s,A] onto the stack and resumes normal parsing.
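The three steps can be sketched as a single recovery function. The function name panic_mode_recover, the
dictionary encodings of the goto table and follow sets, and the stray ")" token in the example are all
assumptions of this illustration; the goto entries and follow(E) are those of the expression grammar used
earlier.

```python
def panic_mode_recover(stack, tokens, pos, goto, follow):
    """Return the new input position, or None if recovery fails.

    stack  -- list of parser states (modified in place)
    goto   -- dict mapping (state, nonterminal) to a state
    follow -- dict mapping a nonterminal to its follow set
    """
    # 1. Scan down the stack for a state s with a goto on some nonterminal A.
    while stack:
        s = stack[-1]
        nonterminals = [A for (state, A) in goto if state == s]
        if nonterminals:
            A = nonterminals[0]
            break
        stack.pop()
    else:
        return None  # no suitable state was found
    # 2. Discard input symbols until one that can follow A is found.
    while pos < len(tokens) and tokens[pos] not in follow[A]:
        pos += 1
    if pos == len(tokens):
        return None
    # 3. Push goto[s, A] and let the parser resume.
    stack.append(goto[(s, A)])
    return pos


goto = {(0, "E"): 1, (3, "E"): 5, (4, "E"): 6}
follow = {"E": {"+", "*", "$"}}
stack = [0, 1, 3]  # states after reading "id +"
tokens = ["id", "+", ")", "id", "$"]  # error detected at ")"
new_pos = panic_mode_recover(stack, tokens, 2, goto, follow)
print(stack, new_pos)  # [0, 1, 3, 5] 4
```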