Compiler Design Unit 4 By Dr. Choudhary Ravi Singh

A symbol table is a crucial data structure used by compilers and interpreters to associate identifiers in source code with relevant information such as type and scope. It faces several challenges, including efficient entry management, fast search times, and support for scope rules, which can be organized through various structures like linear lists, hash tables, and search trees. Additionally, the document discusses memory allocation strategies, activation records, error detection and recovery processes, and the importance of clear error messaging for effective programming.


Symbol Table:

In computer science, a symbol table is a data structure used by a language translator such as a compiler or
interpreter, where each identifier in a program's source code is associated with information relating to its
declaration or appearance in the source, such as its type, scope level and sometimes its location.

“Symbol table is a data structure used by a compiler to keep information about names which are used in the
source program to identify the various program elements like variables, constants, procedures, keywords
etc.” Each entry in the symbol table is a pair of the form (name, information).

name information
x real
y int

When a name is encountered, the symbol table is searched to see whether that name has been seen
previously. If the name is new, it is entered into the table. The information collected in the symbol table is
used during several stages of the compilation process.

Issues in symbol table:

There are a number of issues associated with the construction of the symbol table:

1. The symbol table manager should be able to enter entries in the table and return the address of
(or a pointer to) each entry.
2. It should also be able to check whether a particular entry already exists in the table. If it exists,
it should be able to return the address of that entry.
3. The search time must be as short as possible.
4. The symbol table manager should be able to delete arbitrary elements and groups of elements from the
symbol table.
5. The symbol table should be able to grow as entries are added to it.
6. It must support duplicate entries. The scoping rules for such entries determine which of these entries
are active at a given time.

Symbol table entries (symbol table organization):

1. Fixed length entry:

The length of each entry (name) is fixed. FORTRAN allows 6 characters in a symbol, Pascal allows 8
characters and PL/I allows up to 31 characters in a name.

name            information
n characters    m characters

Here n characters are always used to store the name in the symbol table and m characters are used to
store information about the name.
2. Variable length entry:
The length of an entry (name) may differ from one entry to another.

name                 information
1 to n characters    m characters

There are two main operations in the construction of a symbol table, namely inserting values and
accessing stored information.

Information Catching:

The first step in designing the symbol table is to decide what information needs to be recorded. Generally
we only need to record information about identifiers. Most tokens in the program, such as keywords, operators
and special symbols, have a fixed meaning, so we need not record information about them. Identifiers, on the other
hand, do not have a fixed meaning. They can represent different constructs or have different semantics in
different files and even in the same file. To determine what an identifier means in a given context, we
must record information about it so that its meaning can be determined elsewhere in the
compilation process.

We also need to store information about the following:

Constants, Variables, Types, Sub programs, Classes, Inheritance, Arrays, Records, Modules.

Symbol Table Organization:

There are two major activities in symbol table organization: inserting values and accessing stored
information.
Value insertion includes creating a record for a new symbol, assigning values to the fields of the
record and making adjustments to the already existing symbol table.
Value retrieval includes searching the symbol table to find a name, examining the
formal parameters of a procedure to match them with the actual parameters, and examining the values in a found
record by comparing them with the actual values found in the code.
We use the following data structures for organizing the symbol table:

1. Linear list
2. Hash table.
3. Search tree.
4. Self-organizing list.

Linear list
The simplest way to implement a symbol table is as a linear list of records, where each record describes one
entity/name of the symbol table. We can implement the linear list as a stack, as shown in the figure.

index    contents
n-1
 .
 .       <- top/available pointer
 3       Information 2
 2       Name 2
 1       Information 1
 0       Name 1

The stack has indices 0 to n-1.
The entities in the table are stored in their order of arrival. Whenever a new entity is to be added to the symbol table,
the table is first searched sequentially to check whether the entity already exists. If it does
not, a record for the new entity/name is created and added to the symbol table at the position given by the
available pointer. If the entity already exists, a pointer to its symbol table entry is returned.
To retrieve information about an entity/name we search from the beginning of the stack up to the position
marked by the available pointer. The available pointer indicates the beginning of the empty part of the stack. When
a name is located, the associated information is found in the slot that follows it.
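
The insert and retrieve steps above can be sketched in a few lines of Python. This is only an illustration: the class name LinearSymbolTable and its methods are hypothetical, and a Python list stands in for the stack, with its length playing the role of the available pointer.

```python
class LinearSymbolTable:
    """Symbol table kept as a list of (name, info) records in arrival order."""

    def __init__(self):
        # len(self.records) plays the role of the available pointer.
        self.records = []

    def lookup(self, name):
        """Sequential search from the beginning; returns the record index or None."""
        for index, (recorded_name, _) in enumerate(self.records):
            if recorded_name == name:
                return index
        return None

    def insert(self, name, info):
        """Return the index of the existing record, or append a new one."""
        index = self.lookup(name)
        if index is None:
            self.records.append((name, info))
            index = len(self.records) - 1
        return index

    def info(self, index):
        """Information stored with the record at the given index."""
        return self.records[index][1]


table = LinearSymbolTable()
x = table.insert("x", "real")
y = table.insert("y", "int")
again = table.insert("x", "real")   # already present: the old index comes back
```

Every insertion costs a full scan of the list, which is why the organizations described next are preferred for large programs.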

Search Tree

A search tree is a more efficient approach to symbol table organization. In a search tree we add two links, left
and right, to each record; they point to other records in the tree.
Each node of the tree has the following fields
1. Left // pointer to left record
2. Name // name of record
3. Info // information about the name
4. Right // pointer to right record

                  [left | name | info | right]
                   /                        \
[left | name | info | right]      [left | name | info | right]

Whenever a name is to be added, the name is first searched for in the tree. If it does not exist, a record for the
new name is created and added at the proper position in the search tree. For any two names I and J, a search
tree always satisfies two properties:

a. If name J is to the left of name I, then name J must be less than name I.
b. If name J is to the right of name I, then name J must be greater than name I.
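
A minimal sketch of such a tree, assuming names are compared as ordinary strings and no rebalancing is done. The names Node, insert and lookup are illustrative choices, not taken from the text.

```python
class Node:
    def __init__(self, name, info):
        self.left = None      # pointer to left record
        self.name = name      # name of record
        self.info = info      # information about the name
        self.right = None     # pointer to right record


def insert(root, name, info):
    """Return the root of the tree after making sure `name` is present."""
    if root is None:
        return Node(name, info)
    if name < root.name:
        root.left = insert(root.left, name, info)
    elif name > root.name:
        root.right = insert(root.right, name, info)
    return root               # equal names: the existing record is kept


def lookup(root, name):
    """Walk left or right by comparison until the name is found or a link is empty."""
    while root is not None and root.name != name:
        root = root.left if name < root.name else root.right
    return root.info if root is not None else None


root = None
for name, info in [("m", "int"), ("c", "real"), ("x", "real")]:
    root = insert(root, name, info)
```

Here "c" ends up to the left of "m" and "x" to the right, which is exactly properties a and b.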

Hash Table
A hash function maps data of varying length to data of fixed length.
A hash table is a table of k pointers, numbered from 0 to k-1. To enter a name into the symbol table we
compute the hash value of the name by applying a suitable hash function, which maps the name to an integer
between 0 and k-1.
We use this hash value as an index into the hash table and search the list of symbol table records built on
that index. If the name is not present in the list, we create a record for the name and insert it at the
head of the list. To retrieve information about a name, the hash value of the name is first obtained from the hash
function, and then the list built on that hash value is searched for the
name.
0 prev name1 info next prev name2 info next
1
2 prev name1 info next prev name2 info next
.
.
.
.
k-1
0 to k-1 index values
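
The scheme can be sketched as follows. The bucket count K = 8 and the character-sum hash are arbitrary assumptions made for the illustration, and Python lists stand in for the linked lists of records.

```python
K = 8  # number of buckets; an arbitrary choice for this sketch


def hash_name(name):
    """Toy hash function: sum of character codes modulo K."""
    return sum(ord(ch) for ch in name) % K


class HashSymbolTable:
    def __init__(self):
        # One list of [name, info] records per index 0 .. K-1.
        self.buckets = [[] for _ in range(K)]

    def lookup(self, name):
        """Search only the list built on the hash value of the name."""
        for record in self.buckets[hash_name(name)]:
            if record[0] == name:
                return record
        return None

    def insert(self, name, info):
        """Return the existing record, or create one at the head of its list."""
        record = self.lookup(name)
        if record is None:
            record = [name, info]
            self.buckets[hash_name(name)].insert(0, record)
        return record


table = HashSymbolTable()
table.insert("ab", "int")
table.insert("ba", "real")   # same character sum as "ab", so the same bucket
```

Both names hash to index 3, so that bucket's list holds "ba" at the head followed by "ab".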

Self Organizing List


In a self organizing list, the symbol table is implemented using a linked list, as shown in the figure below

head -> [prev | name1 | info | next] <-> [prev | name2 | info | next] <-> [prev | name3 | info | NULL]

Each node of the list has four fields

1. Previous pointer // pointer to previous node
2. Name // name of entity
3. Information // information about entity
4. Next pointer // pointer to next node
The next field of the last node contains a NULL pointer.
Whenever a name is to be added, the name is first searched for in the linked list. If it does not exist, a node for the
new name is created and added at the proper position in the linked list. To retrieve information about a name, a
search operation is performed on the list.
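
The text does not say how the list reorganizes itself. The sketch below assumes the common move-to-front rule: a node that is looked up is unlinked and re-inserted at the head, so frequently used names are found quickly. It also assumes that new names are added at the head. The class and method names are hypothetical.

```python
class Node:
    def __init__(self, name, info):
        self.prev = None
        self.name = name
        self.info = info
        self.next = None


class SelfOrganizingList:
    def __init__(self):
        self.head = None

    def _push_front(self, node):
        node.prev, node.next = None, self.head
        if self.head is not None:
            self.head.prev = node
        self.head = node

    def _unlink(self, node):
        if node.prev is not None:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next is not None:
            node.next.prev = node.prev

    def lookup(self, name):
        node = self.head
        while node is not None and node.name != name:
            node = node.next
        if node is None:
            return None
        if node is not self.head:      # move-to-front: an assumption of this sketch
            self._unlink(node)
            self._push_front(node)
        return node.info

    def insert(self, name, info):
        if self.lookup(name) is None:  # search first, add only if absent
            self._push_front(Node(name, info))

    def names(self):
        out, node = [], self.head
        while node is not None:
            out.append(node.name)
            node = node.next
        return out


symbols = SelfOrganizingList()
for name, info in [("name1", "int"), ("name2", "real"), ("name3", "char")]:
    symbols.insert(name, info)
order_before = symbols.names()
found = symbols.lookup("name1")
order_after = symbols.names()
```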

Storage Allocation Schemes/ Storage Allocation Strategies For Symbol Table

Static memory Allocation

It refers to the process of allocating memory at compile time, before the associated program is executed.
The size and location of every data object are therefore fixed before execution begins; a fixed-size array
is a typical example.

Drawbacks of static memory allocation
1. Memory is allocated before execution of the program begins.
2. No memory allocation or de-allocation action can be performed during execution of a program.

Stack Allocation
A stack is a region of memory where data is added and removed in last-in, first-out fashion. New entities are
added by a push operation and existing entities are removed by a pop
operation.

Heap Memory Allocation


The heap is a region of memory that is not managed automatically: the program itself must allocate and
release it. It is a more free-floating region of memory than the stack. In C, memory is allocated on the heap
with the calloc() and malloc() functions, an allocated block is resized with realloc(), and allocated
memory is released with free().

Drawbacks of heap memory allocation

1) Allocated memory stays allocated until it is specifically de-allocated.
2) The heap is a large pool of memory that is not managed automatically, so the program is responsible for managing it.

Dynamic Memory Allocation


It refers to the process of allocating memory at run time/execution time. Dynamic memory allocation takes place
when an executing program requests that the operating system give it a block of main memory. The program
then uses this block for some purpose. A linked list is a typical data structure built in this way.

Drawbacks of dynamic memory allocation

1. Execution of a program becomes slower due to the presence of pointers.
2. It is harder to implement because of pointer complexity.
3. Pointers require extra memory for storage.
Activation record:

The activation record is a block of memory used for managing the information needed by a single execution of a
procedure. An activation record is allocated when a procedure is entered and de-allocated when the
procedure is exited.

The information needed for each invocation of a procedure is kept in a runtime data structure called an
activation record (AR) or frame. The frames are kept in a stack called the control stack.

Note: this is memory used by the compiled program, not by the compiler. The compiler's job is to generate
code that obtains the needed memory.

Various fields of an activation record are as follows:

Temporary Values: temporary variables are needed during the evaluation of expressions. Such variables are
stored in the field of temporaries.

Local Variables: local data is data that is local to the execution of the procedure. The field of local data
holds it.

Saved Machine Status: this field holds information about the status of the machine just before the
procedure is called. It contains the machine registers and the program counter.

Access Link: this field is optional. The access link refers to non-local data that is held in another activation
record. It is also called the static link.

Control Link: this field is also optional. It points to the activation record of the caller. It is also called the
dynamic link.

Return Value: this field is used to store the result of the function call.

Actual Parameters: this field holds the information about the actual parameters. These actual parameters are
used by the calling procedure to supply parameters to the called procedure.

Representing Scope Information/Scoping:


The rules governing the scope of names in a block structured language are:

1. A name declared within a block B is valid only within block B.
2. If block B1 is nested within B2, then any name valid for B2 is also valid for B1 unless the identifier for
that name is re-declared in B1.

These scope rules require a more complicated symbol table organization than a simple list of associations
between names and attributes. One technique is to keep multiple symbol tables, one for
each active block/procedure, such as the block that the compiler is currently in.

Each table is a list of names and their associated attributes, and the tables are organized into a stack.
Whenever a new block is entered, a new empty table is pushed onto the stack to hold the names that are
declared local to this block. When a declaration is compiled, the table on top of the stack is searched for the
name. If the name is not found, the new name is inserted. When a reference to a name is translated,
each table is searched, starting from the top table on the stack, which ensures compliance with static scope rules.
For example, consider the following program structure. The symbol table organization will be as shown in
Figure 1.

Program main
    Var x,y : integer;
    Procedure P;
        Var x,a : boolean;
        Procedure q;
            Var x,y,z : real;
        Begin
            .
            .
        end
    begin :
    end
begin :
end

Figure 1: symbol tables for the program above

top     main
            x    integer
            y    integer
        P
            x    boolean
            a    boolean
        q
            x    real
            y    real
            z    real
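
A minimal sketch of the stack-of-tables technique for the program above. The class ScopedSymbolTable and its method names are hypothetical, and each table is a plain Python dictionary.

```python
class ScopedSymbolTable:
    def __init__(self):
        self.stack = []                 # last element is the table of the current block

    def enter_block(self):
        self.stack.append({})           # push a new empty table

    def exit_block(self):
        self.stack.pop()                # names local to the block disappear

    def declare(self, name, info):
        """Search only the top table; insert the name if it is not found there."""
        top = self.stack[-1]
        if name in top:
            raise KeyError(name + " already declared in this block")
        top[name] = info

    def resolve(self, name):
        """Search from the top table downwards, so the innermost declaration wins."""
        for table in reversed(self.stack):
            if name in table:
                return table[name]
        return None


scopes = ScopedSymbolTable()
scopes.enter_block()                    # main
scopes.declare("x", "integer")
scopes.declare("y", "integer")
scopes.enter_block()                    # P
scopes.declare("x", "boolean")
scopes.declare("a", "boolean")
inside_p = (scopes.resolve("x"), scopes.resolve("y"))
scopes.enter_block()                    # q
for name in ("x", "y", "z"):
    scopes.declare(name, "real")
inside_q = (scopes.resolve("x"), scopes.resolve("a"))
scopes.exit_block()                     # leave q
scopes.exit_block()                     # leave P
inside_main = scopes.resolve("x")
```

Inside P the name x is boolean while y still refers to the integer declared in main, which is rule 2 of the scope rules.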

Another technique can be used to represent scope information in the symbol table. We store the nesting
depth of each procedure block in the symbol table and use the [procedure name, nesting depth] pair as the
key for accessing the information in the table. The nesting depth of a procedure is a number obtained
by starting with a value of one for main and adding one every time we go from an enclosing to an
enclosed procedure. This number is basically a count of how many procedures there are in the referencing
environment of the procedure. For example, refer to the program structure above. The symbol table
contents under the nesting depth approach are shown in the following figure:

procedure/name    information    depth

main              -              1
x                 integer        1
y                 integer        1

P                 -              2
x                 boolean        2
a                 boolean        2

q                 -              3
x                 real           3
y                 real           3
z                 real           3

Symbol table using the nesting depth approach.
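
The depth-keyed table can be sketched with a dictionary whose key is the (name, depth) pair. The functions declare and resolve are illustrative names. The sketch also assumes that the entries of a finished procedure are removed before a sibling procedure at the same depth is compiled; that removal step is not shown.

```python
table = {}                              # (name, nesting depth) -> information


def declare(name, depth, info):
    table[(name, depth)] = info


def resolve(name, depth):
    """Try the current depth first, then each enclosing depth down to 1 (main)."""
    for d in range(depth, 0, -1):
        if (name, d) in table:
            return table[(name, d)], d
    return None


for name in ("x", "y"):
    declare(name, 1, "integer")         # main
for name in ("x", "a"):
    declare(name, 2, "boolean")         # P
for name in ("x", "y", "z"):
    declare(name, 3, "real")            # q
```

A reference to a inside q (depth 3) finds no entry at depth 3 and falls back to the boolean declared in P at depth 2.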


Two important features of a compiler are

1. Detection of errors.
2. Recovery from errors.

Compilation Error
A compilation error is the state in which a compiler fails to compile a piece of program source
code, either due to errors in the code or, more unusually, due to errors in the compiler itself. A compilation
error message often helps programmers debug the source code. One of the important
tasks a compiler has to perform is to detect errors and recover from them.

Classification Of Errors

1. Compilation time errors


2. Run time errors

Compilation time errors can be further classified as

a. Lexical phase errors


b. Syntactic phase errors
c. Semantic phase errors.

Lexical Phase Errors: lexical errors occur due to spelling errors, illegal characters in the source code,
exceeding length of identifier or numeric constants.

Syntactic phase errors: Syntactic phase errors occur due to missing operators, unbalanced parentheses,
errors in structure, missing keywords.

Semantic phase errors: Semantic errors occur due to undeclared variables, mismatch between actual
parameters and formal parameters, incompatible types of operands.

Runtime Errors: Runtime errors are also termed exceptions. A runtime error is a software or hardware
problem that prevents a program from working correctly. Runtime errors might cause you to lose
information in the file you're working on, cause errors in the file (corrupt the file) so you can't work with it,
or prevent you from using a feature. Runtime errors can occur if you are running two software programs that
aren't compatible, if your computer has memory problems, or if the computer has been infected with
malicious software.

So to create a correct program we must overcome lexical errors, syntax errors, semantic errors and runtime
errors. After detecting an error, the first thing a compiler is supposed to do is report it by
producing a suitable message. An error message should have the following properties:

1. The error message should be easy to understand by the user.


2. The message should not be redundant.
3. The message should be specific and should localize the problem.

Sources Of Errors
1. The design specifications for the program may be inconsistent or faulty.
2. The algorithms used to meet the design may be incorrect.
3. The programmer may introduce errors in implementing the algorithms, either by introducing logical
errors or coding errors.
4. Errors in compiler itself.
5. Transcription errors can occur when program is typed into a file.
Error Handler
It is a software component that is responsible for the following tasks:
1. Detection of errors.
2. Reporting of errors.
3. Recovery from errors.
While handling errors, it should not slow down the processing of correct programs.
Error Handling = Detection + Reporting + Recovery

Error Recovery

Error recovery is the process of adjusting the input stream so that parsing may resume after a syntax error has been
reported. It involves the following three operations:

1. Deletion of token types from the input stream.
2. Insertion of token types.
3. Substitution of token types.

There are two classes of recovery:

1. Local recovery: it adjusts the input at the point where the error was detected.
2. Global recovery: it adjusts the input before the point where the error was detected.

Recovery of Lexical Phase Errors

The lexical analyzer detects an error when it discovers that no prefix of the remaining input fits the specification of any
token class. The simplest possible error recovery is to skip erroneous characters until the lexical analyzer
can find another token: successive characters are deleted from the remaining input until a well-formed token
is found. This kind of error recovery strategy is known as panic mode recovery.

Other techniques, such as inserting a missing character, replacing a character by another, or transposing two
adjacent characters, can also be used.
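
Panic mode recovery in a lexical analyzer can be sketched as follows. The three token classes (identifiers, integers and a handful of operators) are an assumption made for the illustration; any character that starts no token is reported, deleted and skipped.

```python
import re

# Hypothetical token classes for this sketch: identifier, integer, one-character operator.
TOKEN = re.compile(r"(?P<id>[A-Za-z_]\w*)|(?P<num>\d+)|(?P<op>[-+*/=();])")


def tokenize(text):
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1                    # whitespace separates tokens
            continue
        match = TOKEN.match(text, pos)
        if match:
            tokens.append((match.lastgroup, match.group()))
            pos = match.end()
        else:
            # Panic mode: no token starts here, so report the character and delete it.
            errors.append((pos, text[pos]))
            pos += 1
    return tokens, errors


tokens, errors = tokenize("x = 4 @ # y;")
```

The characters @ and # fit no token class, so they are recorded as errors at positions 6 and 8 and scanning continues with y.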

Recovery of Syntactic errors:

Once the syntax errors are detected and reported, parser must be able to recover from a syntactic error. The
basic steps of error recovering methods are:

1. Suspend normal parsing on encountering an error.


2. Change the error configuration by changing the input buffer.
3. Resume normal parsing with the new configuration.

Various strategies used for recovery of syntactic errors are:


a. Panic mode recovery.
b. Phrase level recovery.
c. Error production.
d. Global correction.

Panic Mode Recovery: This method determines the context at the point of error and discards tokens from the
input stream until a matching (synchronizing) token is found. The disadvantage of this method is that it may skip a considerable
amount of the input without checking it for further errors.
Phrase Level Recovery

In this approach the parser may perform a local correction on the remaining input when an error is
encountered. The basic local corrections used to recover from an error are:

a. Deletion of a source symbol. For example, deleting an extra semicolon.
b. Insertion of a syntactic symbol. For example, inserting a missing semicolon.
c. Replacement of a source symbol by a syntactic symbol. For example replace A; by a;

Error Production
In this method the grammar is augmented with productions that generate common erroneous constructs. When the
parser uses such an error production, it has recognized the error and gives an appropriate error message to
the programmer, who then takes action to remove the error.

Global Correction
It is also known as minimum distance recovery. It adjusts the input before the point where the error was detected. In a
given error situation several recoveries may be possible. We should choose the one that
involves the smallest number of insertions, deletions and replacements in the processing of the incorrect input string.
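
The "smallest number of insertions, deletions and replacements" can be made concrete with the standard edit distance computation. The sketch below is not a full global correction algorithm: it only measures the distance between an erroneous token string and a few candidate corrections that are assumed to be given, and picks the nearest one. The function names are hypothetical.

```python
def edit_distance(source, target):
    """Minimum number of insertions, deletions and replacements turning source into target."""
    previous = list(range(len(target) + 1))
    for i, s in enumerate(source, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            current.append(min(previous[j] + 1,               # delete s
                               current[j - 1] + 1,            # insert t
                               previous[j - 1] + (s != t)))   # replace s by t, or keep it
        previous = current
    return previous[-1]


def closest(erroneous, candidates):
    """Pick the candidate correction at minimum distance from the erroneous string."""
    return min(candidates, key=lambda c: edit_distance(erroneous, c))


bad = ["id", "+", "*", "id"]
best = closest(bad, [["id"], ["id", "+", "id", "*", "id"]])
```

Inserting one id turns id + * id into id + id * id, so that candidate is at distance 1, while reducing the input to a single id would need 3 deletions.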

Semantic Error Recovery


Semantic errors can be detected either at compile time or at run time. We divide semantic errors into two
categories:

1. Immediate errors
2. Delayed errors

An immediate error is one that can be detected at compile time, during the compilation of the erroneous
instruction.

A delayed error is one that can be detected only at run time (execution time). Example: division by zero.

Error Recovery in Predictive Parsing


An error may occur in predictive parsing (LL(1) parsing) in the following cases:
1. The terminal symbol on the top of the stack does not match the current input symbol.
2. The top of the stack is a non terminal A, the current input symbol is a, and the parsing table entry
M[A, a] is empty.

The parser should do the following in case of an error:

1. It should give an error message.
2. It should recover from the error and continue parsing the rest of the
input.

Panic mode recovery with LL(1) Parsing

In panic mode recovery, we skip input symbols until a synchronizing token is found.

a. All the empty entries are marked as sync to indicate that the parser will skip input symbols
until a symbol in the follow set of the non terminal A (the symbol on the top of the stack) is found.
b. Then the parser pops the non terminal A from the stack. Parsing continues from that state.
c. To handle an unmatched terminal symbol, the parser pops the unmatched terminal from the
stack and issues an error message saying that the unmatched terminal is inserted.

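Let us see the following example to understand panic mode recovery. The productions that appear in the parsing
table give the grammar

S → AbS | e | ^
A → a | cAd

where ^ denotes the empty string. The predictive parsing table/LL(1) table is defined as

T/NT    a        b       c        d       e       $
S       S→AbS    sync    S→AbS    sync    S→e     S→^
A       A→a      sync    A→cAd    sync    sync    sync

T: Terminals
NT: Non Terminals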
Consider the input string w=aab

stack     input    action

S$        aab$     S→AbS
AbS$      aab$     A→a
abS$      aab$
bS$       ab$      error: missing b is inserted, so we pop b from the stack
S$        ab$      S→AbS
AbS$      ab$      A→a
abS$      ab$
bS$       b$
S$        $        S→^
$         $        accept

Now we consider w=ceadb

stack     input     action

S$        ceadb$    S→AbS
AbS$      ceadb$    A→cAd
cAdbS$    ceadb$
AdbS$     eadb$     error: remove input until b or d occurs, then pop A
dbS$      db$
bS$       b$
S$        $         S→^
$         $         accept
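
The two traces above can be reproduced with a small table-driven parser. This is a sketch under assumptions: the grammar S → AbS | e | ^ and A → a | cAd is read off the stack contents in the traces, FOLLOW(S) = {$} and FOLLOW(A) = {b, d} are computed from it, a missing table entry plays the role of a sync entry, and the wording of the error messages is made up.

```python
TABLE = {   # absent entries are the sync entries
    "S": {"a": ["A", "b", "S"], "c": ["A", "b", "S"], "e": ["e"], "$": []},
    "A": {"a": ["a"], "c": ["c", "A", "d"]},
}
FOLLOW = {"S": {"$"}, "A": {"b", "d"}}


def parse(word):
    tokens = list(word) + ["$"]
    stack, pos, errors = ["$", "S"], 0, []      # top of stack is the end of the list
    while stack[-1] != "$":
        top, look = stack[-1], tokens[pos]
        if top in TABLE:                        # non terminal on top of the stack
            body = TABLE[top].get(look)
            stack.pop()
            if body is not None:
                stack.extend(reversed(body))    # expand by the production
            else:
                # sync entry: skip input up to a symbol in FOLLOW(top); top is already popped
                start = pos
                while tokens[pos] != "$" and tokens[pos] not in FOLLOW[top]:
                    pos += 1
                errors.append("skipped " + "".join(tokens[start:pos]) + ", popped " + top)
        elif top == look:                       # terminal matches the input
            stack.pop()
            pos += 1
        else:                                   # unmatched terminal: pop it and report
            stack.pop()
            errors.append("missing " + top + " inserted")
    return tokens[pos] == "$", errors


result_aab = parse("aab")
result_ceadb = parse("ceadb")
```

For aab the parser reports one inserted b, and for ceadb it skips ea and pops A, matching the two traces.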
Error Recovery in LR Parsing

Phrase Level Recovery

1. Each empty entry in the action table is marked with a specific error routine.
2. An error routine may insert symbols into the stack or the input, delete symbols from the
stack and the input, or perform both insertion and deletion. Typical errors handled in this way are:
a. Missing operands.
b. Unbalanced parentheses.
For the grammar
E→E+E
E→E*E
E→id
the items for the initial state and the goto transitions are defined as

I0 state:
E→.E+E
E→.E*E
E→.id

goto(I0,E) = I1 state:
E→E.+E
E→E.*E

goto(I0,id) = I2 state:
E→id.

goto(I1,+) = I3 state:
E→E+.E
E→.E+E
E→.E*E
E→.id

goto(I1,*) = I4 state:
E→E*.E
E→.E+E
E→.E*E
E→.id

goto(I3,E) = I5 state:
E→E+E.
E→E.+E
E→E.*E

goto(I3,id) = I2 state:
E→id.

goto(I4,E) = I6 state:
E→E*E.
E→E.+E
E→E.*E

goto(I4,id) = I2 state
goto(I5,+) = I3 state
goto(I5,*) = I4 state
goto(I6,+) = I3 state
goto(I6,*) = I4 state

Now the LR(0) parsing table is defined as

         action                             goto
state    id     +        *        $         E
I0       s2     e1       e1       e1        1
I1              s3       s4       accept
I2              r3       r3       r3
I3       s2                                 5
I4       s2                                 6
I5              s3/r1    s4/r1    r1
I6              s3/r2    s4/r2    r2
The shift reduce conflict can be resolved by giving higher precedence to * and using left associativity.
Now we define the LR(0) parsing table using left associativity and higher precedence of * as

         action                             goto
state    id     +        *        $         E
I0       s2     e1       e1       e1        1
I1              s3       s4       accept
I2              r3       r3       r3
I3       s2                                 5
I4       s2                                 6
I5              r1       s4       r1
I6              r2       r2       r2

Now in the above parsing table we insert the error routines for the empty entries in the action table.

         action                             goto
state    id     +        *        $         E
I0       s2     e1       e1       e1        1
I1       e2     s3       s4       accept
I2       e4     r3       r3       r3
I3       s2     e1       e1       e1        5
I4       s2     e1       e1       e1        6
I5       e3     r1       s4       r1
I6       e3     r2       r2       r2

Routine e1: is called from states I0, I3 and I4. It pushes an imaginary id onto the parsing stack and covers it with state I2.
Routine e2: is called from state I1. It pushes + onto the parsing stack and covers it with state I3.
Routine e3: is called from states I5 and I6.
Now let us trace the behavior of the parser on the string w=id+*id

stack                  string     action
$I0                    id+*id$    s2
$I0idI2                +*id$      r3
$I0EI1                 +*id$      s3
$I0EI1+I3              *id$       e1  (error: call routine e1, which pushes id as imaginary input)
$I0EI1+I3idI2          *id$       r3
$I0EI1+I3EI5           *id$       s4
$I0EI1+I3EI5*I4        id$        s2
$I0EI1+I3EI5*I4idI2    $          r3
$I0EI1+I3EI5*I4EI6     $          r2
$I0EI1+I3EI5           $          r1
$I0EI1                 $          accept
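
The trace can be checked with a small driver for the table that contains the error routines. This is a sketch under assumptions: only state numbers are kept on the stack (the grammar symbols are omitted), e1 and e2 do what the routine descriptions say, and because the text does not describe what e3 and e4 do, the sketch simply stops parsing when either is reached.

```python
ACTION = {
    0: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    1: {"id": "e2", "+": "s3", "*": "s4", "$": "acc"},
    2: {"id": "e4", "+": "r3", "*": "r3", "$": "r3"},
    3: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    4: {"id": "s2", "+": "e1", "*": "e1", "$": "e1"},
    5: {"id": "e3", "+": "r1", "*": "s4", "$": "r1"},
    6: {"id": "e3", "+": "r2", "*": "r2", "$": "r2"},
}
GOTO_E = {0: 1, 3: 5, 4: 6}             # goto on the only non terminal, E
BODY_LENGTH = {1: 3, 2: 3, 3: 1}        # E→E+E, E→E*E, E→id


def parse(tokens):
    tokens = tokens + ["$"]
    stack, pos, calls = [0], 0, []
    while True:
        act = ACTION[stack[-1]][tokens[pos]]
        if act == "acc":
            return True, calls
        kind, num = act[0], int(act[1:])
        if kind == "s":                 # shift: push the state, consume the token
            stack.append(num)
            pos += 1
        elif kind == "r":               # reduce: pop the body, then take the goto on E
            del stack[len(stack) - BODY_LENGTH[num]:]
            stack.append(GOTO_E[stack[-1]])
        elif act == "e1":               # missing operand: imaginary id covered by state 2
            calls.append("e1")
            stack.append(2)
        elif act == "e2":               # missing operator: + covered by state 3
            calls.append("e2")
            stack.append(3)
        else:                           # e3, e4: behavior not given in the text, so stop
            calls.append(act)
            return False, calls


result = parse(["id", "+", "*", "id"])
```

For id+*id the driver calls e1 once, in state I3 on input *, and then accepts, as in the trace.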

Panic Mode Error Recovery for LR Parsing

This method of error recovery attempts to eliminate the phrase containing the syntactic error. The parser
determines that a string derivable from A contains an error. Part of that string has already been processed,
and the result of this processing is a sequence of states on top of the stack. The remainder of the string is still
in the input, and the parser attempts to skip over it by looking for a symbol in the
input that can legitimately follow A. By removing states from the stack, skipping over the input, and pushing
GOTO(s, A) on the stack, the parser pretends that it has found an instance of A and resumes normal parsing.
The algorithm for the above process is defined as:
1. Scan down the stack until a state s with a goto on a particular non terminal A is found.
2. Discard zero or more input symbols until a symbol ‘a’ belonging to follow(A) is found.
3. The parser pushes the state goto[s,A] onto the stack and resumes normal parsing.
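
The three steps can be sketched for the expression grammar E→E+E | E*E | id used earlier. The sketch assumes A is E, that the states with a goto on E are I0, I3 and I4, and that follow(E) = {+, *, $}. The function name panic_recover is hypothetical, and the stack holds state numbers only.

```python
GOTO_E = {0: 1, 3: 5, 4: 6}             # states that have a goto on E, and its target
FOLLOW_E = {"+", "*", "$"}


def panic_recover(stack, tokens, pos):
    """Return the adjusted (stack, input position) after panic mode recovery."""
    stack = list(stack)
    while stack[-1] not in GOTO_E:      # step 1: scan down to a state with a goto on E
        stack.pop()
    while tokens[pos] not in FOLLOW_E:  # step 2: discard input until a symbol in follow(E)
        pos += 1
    stack.append(GOTO_E[stack[-1]])     # step 3: push goto[s, E] and resume parsing
    return stack, pos


# Error in state I2 on the second id of "id id + id": the stack is I0 I2.
recovered = panic_recover([0, 2], ["id", "id", "+", "id", "$"], 1)
```

State I0 is always at the bottom of the stack and has a goto on E, and the input always ends with $, so both loops terminate.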
