Topic 2 - Syntax and Semantics Lecture Notes


[2] Syntax and Semantics
IS 214 - Principles of Programming Languages - 1st Sem AY 2020-2021

Based on Asst. Prof. Reinald Adrian Pugoy’s lecture notes on Syntax and Semantics
Objectives
At the end of this chapter, you should be able to:
• Describe syntax by generating derivations given a certain grammar.
• Examine issues involving syntax.
• Describe semantics using some informal methods.
Syntax and Semantics
• Syntax
– “What its program looks like.”
– Refers to ways symbols may be combined to create well-formed sentences in the language.
• Semantics
– “What its program means.”
– Much more difficult to describe.
– Informal methods and suggestive examples are normally used.
Syntax
Specifying Syntax
• A language consists of a set of strings (syntactically correct programs) of characters from some alphabet of symbols.
• Strings of a language are called sentences or statements.
• Syntax rules specify which strings of characters from the language’s alphabet are in the language.
Specifying Syntax
• Grammar
– Formal definition of the syntax of the language.
– It naturally defines the hierarchical structure of many programming languages.
Sample English Grammar
• subject / verb / object
– The girl / ran / home.
– The boy / cooks / dinner.
• article / noun / verb / object
• auxiliary verb / subject / predicate
– Did / the girl / run home?
– Is / the boy / cooking dinner?
Sample English Grammar
• Rules
– <sentence> ::= <declarative> | <interrogative>
– <declarative> ::= <subject><verb><object>
– <subject> ::= <article><noun>
– <interrogative> ::= <auxiliaryverb><subject><predicate>
Grammar
• Definition: A grammar <∑, N, P, S> consists of four parts:
– A finite set ∑ of terminal symbols or tokens.
– A finite set N of non-terminal symbols or syntactic categories.
– A finite set P of productions or rules that describe how each non-terminal is defined in terms of terminal symbols and non-terminals.
– A distinguished non-terminal S, the start symbol, that specifies the category being defined.
Grammar
• Terminals
– the, boy, girl, ran, ate, cake, a, an
• Non-Terminals
– <sentence>, <subject>, <predicate>, <verb>,
<article>, <noun>
• Start Symbol
– One of the non-terminals.
Grammar
• Rules or Productions
(A finite set of replacement rules)
– <sentence> ::= <subject><predicate>
– <subject> ::= <article><noun>
– <predicate> ::= <verb><article><noun>
– <verb> ::= ran | ate
– <article> ::= the | a | an
– <noun> ::= boy | girl | cake
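To make the four parts <∑, N, P, S> concrete, the sketch below writes this sample grammar as Python data and enumerates every sentence it generates. The encoding (a dict from each non-terminal to its alternative right-hand sides) and the names `PRODUCTIONS`, `sentences` and `LANGUAGE` are our own illustrative choices, not part of the notes. Because no rule is recursive, the language is finite and can be listed in full.

```python
from itertools import product

# P: each non-terminal maps to its list of alternative right-hand sides.
PRODUCTIONS = {
    "<sentence>": [["<subject>", "<predicate>"]],
    "<subject>": [["<article>", "<noun>"]],
    "<predicate>": [["<verb>", "<article>", "<noun>"]],
    "<verb>": [["ran"], ["ate"]],
    "<article>": [["the"], ["a"], ["an"]],
    "<noun>": [["boy"], ["girl"], ["cake"]],
}
START = "<sentence>"                # S
NONTERMINALS = set(PRODUCTIONS)     # N
TERMINALS = {                       # Σ: every right-hand-side symbol not in N
    sym
    for alternatives in PRODUCTIONS.values()
    for rhs in alternatives
    for sym in rhs
    if sym not in NONTERMINALS
}


def sentences(symbol):
    """Yield every terminal string (as a list of words) derivable from symbol.

    This terminates only because no rule is recursive, so the language is finite.
    """
    if symbol not in NONTERMINALS:
        yield [symbol]
        return
    for rhs in PRODUCTIONS[symbol]:
        # expand each right-hand-side symbol independently, then combine them
        for parts in product(*(list(sentences(sym)) for sym in rhs)):
            yield [word for part in parts for word in part]


LANGUAGE = [" ".join(words) for words in sentences(START)]
print(len(LANGUAGE))                      # 162
print("a cake ate the boy" in LANGUAGE)   # True
```

The count is 9 subjects (3 articles × 3 nouns) times 18 predicates (2 verbs × 3 articles × 3 nouns). Note that odd sentences such as "a cake ate the boy" are in the language too.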
Backus-Naur Form (BNF)
• A notation used to express grammar rules/productions.
• Originally developed for the syntactic definition of Algol-60.
• The BNF grammar is a set of rules or productions of the form:
left-side ::= right-side
– left-side: a non-terminal (syntactic category).
– “::=”: read as “produces”.
– right-side: a string of terminals and non-terminals.
Backus-Naur Form (BNF)
• A terminal represents the atomic symbols in the language.
• A non-terminal represents other symbols as
defined to the right of the symbol “::=”.
• Other symbols
– “|” is interpreted as alternative.
– “{}” denotes possible repetition of the enclosed
symbols 0 or more times.
Backus-Naur Form (BNF)
• E.g.: A ::= B | {C} | 9
– “A produces B.”
– “A produces a string of 0 or more C’s.”
– “A produces 9.”
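The “{}” repetition (usually classed as part of Extended BNF, or EBNF) behaves like the star operator of regular expressions. The sketch below assumes, purely for illustration, that B, C and 9 are single-character terminals, so the rule A ::= B | {C} | 9 collapses to the regular expression `B|C*|9`. The helper name `produces_a` is hypothetical.

```python
import re

# Hypothetical reading of A ::= B | {C} | 9 in which B, C and 9 are treated
# as single-character terminals. The braces {C} ("zero or more C's") become C*.
A_RULE = re.compile(r"B|C*|9")


def produces_a(s):
    """True if A can produce exactly the string s under the reading above."""
    return A_RULE.fullmatch(s) is not None


for s in ["B", "", "CCC", "9", "BC", "99"]:
    print(repr(s), produces_a(s))
```

The empty string is accepted because zero repetitions of C are allowed; "BC" is rejected because each alternative must cover the whole string on its own.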

DID YOU GET MY METHOD? DON’T BE SCARED. BWAHAHA.
Given a grammar, how do we determine
if a particular string or statement is a
member of the language?
In other words, how do we know if it has
valid syntax?
Derivations
• Use derivations to determine whether a particular string of terminals is a member of the language defined by the grammar (in a PL, whether it has valid syntax).
• A derivation is a sequence of sentential forms starting from the start symbol.
– Either leftmost or rightmost derivation.
• At each step, replace a non-terminal with the right-hand side of one of its rules.
• Symbol used: “=>”
Derivations
• Derivation for the sentence:
“the boy ate the cake”
<sentence> => <subject><predicate>
=> <article> <noun> <predicate>
=> the <noun> <predicate>
=> the boy <predicate>
=> the boy <verb> <article> <noun>
=> the boy ate <article> <noun>
=> the boy ate the <noun>
=> the boy ate the cake
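Each step above rewrites the leftmost non-terminal, so this is a leftmost derivation. The sketch below checks that mechanically: it encodes the grammar as a dict (our own encoding) and verifies that every `=>` step replaces the leftmost non-terminal with one of its right-hand sides. The function names are hypothetical, and sentential forms are written with spaces between symbols so they can be split.

```python
# The sample grammar as data (this encoding is ours, not from the notes).
GRAMMAR = {
    "<sentence>": [["<subject>", "<predicate>"]],
    "<subject>": [["<article>", "<noun>"]],
    "<predicate>": [["<verb>", "<article>", "<noun>"]],
    "<verb>": [["ran"], ["ate"]],
    "<article>": [["the"], ["a"], ["an"]],
    "<noun>": [["boy"], ["girl"], ["cake"]],
}


def is_leftmost_step(before, after):
    """True if `after` results from rewriting the leftmost non-terminal of `before`."""
    for i, sym in enumerate(before):
        if sym in GRAMMAR:  # first (leftmost) non-terminal found
            prefix, suffix = before[:i], before[i + 1:]
            return any(after == prefix + rhs + suffix for rhs in GRAMMAR[sym])
    return False  # `before` has no non-terminal left to rewrite


def is_leftmost_derivation(forms, start="<sentence>"):
    """Check a derivation given as whitespace-separated sentential forms."""
    steps = [form.split() for form in forms]
    return (
        steps[0] == [start]
        and all(is_leftmost_step(a, b) for a, b in zip(steps, steps[1:]))
        and all(sym not in GRAMMAR for sym in steps[-1])  # ends in terminals only
    )


SLIDE_DERIVATION = [
    "<sentence>",
    "<subject> <predicate>",
    "<article> <noun> <predicate>",
    "the <noun> <predicate>",
    "the boy <predicate>",
    "the boy <verb> <article> <noun>",
    "the boy ate <article> <noun>",
    "the boy ate the <noun>",
    "the boy ate the cake",
]

print(is_leftmost_derivation(SLIDE_DERIVATION))  # True
```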
Derivations
• From <sentence>, the statement “a cake ate the boy” can also be derived. What does this imply?
– Syntax does not imply correct semantics.
Derivations
• Show that 010 is a member of the language generated by the following grammar:
– <B> ::= 0<B> | 1<B> | 0 | 1
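For this grammar every symbol except the last is produced by 0<B> or 1<B>, and the last by 0 or 1, so the language is exactly the non-empty strings over {0, 1}. The sketch below (the function name `derive_b` is hypothetical) uses that observation to build the derivation directly.

```python
def derive_b(s):
    """Return the derivation of s from <B> ::= 0<B> | 1<B> | 0 | 1 as a list
    of sentential forms, or None if s is not in the language."""
    if not s or any(ch not in "01" for ch in s):
        return None  # the language is exactly the non-empty strings over {0, 1}
    forms = ["<B>"]
    for i in range(1, len(s)):
        # every symbol except the last comes from <B> ::= 0<B> or <B> ::= 1<B>
        forms.append(s[:i] + "<B>")
    forms.append(s)  # the last symbol comes from <B> ::= 0 or <B> ::= 1
    return forms


print(" => ".join(derive_b("010")))  # <B> => 0<B> => 01<B> => 010
```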
Derivation / Parse Tree
• Graphically shows how the start symbol of a grammar derives a string in the language.
• Using a parse tree, show that 010 is a member of the language generated by the grammar:
– <B> ::= 0<B> | 1<B> | 0 | 1
Properties of a Parse Tree
• The root is labeled by the start symbol.
• Each leaf is labeled by a token or terminal.
• Each interior node is labeled by a non-terminal.
• If A is the non-terminal labeling some interior node and x1, x2, …, xn are the children of that node from left to right, then A -> x1 x2 … xn is a production.
Derive valid productions or rules from the parse tree below.

<B>
├─ 0
└─ <B>
   ├─ 1
   └─ <B>
      └─ 0
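By the last property of a parse tree, each interior node together with its children spells out one production. The sketch below encodes the parse tree for 010 as nested tuples (an encoding we chose for illustration: a node is `(label, children)`, a leaf is a plain string) and reads the productions and the derived string off it.

```python
# Parse tree for 010 under <B> ::= 0<B> | 1<B> | 0 | 1.
# A node is (label, children); a leaf (terminal) is a plain string.
TREE = ("<B>", ["0", ("<B>", ["1", ("<B>", ["0"])])])


def frontier(node):
    """Concatenate the leaves from left to right: the string the tree derives."""
    if isinstance(node, str):
        return node
    _label, children = node
    return "".join(frontier(child) for child in children)


def productions(node):
    """List the production used at every interior node, top-down."""
    if isinstance(node, str):
        return []
    label, children = node
    rhs = " ".join(c if isinstance(c, str) else c[0] for c in children)
    rules = [f"{label} -> {rhs}"]
    for child in children:
        rules.extend(productions(child))
    return rules


print(frontier(TREE))     # 010
print(productions(TREE))  # ['<B> -> 0 <B>', '<B> -> 1 <B>', '<B> -> 0']
```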
Example
• Consider this sample PL grammar:
– <expression> ::= <term> | <expression><addoperator><term>
– <term> ::= <factor> | <term><multoperator><factor>
– <factor> ::= <identifier> | <literal> | (<expression>)
– <identifier> ::= a | b | c | … | z
– <literal> ::= 0 | 1 | 2 | … | 9
– <addoperator> ::= + | - | or
– <multoperator> ::= * | / | div | mod | and

• Generate a parse tree for the string a + b * c.
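One way to see which tree this grammar assigns to a + b * c is a recursive-descent parser with one function per non-terminal. The sketch below is our own illustration, not from the notes: the left-recursive rules are implemented as loops (a standard rewrite), and the result is a compact tree of nested tuples `(operator, left, right)` whose shape mirrors the parse tree, with the intermediate <term>/<factor> nodes omitted.

```python
import re

ADD_OPS = {"+", "-", "or"}                  # <addoperator>
MULT_OPS = {"*", "/", "div", "mod", "and"}  # <multoperator>


def tokenize(text):
    # lowercase words (identifiers or keyword operators), single digits, symbols;
    # characters matching none of these (e.g. spaces) are skipped
    return re.findall(r"[a-z]+|[0-9]|[-+*/()]", text)


class Parser:
    """Recursive-descent parser for the sample expression grammar."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def expression(self):
        # <expression> ::= <term> | <expression><addoperator><term>
        # The left recursion becomes a loop; growing the tree to the left
        # keeps the operators left-associative.
        node = self.term()
        while self.peek() in ADD_OPS:
            op = self.take()
            node = (op, node, self.term())
        return node

    def term(self):
        # <term> ::= <factor> | <term><multoperator><factor>
        node = self.factor()
        while self.peek() in MULT_OPS:
            op = self.take()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # <factor> ::= <identifier> | <literal> | (<expression>)
        tok = self.take()
        if tok == "(":
            node = self.expression()
            if self.take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok is not None and (tok.isdigit() or (len(tok) == 1 and tok.isalpha())):
            return tok  # single-letter <identifier> or single-digit <literal>
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(text):
    parser = Parser(text)
    tree = parser.expression()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree


print(parse("a + b * c"))  # ('+', 'a', ('*', 'b', 'c'))
```

Because * is handled one level deeper (in <term>), b * c ends up as a subtree of the + node, which is how this grammar encodes precedence.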


Leftmost & Rightmost Derivations
• Leftmost Derivation
– In each step, the leftmost non-terminal is replaced.
• Rightmost Derivation
– In each step, the rightmost non-terminal is replaced.
• Given this grammar:
– S ::= aAS | a, A ::= SbA | SS | ba
– Derive the string aabbaa.
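A leftmost derivation always expands the leftmost non-terminal and a rightmost one the rightmost; otherwise the procedure is the same. The sketch below searches for either kind by breadth-first search over sentential forms. It assumes uppercase letters are non-terminals and lowercase letters terminals, and it relies on this grammar having no empty right-hand side, so forms never shrink and anything longer than the target can be pruned. The function name `derive` is hypothetical.

```python
from collections import deque

# S ::= aAS | a,  A ::= SbA | SS | ba   (uppercase letters are non-terminals)
GRAMMAR = {"S": ["aAS", "a"], "A": ["SbA", "SS", "ba"]}


def derive(target, start="S", leftmost=True):
    """Breadth-first search for a leftmost (or rightmost) derivation of target.

    Returns the list of sentential forms, or None if target is not derivable.
    No rule has an empty right-hand side, so forms never shrink: any form
    longer than target is discarded, which also guarantees termination.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        positions = [i for i, ch in enumerate(form) if ch in GRAMMAR]
        if not positions:  # only terminals left
            if form == target:
                return path
            continue
        i = positions[0] if leftmost else positions[-1]
        for rhs in GRAMMAR[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= len(target) and new not in seen:
                seen.add(new)
                queue.append(path + [new])
    return None


print(" => ".join(derive("aabbaa")))
# S => aAS => aSbAS => aabAS => aabbaS => aabbaa
print(" => ".join(derive("aabbaa", leftmost=False)))
# S => aAS => aAa => aSbAa => aSbbaa => aabbaa
```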
What’s the implication if a certain
statement can be generated by 2
or more distinct LM/RM
derivations?
Different Derivations, Same Parse Tree
• Derivations may not be unique. Example:
– Grammar: S ::= SS | (S) | ( )
– Sentence: ( ( ) ) ( )
– Derivations:
• S => SS => (S)S => ( ( ) )S => ( ( ) )( )
• S => SS => S( ) => (S)( ) => ( ( ) )( )
• Different derivations but the same parse tree. That is fine.
What’s the implication if a certain
statement can be generated by 2
or more distinct parse trees?
Ambiguity
• A grammar that generates a sentence for which there are 2 or more distinct parse trees is said to be
ambiguous.
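Ambiguity is about parse trees, not derivations, so one way to test a sentence is to count its parse trees. The sketch below does this with memoised interval counting for the grammar S ::= SS | (S) | ( ) from the earlier slide; the function name `count_trees` is hypothetical. It confirms that ( ( ) ) ( ) has a single parse tree, but it also finds that ( ) ( ) ( ) has two, because S ::= SS can group the three pairs either way.

```python
from functools import lru_cache


def count_trees(sentence):
    """Number of distinct parse trees for sentence under S ::= SS | (S) | ( )."""
    s = sentence.replace(" ", "")

    @lru_cache(maxsize=None)
    def count(i, j):  # parse trees of S deriving s[i:j]
        total = 0
        if s[i:j] == "()":
            total += 1                            # S ::= ( )
        if j - i > 2 and s[i] == "(" and s[j - 1] == ")":
            total += count(i + 1, j - 1)          # S ::= (S)
        for k in range(i + 1, j):
            total += count(i, k) * count(k, j)    # S ::= SS, split at position k
        return total

    return count(0, len(s))


print(count_trees("( ( ) ) ( )"))  # 1
print(count_trees("( ) ( ) ( )"))  # 2, so this grammar is ambiguous
```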
Ambiguity
Given this grammar:
<assign> ::= <id> = <expr>
<id> ::= A | B | C
<expr> ::= <expr> + <expr> | <expr> * <expr> | (<expr>) | <id>

Show the derivation for the sentence: A = B + C * A
Ambiguity

Derivation for the sentence A = B + C * A
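The two parse trees of A = B + C * A can be enumerated mechanically. The sketch below lists every parse tree of <expr> for a token sequence under the ambiguous grammar, rendering each binary node in square brackets so that the grouping is visible. The function names are hypothetical, and it assumes tokens are separated by spaces.

```python
from functools import lru_cache

IDS = {"A", "B", "C"}


def expr_trees(tokens):
    """All parse trees of <expr> for a token list, under the ambiguous rule
    <expr> ::= <expr> + <expr> | <expr> * <expr> | (<expr>) | <id>.
    Each tree is rendered with square brackets around every binary node."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):  # all trees for toks[i:j]
        out = []
        if j - i == 1 and toks[i] in IDS:
            out.append(toks[i])                                      # <id>
        if j - i >= 3 and toks[i] == "(" and toks[j - 1] == ")":
            out.extend("(" + t + ")" for t in trees(i + 1, j - 1))  # (<expr>)
        for k in range(i + 1, j - 1):                                # operator at k
            if toks[k] in ("+", "*"):
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        out.append(f"[{left} {toks[k]} {right}]")
        return tuple(out)

    return list(trees(0, len(toks)))


def assign_trees(sentence):
    # <assign> ::= <id> = <expr>
    toks = sentence.split()
    if len(toks) < 3 or toks[0] not in IDS or toks[1] != "=":
        return []
    return [f"{toks[0]} = {t}" for t in expr_trees(toks[2:])]


for tree in assign_trees("A = B + C * A"):
    print(tree)
# A = [B + [C * A]]
# A = [[B + C] * A]
```

Only the first tree gives * its usual higher precedence; a compiler generating code from the second would compute (B + C) * A.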


The Problem of Ambiguity
• Syntactic ambiguity of language structures is a problem since compilers often base the
semantics of those structures on their syntactic form.
• The compiler decides what code to generate for a statement by examining its parse tree.
• The meaning of the structure cannot be determined uniquely if there is more than one parse tree.
• How do we solve this then?
Operator Precedence
Operator Precedence
• A grammar can describe a certain syntactic structure so that part of the
structure’s meaning can follow its parse tree.
• A grammar can be rewritten to separate addition and multiplication operators.
• Rewriting would require additional non-terminals and some new rules.
Recall...

Derivation for the sentence A = B + C * A


Example
Given this grammar:
<assign> ::= <id> = <expr>
<id> ::= A | B | C
<expr> ::= <expr> + <expr> | <expr> * <expr> | (<expr>) | <id>

Modify the grammar such that ambiguity is resolved.


Operator Associativity
Operator Associativity
• Another interesting question is whether operator associativity is also correctly described.
• Do expressions with 2 or more adjacent occurrences of operators with equal precedence have those occurrences in the proper hierarchical order?
Left Associativity
<assign> ::= <id> = <expr>
<id> ::= A | B | C
<expr> ::= <expr> + <term> | <term>
<term> ::= <term> * <factor> | <factor>
<factor> ::= (<expr>) | <id>

Left Recursive
• A BNF rule is left recursive if its left-hand side (LHS) appears at the beginning of its right-hand side (RHS).
Right Associativity
<assign> ::= <id> = <expr>
<id> ::= A | B | C
<expr> ::= <term> + <expr> | <term>
<term> ::= <factor> * <term> | <factor>
<factor> ::= (<expr>) | <id>

Right Recursive
• A BNF rule is right recursive if its left-hand side (LHS) appears at the end of its right-hand side (RHS).
Operator Associativity
• In addition and multiplication, it does not matter which associativity is used. Why?
• But in other operations, like exponentiation, the kind of associativity should be defined.
• In most PLs, the exponentiation operator is right associative.
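A left-recursive rule groups a chain of equal-precedence operators as ((x1 op x2) op x3), a right-recursive rule as x1 op (x2 op x3). The sketch below folds a list of operands both ways with Python's `functools.reduce` to show when the choice matters; the helper names `fold_left` and `fold_right` are our own.

```python
import operator
from functools import reduce


def fold_left(op, operands):
    # ((x1 op x2) op x3) ...: the grouping a left-recursive rule produces
    return reduce(op, operands)


def fold_right(op, operands):
    # x1 op (x2 op (x3 ...)): the grouping a right-recursive rule produces
    return reduce(lambda acc, x: op(x, acc), reversed(operands))


for name, op, xs in [("+", operator.add, [1, 2, 3]),
                     ("-", operator.sub, [8, 4, 2]),
                     ("**", operator.pow, [2, 3, 2])]:
    print(name, fold_left(op, xs), fold_right(op, xs))
# + 6 6
# - 2 6
# ** 64 512
```

For + both groupings give the same value, while for - and ** they differ, which is why a grammar has to pin down associativity for such operators.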
Syntax Diagrams
Syntax Diagrams
• A graphical way used to represent BNF rules.
• For each grammar rule, an equivalent syntax diagram can be drawn.
• This was popularized in the design of Pascal.
• Symbols
– Rectangle nodes for non-terminals.
– Circles for terminals.
Syntax Diagrams

Sample syntax diagram for <expression> ::= <term> | <expression><addoperator><term>
Syntax Diagrams
Sample syntax diagrams for:
<factor> ::= <identifier> | <literal> | (<expression>)
<addop> ::= + | - | or
<term> ::= <factor> | <term><multoperator><factor>
if-then-else Grammar
if-then-else Grammar
• Sample grammar for one particular form of if-then-else statement:
<stmt> ::= <if_stmt>
<if_stmt> ::= if <logic_expr> then <stmt> |
if <logic_expr> then <stmt> else <stmt>

• This grammar is ambiguous. The ambiguity can be illustrated by this simple statement:
if <logic_expr> then if <logic_expr> then <stmt> else <stmt>
if-then-else Grammar
• The general rule
– “Match each else with the closest previous unmatched then.”
if-then-else Grammar
• Take note of the following:
– Between a then and its matching else, the statement must be matched (i.e. it cannot be an if statement without an else).
– An unmatched statement is an else-less if.
– A matched statement is either an if-then-else statement containing no unmatched
statements or any other non-conditional statement.
if-then-else Grammar
• The problem? The given grammar treats all statements as if they were all matched.
if-then-else Grammar
• To reflect the different categories, the grammar has to be rewritten:
<stmt> ::= <matched> | <unmatched>
<matched> ::= if <logic_expr> then <matched> else <matched> | any non-if statement
<unmatched> ::= if <logic_expr> then <stmt> |
if <logic_expr> then <matched> else <unmatched>
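To check that the rewrite removes the ambiguity, the sketch below enumerates all parse trees of the statement if c then if c then s else s under both grammars. It is an illustration under assumptions of our own: "c" stands for any <logic_expr>, "s" for any non-if statement, and a tree is written ("if", then_part) or ("if", then_part, else_part) with the condition omitted. The function name `parse_all` is hypothetical.

```python
from functools import lru_cache


def parse_all(tokens, unambiguous):
    """Enumerate parse trees of <stmt> under the original (ambiguous) or the
    rewritten matched/unmatched grammar. Assumes "c" is the only condition
    and "s" the only non-if statement."""
    toks = tuple(tokens)

    @lru_cache(maxsize=None)
    def stmt(i, j):
        if unambiguous:
            return matched(i, j) + unmatched(i, j)   # <stmt> ::= <matched> | <unmatched>
        return other(i, j) + if_then(i, j, stmt) + if_then_else(i, j, stmt, stmt)

    @lru_cache(maxsize=None)
    def matched(i, j):
        # <matched> ::= if c then <matched> else <matched> | non-if statement
        return other(i, j) + if_then_else(i, j, matched, matched)

    @lru_cache(maxsize=None)
    def unmatched(i, j):
        # <unmatched> ::= if c then <stmt> | if c then <matched> else <unmatched>
        return if_then(i, j, stmt) + if_then_else(i, j, matched, unmatched)

    def other(i, j):
        return ("s",) if toks[i:j] == ("s",) else ()

    def if_then(i, j, body):
        if j - i < 4 or toks[i:i + 3] != ("if", "c", "then"):
            return ()
        return tuple(("if", b) for b in body(i + 3, j))

    def if_then_else(i, j, then_part, else_part):
        if j - i < 4 or toks[i:i + 3] != ("if", "c", "then"):
            return ()
        out = []
        for k in range(i + 4, j - 1):  # position of the else keyword
            if toks[k] == "else":
                for t in then_part(i + 3, k):
                    for e in else_part(k + 1, j):
                        out.append(("if", t, e))
        return tuple(out)

    return list(stmt(0, len(toks)))


tokens = "if c then if c then s else s".split()
print(parse_all(tokens, unambiguous=False))
# [('if', ('if', 's', 's')), ('if', ('if', 's'), 's')]
print(parse_all(tokens, unambiguous=True))
# [('if', ('if', 's', 's'))]
```

The original grammar yields two trees; the rewritten one yields only the tree in which the else belongs to the closest previous unmatched then.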
