Lecture 6 – Context-Free Grammars

Nguyen Phuong Thai
Outline

• Context free grammars (CFGs)

• Building a simple Vietnamese CFG
• Ambiguity
• Chomsky normal form
• Syntactic relations/dependencies
• Special sentence structures
• Passive sentences
Why are CFGs important?

• Representing NL syntax
• Automatic parsing
• A finite representation for analyzing the syntax of a huge number of sentences
• Basis for other formalisms such as HPSGs, LFGs, etc.
Formal Definition of CFG

• A grammar G is a quadruple (T, N, S, R), where

• T : a finite set of terminal symbols or tokens
• N: a finite set of nonterminal symbols (T ∩ N = ∅)
• S: a unique start symbol (S ∈ N)
• R: a finite set of rules or productions of the form (A, α), usually written A -> α
• where:
• A is a nonterminal, and
• α is a string of zero or more terminals and nonterminals
• Note: zero means that α = ε is possible
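The quadruple maps directly onto a small data structure. Below is a minimal Python sketch, assuming rules are stored as (A, α) pairs with α a tuple of symbols and the empty tuple standing for ε; the class name `CFG` and the method `validate` are our own illustrative choices, not part of the lecture.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """G = (T, N, S, R); each rule in R is a pair (A, alpha)."""
    terminals: frozenset     # T
    nonterminals: frozenset  # N
    start: str               # S
    rules: tuple             # R: (A, alpha) pairs, alpha a tuple; () is ε

    def validate(self) -> None:
        """Raise ValueError if a condition of the definition is violated."""
        if self.terminals & self.nonterminals:
            raise ValueError("T and N must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be in N")
        symbols = self.terminals | self.nonterminals
        for lhs, alpha in self.rules:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a nonterminal")
            if any(x not in symbols for x in alpha):
                raise ValueError(f"unknown symbol in {alpha!r}")


# Example: S -> a S b | ε, which generates a^n b^n (n >= 0).
g = CFG(
    terminals=frozenset({"a", "b"}),
    nonterminals=frozenset({"S"}),
    start="S",
    rules=(("S", ("a", "S", "b")), ("S", ())),
)
g.validate()  # passes: no exception is raised
```

Storing ε as the empty tuple keeps "zero or more symbols" uniform: a rule's right-hand side is always a tuple, and its length is the number of symbols.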
Backus-Naur Form (BNF)
BNF (cont)

• Nonterminals: capital letters like A, B, and S

• The start symbol: S
• Strings drawn from (T∪N)*: lowercase Greek letters like α, β, and γ
• Strings of terminals: lowercase Roman letters like u, v, and w
Example

• Write a CFG for generating

A grammar derives sentences by

1. beginning with the start symbol, and
2. repeatedly replacing a nonterminal by the right-hand side of a
production with that nonterminal on the left-hand side, until
there are no more nonterminals to replace.
• Such a sequence of replacements is called a derivation of the sentence
being analysed
• The strings of terminals and nonterminals appearing in the various
derivation steps are called sentential forms
• A sentence is a sentential form with terminals only
• The language: the set of all sentences thus derived
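A derivation can be replayed mechanically. The sketch below assumes a hypothetical toy grammar (`RULES`, not from the lecture) and a caller-supplied list of rule choices; it always expands the leftmost nonterminal and records each sentential form. The final form is a sentence only if it contains no nonterminals.

```python
# Hypothetical toy grammar: each nonterminal maps to its list of right-hand sides.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["tôi"], ["cơm"]],   # "I", "rice"
    "VP": [["V", "NP"]],
    "V": [["ăn"]],              # "eat"
}


def leftmost_derivation(start, choices):
    """Replace the leftmost nonterminal once per entry of `choices`
    (an index into that nonterminal's alternatives) and return every
    sentential form produced along the way."""
    form = [start]
    forms = [form]
    for choice in choices:
        # Index of the leftmost nonterminal; raises StopIteration if none is left.
        i = next(k for k, sym in enumerate(form) if sym in RULES)
        form = form[:i] + RULES[form[i]][choice] + form[i + 1:]
        forms.append(form)
    return forms


forms = leftmost_derivation("S", [0, 0, 0, 0, 1])
for f in forms:
    print(" ".join(f))
# The last form contains only terminals, so it is a sentence.
is_sentence = all(sym not in RULES for sym in forms[-1])
```

The printed sentential forms are: S, NP VP, tôi VP, tôi V NP, tôi ăn NP, tôi ăn cơm ("I eat rice").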
The Structure of Parse Trees

• The start symbol is always at the root of the tree.

• Nonterminals are always interior nodes.
• Terminals are always leaves in the tree.
• The sentence being analysed is the sequence of leaves read from left to right.
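These properties are easy to check on a concrete tree. A minimal Python sketch, assuming a hypothetical `Node` class and the toy sentence “tôi ăn cơm” (“I eat rice”); `leaves` reads the sentence back from the leaves, left to right, and `well_formed` tests the three structural properties listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)


def leaves(node):
    """Return the leaf labels read from left to right."""
    if not node.children:
        return [node.label]
    return [word for child in node.children for word in leaves(child)]


def well_formed(root, start, nonterminals):
    """Root is the start symbol, interior nodes are nonterminals,
    and leaves are terminals."""
    def check(n):
        if n.children:
            return n.label in nonterminals and all(check(c) for c in n.children)
        return n.label not in nonterminals
    return root.label == start and check(root)


tree = Node("S", [
    Node("NP", [Node("tôi")]),
    Node("VP", [Node("V", [Node("ăn")]), Node("NP", [Node("cơm")])]),
])
print(leaves(tree))                                    # ['tôi', 'ăn', 'cơm']
print(well_formed(tree, "S", {"S", "NP", "VP", "V"}))  # True
```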
Constructing a Simple Vietnamese CFG

• Noun phrase
• Verb phrase
• Sentence
• Adverbial
Noun Phrase

• Basic structure of a Vietnamese noun phrase

<pre-modifier> <head noun> <post-modifier>
• Example: “một mái tóc đẹp” (“a head of beautiful hair”)
• Head noun: “mái” (classifier for hair)
• Premodifier: “một” (“one/a”)
• Postmodifier: “tóc” (“hair”), “đẹp” (“beautiful”)
Noun Phrase: Premodifier

• Structure:
<position -2> <position -1>
• Example: “tất cả những chiếc kẹo” (“all the candies”)
• <position -2>: “tất cả” (“all”)
• <position -1>: “những” (plural marker)
• Fillers of <position -2>: “tất cả”, “hết thảy” (“all”), etc.
• Fillers of <position -1>: numbers, determiners
Noun Phrase: Postmodifier

• Much more complex than premodifier

• Postmodifier: noun, adjective phrase, verb phrase, number,
pronoun (at the end), prepositional phrase, subordinate clause.
• Example 1: “cái máy tính của cơ quan” (“the computer of the office”)
• Example 2: “cái máy tính mà tôi mới mua hôm qua” (“the computer that I just bought yesterday”)
Question

• “quả bóng xanh” (“the green ball”)

• “bản đồ hàng lậu” (“the map of smuggled goods”)
Answer

• (NP (Nc quả) (N bóng) (A xanh))

• (NP (N bản đồ) (NP (N hàng) (A lậu)))
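The bracketed answers can be read back into trees programmatically. A minimal sketch, assuming the notation is exactly “(LABEL child …)” with words as bare tokens; `parse_bracketed` and `sentence` are our own helper names. Because a multi-syllable word such as “bản đồ” is split on spaces, the leaves are syllables, and joining them with spaces restores the phrase.

```python
import re


def parse_bracketed(text):
    """Parse '(LABEL child ...)' into nested (label, children) tuples;
    bare tokens inside a constituent become string leaves."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def parse():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(parse())
            else:
                children.append(tokens[pos])  # a word (syllable) leaf
                pos += 1
        pos += 1  # consume ')'
        return (label, children)

    return parse()


def sentence(tree):
    """Leaves read from left to right, joined with spaces."""
    _, children = tree
    return " ".join(c if isinstance(c, str) else sentence(c) for c in children)


t = parse_bracketed("(NP (N bản đồ) (NP (N hàng) (A lậu)))")
print(t)
print(sentence(t))  # bản đồ hàng lậu
```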
Verb Phrase

• Basic structure of a Vietnamese verb phrase

<pre-modifier> <verb> <post-modifier>
• Example: “đang ăn cơm” (“is eating rice”)
Verb Phrase: Premodifier

• Premodifier: adverb
• Adverb groups:
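• đã, sẽ, đang, từng, còn, chưa, sắp, etc. (time and aspect)
• không (chẳng, chả), có, chưa (negation and affirmation)
• cũng, vẫn, đều, lại, cứ, chỉ (“also”, “still”, “all”, “again”, “keep on”, “only”)
• rất, hơi, khí, quá (degree)
• thường, hay, năng, ít, hiếm, etc. (frequency)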
Verb Phrase: Postmodifier

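• Postmodifier: complement or adjunct
• Complement:
• Noun phrase: “đá bóng” (“play football”)
• Verb phrase: “cần viết thư” (“need to write a letter”)
• Prepositional phrase: “chuyển hàng xuống thuyền” (“move the goods down onto the boat”)
• Two noun phrases: “tặng bạn quyển sách” (“give a friend a book”)
• Noun phrase and prepositional phrase: “pha cà phê với sữa” (“make coffee with milk”)
• Clause: “nói rằng cô ấy đẹp” (“say that she is beautiful”)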
• …
Verb Phrase: Postmodifier

• Adjunct
• Adverbs: rồi, đã, đi, nào, …
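• Adjective phrases: “chạy rất nhanh” (“run very fast”)
• Temporal noun phrases: “đá bóng hôm qua” (“played football yesterday”)
• Prepositional phrases: “đá bóng ở sân Mỹ Đình” (“play football at Mỹ Đình stadium”)
• Subordinate clauses: “không đi đá bóng được vì trời mưa” (“could not go to play football because it rained”)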
Sentence

• Structure
<subject> <predicate>
• Subject
• Noun phrase
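• Verb phrase: “dậy đúng giờ thật khó” (“getting up on time is really hard”)
• Clause: “anh nói thế không đúng” (“your saying that is not right”)
• Predicate
• Verb phrase
• Adjective phrase: “nhà anh ấy xa” (“his house is far away”)
• Noun phrase: “em bé bảy tuổi” (“the baby is seven years old”)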
Ambiguous Grammars

• A grammar is ambiguous if it permits

• more than one parse tree for a sentence,
or in other words,
• more than one leftmost derivation or more than one rightmost
derivation for a sentence.
Coping With Ambiguous Grammars

• Method 1: Rewrite the grammar to make it unambiguous.

• Method 2: Use disambiguating rules to throw away undesirable
parse trees, leaving only one tree for each sentence.
Example

S -> NP VP | NP AP
P -> chúng tôi

• Analyze the following sentences

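• Chúng tôi chuyển hàng xuống thuyền. (“We move the goods down onto the boat.”)
• Ông già đi nhanh quá. (“The old man walks too fast.” or “He has aged too quickly.”)
• Căn-tin của trường mới mở cửa lại. (“The school's new canteen has reopened.” or “The school's canteen has just reopened.”)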
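One way to see the ambiguity of “Ông già đi nhanh quá” is to count its parse trees. The sketch below uses a CYK-style chart (a standard parsing algorithm, not introduced in this lecture) over a small hypothetical grammar that is already in Chomsky normal form; the categories and rules are our own assumptions for illustration, with NP -> ông covering the reading where “ông” alone is the subject. Each chart cell stores how many distinct subtrees of each category span that stretch of words.

```python
from collections import defaultdict

# Hypothetical toy grammar in CNF (not from the lecture).
LEXICON = {            # word -> categories X with a rule X -> word
    "ông": {"N", "NP"},
    "già": {"A"},
    "đi": {"V"},
    "nhanh": {"A"},
    "quá": {"R"},
}
BINARY = [             # rules X -> Y Z
    ("S", "NP", "VP"), ("S", "NP", "AP"),
    ("NP", "N", "A"),    # ông già: "old man"
    ("VP", "V", "AP"),   # đi nhanh quá: "walks too fast"
    ("VP", "A", "V"),    # già đi: "grow old"
    ("VP", "VP", "AP"),  # già đi + nhanh quá
    ("AP", "A", "R"),    # nhanh quá: "too fast"
]


def count_parses(words, start="S"):
    n = len(words)
    # chart[i][j][X] = number of distinct trees rooted in X spanning words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for cat in LEXICON.get(w, ()):
            chart[i][i + 1][cat] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):          # split point
                for x, y, z in BINARY:
                    chart[i][j][x] += chart[i][k][y] * chart[k][j][z]
    return chart[0][n][start]


print(count_parses("ông già đi nhanh quá".split()))  # 2
```

A count greater than 1 for the start symbol is exactly the definition of ambiguity given earlier: more than one parse tree for the same sentence.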
Chomsky Normal Form

• A useful form for dealing with context-free grammars is the Chomsky normal form (CNF).
• This is a particular way of writing a CFG that is useful for understanding CFGs and for proving things about them.
• It also makes every parse tree of the grammar a binary tree (apart from the single-child nodes A -> a directly above the leaves).
CNF (cont)

• So what is Chomsky normal form?

• A CFG is in Chomsky normal form when every rule is of the form A -> B C or A -> a
• where a is a terminal and A, B, and C are variables (nonterminals),
• and B and C are not the start variable.
• Additionally, we permit the rule S -> ε, where S is the start variable, so that languages containing the empty string can still be generated.
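The definition translates into a direct rule-by-rule check. A minimal sketch, assuming rules are (A, rhs) pairs with rhs a tuple of symbols and () for ε; the function name `is_cnf` is hypothetical.

```python
def is_cnf(rules, start, nonterminals):
    """True if every rule is A -> B C (B, C variables other than the start),
    A -> a (a terminal), or start -> ε."""
    for lhs, rhs in rules:
        if len(rhs) == 0:
            ok = lhs == start                       # only S -> ε is allowed
        elif len(rhs) == 1:
            ok = rhs[0] not in nonterminals         # A -> a
        elif len(rhs) == 2:
            ok = all(x in nonterminals and x != start for x in rhs)  # A -> B C
        else:
            ok = False                              # too long
        if not ok:
            return False
    return True


N = {"S", "A", "B"}
print(is_cnf([("S", ("A", "B")), ("A", ("a",)), ("B", ("b",)), ("S", ())], "S", N))  # True
print(is_cnf([("S", ("A", "S"))], "S", N))  # False: start variable on the right
print(is_cnf([("A", ("a", "B"))], "S", N))  # False: terminal in a binary rule
```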
CNF (cont)

• The first step is simple! We just add a new start variable S0 and
the rule S0 -> S where S is the original start variable.
• By doing this we guarantee that the start variable doesn't occur
on the right hand side of a rule.
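A minimal sketch of this first step, assuming rules are (A, rhs) pairs with rhs a tuple of symbols (() for ε); `add_new_start` is a hypothetical helper that also guards against the name S0 already being in use. The sample input is the grammar from this lecture's worked example (S -> ASB, A -> aAS | a | ε, B -> SbS | A | bb).

```python
def add_new_start(rules, start):
    """Prepend S0 -> S, choosing a fresh name for the new start variable."""
    used = {lhs for lhs, _ in rules} | {x for _, rhs in rules for x in rhs}
    new_start = start + "0"
    while new_start in used:  # S0 taken? try S00, S000, ...
        new_start += "0"
    return [(new_start, (start,))] + list(rules), new_start


rules = [
    ("S", ("A", "S", "B")),
    ("A", ("a", "A", "S")), ("A", ("a",)), ("A", ()),
    ("B", ("S", "b", "S")), ("B", ("A",)), ("B", ("b", "b")),
]
rules, start = add_new_start(rules, "S")
print(start, rules[0])  # S0 ('S0', ('S',))
```

Since the new name is chosen to be unused, it cannot appear on any right-hand side, which is the whole point of the step.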
CNF (cont)

• Next we remove the ε-rules. Suppose we are removing the rule A -> ε.
• Now we have to “fix” the rules that have an A on their right-hand side:
• for each occurrence of A on a right-hand side, add a rule (with the same left-hand side) that has that occurrence of A removed; if A occurs several times, do this for every combination of occurrences;
• if A is the only symbol on the right-hand side, as in R -> A, add the rule R -> ε, unless R -> ε was already removed earlier. Of course, this may create a new ε-rule.
• Repeat the above process until all ε-rules (other than one for the start variable) have been removed.
CNF (cont)

• For example, suppose our grammar contains the rule A -> ε and the rule B -> uAv, where u and v are not both the empty string.
• First we remove A -> ε. Then we add the rule B -> uv.
• Make sure that you don't delete the original rule B -> uAv.
• If, on the other hand, we had the rules A -> ε and B -> A, then we would remove A -> ε and add the rule B -> ε (keeping B -> A, which is handled later as a unit rule). Of course, we now have to eliminate this new rule via the same procedure.
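The ε-elimination procedure can be sketched in Python as follows, assuming rules are (A, rhs) pairs with rhs a tuple and () for ε; the function name is hypothetical. It removes one ε-rule at a time, adds a copy of each affected rule for every non-empty combination of deleted occurrences, never re-adds an ε-rule that was already removed, and (as a small simplification) drops trivial rules X -> X, which generate nothing new. Run on the worked example's grammar after S0 -> S has been added, it reproduces the result shown in the slides.

```python
from itertools import combinations


def remove_epsilon_rules(rules, start):
    """Eliminate every rule A -> ε with A != start, one at a time."""
    rules = list(dict.fromkeys(rules))  # drop duplicates, keep order
    removed = set()  # variables whose ε-rule has already been eliminated
    while True:
        target = next((lhs for lhs, rhs in rules if rhs == () and lhs != start), None)
        if target is None:
            return rules
        rules.remove((target, ()))
        removed.add(target)
        added = []
        for lhs, rhs in rules:
            spots = [i for i, x in enumerate(rhs) if x == target]
            # Delete each non-empty combination of the occurrences of target.
            for r in range(1, len(spots) + 1):
                for drop in combinations(spots, r):
                    new_rhs = tuple(x for i, x in enumerate(rhs) if i not in drop)
                    if new_rhs == () and lhs in removed:
                        continue  # do not bring back an ε-rule removed earlier
                    if new_rhs == (lhs,):
                        continue  # trivial X -> X: adds nothing
                    added.append((lhs, new_rhs))
        for rule in added:
            if rule not in rules:
                rules.append(rule)


rules = [
    ("S0", ("S",)), ("S", ("A", "S", "B")),
    ("A", ("a", "A", "S")), ("A", ("a",)), ("A", ()),
    ("B", ("S", "b", "S")), ("B", ("A",)), ("B", ("b", "b")),
]
for lhs, rhs in remove_epsilon_rules(rules, "S0"):
    print(lhs, "->", "".join(rhs) or "ε")
```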
CNF (cont)

• Next we need to remove the unit rules, i.e. rules A -> B where B is a variable. If we have the rule A -> B, we remove it, and whenever the rule B -> u appears, we add the rule A -> u (unless A -> u is a unit rule that was already removed earlier).
• Again we do this repeatedly until we eliminate all unit rules.
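A minimal Python sketch of unit-rule removal, assuming rules are (A, rhs) pairs with rhs a tuple of symbols and an explicit set of variables; the function name is hypothetical. Applied to the worked example's grammar after ε-removal, it eliminates S0 -> S before B -> A (the slides use the opposite order) but ends with the same set of rules.

```python
def remove_unit_rules(rules, nonterminals):
    """Eliminate every unit rule A -> B (B a variable), one at a time."""
    rules = list(dict.fromkeys(rules))
    removed = set()  # unit rules already eliminated

    def is_unit(rule):
        return len(rule[1]) == 1 and rule[1][0] in nonterminals

    while True:
        unit = next((r for r in rules if is_unit(r)), None)
        if unit is None:
            return rules
        rules.remove(unit)
        removed.add(unit)
        head, (body,) = unit
        for lhs, rhs in list(rules):
            if lhs != body:
                continue
            new_rule = (head, rhs)  # B -> u gives A -> u
            # Skip duplicates, unit rules eliminated earlier, and trivial A -> A.
            if new_rule in rules or new_rule in removed or rhs == (head,):
                continue
            rules.append(new_rule)


N = {"S0", "S", "A", "B"}
rules = [
    ("S0", ("S",)),
    ("S", ("A", "S", "B")), ("S", ("S", "B")), ("S", ("A", "S")),
    ("A", ("a", "A", "S")), ("A", ("a",)), ("A", ("a", "S")),
    ("B", ("S", "b", "S")), ("B", ("A",)), ("B", ("b", "b")),
]
result = remove_unit_rules(rules, N)
for lhs, rhs in result:
    print(lhs, "->", "".join(rhs))
```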
CNF (cont)

• At this point we have converted our CFG to one which
• has no ε-rules (except possibly one for the start variable), and
• in which every rule either rewrites a variable to a single terminal, or rewrites a variable to a string of two or more variables and terminals.
CNF (cont)

• To convert the remaining rules to proper form, we introduce extra variables. In particular, suppose A -> u1u2 … un where n > 2 (each ui a variable or a terminal). Then we convert this to the set of rules A -> u1A1, A1 -> u2A2, …, A(n-2) -> u(n-1)un.
• Now we need to take care of the rules with two symbols on the right-hand side.
• If both symbols are variables, then we are fine.
• But if either of them is a terminal, we add a new variable and a new rule to take care of it. For example, if we have A -> u1B where u1 is a terminal, then we replace this by A -> U1B and U1 -> u1.
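The two remaining steps can be sketched as one Python function, assuming rules are (A, rhs) pairs with rhs a tuple of symbols; the function name `finish_cnf` and the fresh-variable names U1, U2, … and V1, V2, … are illustrative choices (they happen to match the names used in this lecture's worked example). Applied to the worked example's grammar after unit-rule removal, it yields the same final CNF grammar.

```python
def finish_cnf(rules, nonterminals):
    """Split right-hand sides longer than two, then replace terminals in
    two-symbol right-hand sides by new variables (V -> terminal)."""
    nonterminals = set(nonterminals)
    counters = {}

    def fresh(prefix):
        # Naming scheme U1, U2, ... / V1, V2, ...; skips names already in use.
        while True:
            counters[prefix] = counters.get(prefix, 0) + 1
            name = f"{prefix}{counters[prefix]}"
            if name not in nonterminals:
                nonterminals.add(name)
                return name

    # A -> u1 u2 ... un (n > 2) becomes A -> u1 A1, A1 -> u2 A2, ..., A(n-2) -> u(n-1) un
    binary = []
    for lhs, rhs in rules:
        while len(rhs) > 2:
            new = fresh("U")
            binary.append((lhs, (rhs[0], new)))
            lhs, rhs = new, rhs[1:]
        binary.append((lhs, rhs))

    # In A -> X Y, replace each terminal t by one shared variable V with V -> t.
    var_for = {}
    result = []
    for lhs, rhs in binary:
        if len(rhs) == 2:
            new_rhs = []
            for x in rhs:
                if x not in nonterminals:  # x is a terminal
                    if x not in var_for:
                        var_for[x] = fresh("V")
                    x = var_for[x]
                new_rhs.append(x)
            rhs = tuple(new_rhs)
        result.append((lhs, rhs))
    return result + [(v, (t,)) for t, v in var_for.items()]


rules = [
    ("S0", ("A", "S", "B")), ("S0", ("S", "B")), ("S0", ("A", "S")),
    ("S", ("A", "S", "B")), ("S", ("S", "B")), ("S", ("A", "S")),
    ("A", ("a", "A", "S")), ("A", ("a",)), ("A", ("a", "S")),
    ("B", ("S", "b", "S")), ("B", ("b", "b")), ("B", ("a", "A", "S")),
    ("B", ("a",)), ("B", ("a", "S")),
]
result = finish_cnf(rules, {"S0", "S", "A", "B"})
for lhs, rhs in result:
    print(lhs, "->", " ".join(rhs))
```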
CNF (cont)

• Example: convert the following grammar to CNF:
S -> ASB
A -> aAS | a | ε
B -> SbS | A | bb
CNF (cont)

• First we add a new start symbol

S0 -> S
S -> ASB
A -> aAS | a | ε
B -> SbS | A | bb
CNF (cont)

• Next we need to eliminate the ε rules. Eliminating A -> ε yields

S0 -> S
S -> ASB | SB
A -> aAS | a | aS
B -> SbS | A | bb | ε
CNF (cont)

• Now we have a new ε-rule, B -> ε. Let's remove it (the trivial rule S -> S that results from S -> SB is simply dropped):

S0 -> S
S -> ASB | SB | AS
A -> aAS | a | aS
B -> SbS | A | bb
CNF (cont)

• Next we need to remove all unit rules. Let's begin by removing B -> A:
S0 -> S
S -> ASB | SB | AS
A -> aAS | a | aS
B -> SbS | bb | aAS | a | aS
CNF (cont)

• Next we eliminate the unit rule S0 -> S:

S0 -> ASB | SB | AS
S -> ASB | SB | AS
A -> aAS | a | aS
B -> SbS | bb | aAS | a | aS
CNF (cont)

• Now we need to take care of the rules with three or more symbols on the right-hand side. First replace S0 -> ASB by S0 -> AU1 and U1 -> SB:
S0 -> AU1 | SB | AS
S -> ASB | SB | AS
A -> aAS | a | aS
B -> SbS | bb | aAS | a | aS
U1 -> SB
CNF (cont)

• Next eliminate S -> ASB in a similar way (technically we could reuse U1, but let's not):
S0 -> AU1 | SB | AS
S -> AU2 | SB | AS
A -> aAS | a | aS
B -> SbS | bb | aAS | a | aS
U1 -> SB
U2 -> SB
CNF (cont)

• Onward and upward, now fix A -> aAS by introducing A -> aU3
and U3 -> AS:
S0 -> AU1 | SB | AS
S -> AU2 | SB | AS
A -> aU3 | a | aS
B -> SbS | bb | aAS | a | aS
U1 -> SB
U2 -> SB
U3 -> AS
CNF (cont)

• Next, fix the two long B rules, B -> SbS and B -> aAS:

S0 -> AU1 | SB | AS
S -> AU2 | SB | AS
A -> aU3 | a | aS
B -> SU4 | bb | aU5 | a | aS
U1 -> SB
U2 -> SB
U3 -> AS
U4 -> bS
U5 -> AS
CNF (cont)

• Finally, we need to handle the rules that have a terminal together with a variable, or two terminals, on the right-hand side. We introduce new variables for these, V1 -> a and V2 -> b:
S0 -> AU1 | SB | AS
S -> AU2 | SB | AS
A -> V1U3 | a | V1S
B -> SU4 | V2V2 | V1U5 | a | V1S
U1 -> SB
U2 -> SB
U3 -> AS
U4 -> V2S
U5 -> AS
V1 -> a
V2 -> b
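The final grammar can be checked against the CNF definition directly from the slide notation. A minimal sketch, assuming (our own convention, not the lecture's) that a variable is written as an uppercase letter plus optional digits and a terminal as a single lowercase letter, so that “V1U3” reads as V1 U3; `parse_rules` and `violations` are hypothetical helper names.

```python
import re

FINAL = """
S0 -> AU1 | SB | AS
S -> AU2 | SB | AS
A -> V1U3 | a | V1S
B -> SU4 | V2V2 | V1U5 | a | V1S
U1 -> SB
U2 -> SB
U3 -> AS
U4 -> V2S
U5 -> AS
V1 -> a
V2 -> b
"""

# Assumed notation: variable = uppercase letter + optional digits,
# terminal = one lowercase letter.
SYMBOL = re.compile(r"[A-Z][0-9]*|[a-z]")


def parse_rules(text):
    """Turn 'A -> X | Y' lines into (A, rhs-tuple) pairs."""
    rules = []
    for line in text.strip().splitlines():
        lhs, rhs = (part.strip() for part in line.split("->"))
        for alt in rhs.split("|"):
            rules.append((lhs, tuple(SYMBOL.findall(alt))))
    return rules


def violations(rules, start):
    """Return the rules that are not A -> B C, A -> a, or start -> ε."""
    bad = []
    for lhs, rhs in rules:
        binary = len(rhs) == 2 and all(x[0].isupper() and x != start for x in rhs)
        lexical = len(rhs) == 1 and rhs[0].islower()
        if not (binary or lexical or (lhs == start and rhs == ())):
            bad.append((lhs, rhs))
    return bad


rules = parse_rules(FINAL)
print(len(rules), violations(rules, "S0"))  # 21 []
```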
How to Evaluate Syntactic Trees

• Based on grammatical relations/dependencies

• Head word-modifier word dependencies
• Example
Special Sentences

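• Cháy nhà. (“There is a fire in the house!”, lit. “burn house”)
• Nhà cháy. (“The house is burning.”)
• Trong túi còn tiền. (“In the pocket there is still money.”)
• Tiền còn trong túi. (“The money is still in the pocket.”)
• Trên trời lấp lánh một ngôi sao. (“In the sky twinkles a star.”)
• Một ngôi sao lấp lánh trên trời. (“A star twinkles in the sky.”)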
