
Entity Relationship Diagram and Basic Database Modeling

The document provides an overview of entity relationship (ER) modeling and relational database management systems (RDBMS). It discusses what an RDBMS is, where they are used, how programming without an RDBMS can cause issues, and how a DBMS addresses these issues through its functionality including transactions, queries, storage management and more.

Uploaded by

danyalhamzah

Entity Relationship (ER) Modeling

Matilda Wilson
2

What Is a Relational Database Management System?
Database Management System = DBMS
Relational DBMS = RDBMS

• A collection of files that store the data

• A big C program written by someone else that
accesses and updates those files for you
3

Where are RDBMS used ?


• Backend for traditional “database” applications
• Backend for large Websites
• Backend for Web services
4

Example of a Traditional Database Application

Suppose we are building a system
to store the information about:
• students
• courses
• professors
• who takes what, who teaches what
5

Can we do it without a DBMS?

Sure we can! Start by storing the data in files:

students.txt courses.txt professors.txt

Now write C or Java programs to implement
specific tasks
6

Doing it without a DBMS...

• Enroll “Mary Johnson” in “CSE444”:

Write a C program to do the following:

Read ‘students.txt’
Read ‘courses.txt’
Find&update the record “Mary Johnson”
Find&update the record “CSE444”
Write “students.txt”
Write “courses.txt”
7

Problems without a DBMS...

• System crashes:

Read ‘students.txt’
Read ‘courses.txt’
Find&update the record “Mary Johnson”
Find&update the record “CSE444”
CRASH !
Write “students.txt”
Write “courses.txt”

– What is the problem ?

• Large data sets (say 50GB)
– What is the problem ?
• Simultaneous access by many users
– Need locks: we know them from OS, but now the data is on
disk; and is it any fun to re-implement them?
8

Enter a DBMS

“Two tier database system”

[Diagram: Data files <-> Database server (someone else’s C program) <-> connection (ODBC, JDBC) <-> Applications]
9

Functionality of a DBMS
The programmer sees SQL, which has two
components:
• Data Definition Language - DDL
• Data Manipulation Language - DML
– query language

Behind the scenes the DBMS has:


• Query optimizer
• Query engine
• Storage management
• Transaction Management (concurrency, recovery)
10

Functionality of a DBMS
Two things to remember:

• Client-server architecture
– Slow, cumbersome connection
– But good for the data
• It is just someone else’s C program
– In the beginning we may be impressed by its speed
– But later we discover that it can be frustratingly slow
– We can do any particular task faster outside the
DBMS
– But the DBMS is general and convenient
11

How the Programmer Sees the DBMS

• Start with DDL to create tables:

CREATE TABLE Students (
    Name CHAR(30),
    SSN CHAR(9) PRIMARY KEY NOT NULL,
    Category CHAR(20)
) . . .

• Continue with DML to populate tables:

INSERT INTO Students
VALUES('Charles', '123456789', 'undergraduate')
. . . .
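
The DDL and DML statements above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in for the DBMS (the slides do not name a specific product), and it uses an in-memory database so nothing is written to disk.

```python
import sqlite3

# In-memory database; table and column names follow the slide.
conn = sqlite3.connect(":memory:")

# DDL: define the table structure.
conn.execute(
    """
    CREATE TABLE Students (
        Name     CHAR(30),
        SSN      CHAR(9) PRIMARY KEY NOT NULL,
        Category CHAR(20)
    )
    """
)

# DML: populate the table. The "?" placeholders keep values out of the SQL text.
conn.execute(
    "INSERT INTO Students VALUES (?, ?, ?)",
    ("Charles", "123456789", "undergraduate"),
)
conn.commit()

students = conn.execute("SELECT Name, SSN, Category FROM Students").fetchall()
print(students)
```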
12

How the Programmer Sees the DBMS
• Tables:

Students:
SSN          Name     Category
123-45-6789  Charles  undergrad
234-56-7890  Dan      grad
…            …

Takes:
SSN          CID
123-45-6789  CSE444
123-45-6789  CSE444
234-56-7890  CSE142
…

Courses:
CID     Name               Quarter
CSE444  Databases          fall
CSE541  Operating systems  winter
• Still implemented as files, but behind the scenes
can be quite complex
“data independence” = separate logical view
from physical implementation
13

Transactions
• Enroll “Mary Johnson” in “CSE444”:
BEGIN TRANSACTION;
INSERT INTO Takes
SELECT Students.SSN, Courses.CID
FROM Students, Courses
WHERE Students.name = 'Mary Johnson' and
      Courses.name = 'CSE444'
-- More updates here....
IF everything-went-OK
THEN COMMIT;
ELSE ROLLBACK

If system crashes, the transaction is still either committed or aborted
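
A minimal sketch of the all-or-nothing behavior, assuming Python's sqlite3 module as the DBMS. The helper `enroll_all` and the sample rows are hypothetical, and the course is matched on its CID rather than its name, since the tables on the earlier slide store "CSE444" in the CID column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (SSN TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE Courses  (CID TEXT PRIMARY KEY, Name TEXT);
    CREATE TABLE Takes    (SSN TEXT NOT NULL, CID TEXT NOT NULL,
                           PRIMARY KEY (SSN, CID));
    INSERT INTO Students VALUES ('123456789', 'Mary Johnson');
    INSERT INTO Courses  VALUES ('CSE444', 'Databases');
    INSERT INTO Courses  VALUES ('CSE541', 'Operating systems');
    """
)


def enroll_all(conn, student_name, course_ids):
    """Enroll one student in several courses as a single transaction."""
    try:
        # "with conn" commits if the block succeeds and rolls back
        # every statement in it if an exception escapes.
        with conn:
            for cid in course_ids:
                conn.execute(
                    "INSERT INTO Takes "
                    "SELECT Students.SSN, Courses.CID FROM Students, Courses "
                    "WHERE Students.Name = ? AND Courses.CID = ?",
                    (student_name, cid),
                )
        return True
    except sqlite3.IntegrityError:
        return False


first_ok = enroll_all(conn, "Mary Johnson", ["CSE444"])
# The CSE541 insert succeeds, then the duplicate CSE444 violates the
# primary key, so the whole transaction (including CSE541) is undone.
second_ok = enroll_all(conn, "Mary Johnson", ["CSE541", "CSE444"])

rows = conn.execute("SELECT SSN, CID FROM Takes ORDER BY CID").fetchall()
print(first_ok, second_ok, rows)
```

The second call leaves no partial enrollment behind, which is the guarantee the file-based C program could not give.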


14

Transactions

• A transaction = sequence of statements that
either all succeed, or all fail
• Transactions have the ACID properties:
A = atomicity
C = consistency
I = isolation
D = durability
15

Queries
• Find all courses that “Mary” takes
SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name = 'Mary' and
      S.ssn = T.ssn and T.cid = C.cid
• What happens behind the scene ?
– Query processor figures out how to answer the
query efficiently.
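
The query can be run end to end with a small sketch, again assuming sqlite3 as the DBMS. The sample rows are illustrative only and are not the data from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE Courses  (cid TEXT PRIMARY KEY, name TEXT, quarter TEXT);
    CREATE TABLE Takes    (ssn TEXT, cid TEXT);

    INSERT INTO Students VALUES ('123-45-6789', 'Mary', 'undergrad'),
                                ('234-56-7890', 'Dan',  'grad');
    INSERT INTO Courses  VALUES ('CSE444', 'Databases', 'fall'),
                                ('CSE541', 'Operating systems', 'winter');
    INSERT INTO Takes    VALUES ('123-45-6789', 'CSE444'),
                                ('123-45-6789', 'CSE541'),
                                ('234-56-7890', 'CSE444');
    """
)

# Same three-table join as on the slide; ORDER BY only makes the output stable.
mary_courses = [
    row[0]
    for row in conn.execute(
        """
        SELECT C.name
        FROM Students S, Takes T, Courses C
        WHERE S.name = ? AND S.ssn = T.ssn AND T.cid = C.cid
        ORDER BY C.name
        """,
        ("Mary",),
    )
]
print(mary_courses)
```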
16

Queries, behind the scene

Declarative SQL query:

SELECT C.name
FROM Students S, Takes T, Courses C
WHERE S.name = 'Mary' and
      S.ssn = T.ssn and T.cid = C.cid

Imperative query execution plan:

[Figure: plan tree. Leaves: Students, Takes, Courses. Operator labels from bottom to top: name=“Mary”, sid=sid, cid=cid, sname.]

The optimizer chooses the best execution plan for a query
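
Most systems let the programmer inspect the plan the optimizer picked. As a sketch, SQLite exposes it through `EXPLAIN QUERY PLAN`; the exact wording of the output depends on the SQLite version, so no particular plan is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Courses  (cid TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE Takes    (ssn TEXT, cid TEXT);
    """
)

query = """
    SELECT C.name
    FROM Students S, Takes T, Courses C
    WHERE S.name = 'Mary' AND S.ssn = T.ssn AND T.cid = C.cid
"""

# Each returned row describes one step of the chosen plan;
# the last column holds the human-readable description.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
plan_steps = [row[-1] for row in plan]
for step in plan_steps:
    print(step)
```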


17

Database Systems
• The big commercial database vendors:
– Oracle
– IBM (with DB2) bought Informix recently
– Microsoft (SQL Server)
– Sybase
• Some free database systems (Unix) :
– Postgres
– MySQL
– Predator
Database Principles &
Fundamentals of Design,

DATA MODELS
OUTLINE
• What a database is, what it does, and why
database design is important
• How modern databases evolved from files and
file systems
• About flaws in file system data management
• What a DBMS is, what it does, and how it fits
into the database system
• About types of database systems and database
models
19
OUTLINE CONTINUE

• About data modeling and why data models are
important
• About the basic data-modeling building blocks
• What business rules are and how they influence
database design
• How the major data models evolved
• About emerging alternative data models and the
need they fulfill
• How data models can be classified by their level
of abstraction
20
Data versus Information
– Data constitute building blocks of information
– Data is processed to produce information
– Information reveals meaning of data
– Information is a valuable resource
– Information is basis for knowledge
– Good, timely, relevant information is critical to
decision making
– Good decision making is key to organizational
survival
21
Data-Information-Decision Cycle

Figure 16.1

22
Database Management
• A database is a shared, integrated computer
structure containing:
– Application (or end user) data
– Metadata (data about data, e.g., datatype,
length, required/not required, validation, …)
• Database Management System (DBMS)
– Manages Database structure
– Controls access to data
– Provides query language
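
A sketch of what metadata looks like in practice, assuming SQLite, where `PRAGMA table_info` returns one row per column with its name, declared type, not-null flag, default value, and primary-key position. Other systems expose the same information through their own catalog tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Students (
        Name     CHAR(30),
        SSN      CHAR(9) PRIMARY KEY NOT NULL,
        Category CHAR(20)
    )
    """
)

# The DBMS stores the description of the table alongside the data itself.
metadata = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(Students)"
    )
]
for column in metadata:
    print(column)
```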
23
Advantages of DBMS
• Makes data management more efficient and
effective
• Query language allows quick answers to ad hoc
(one time) queries
• Provides easier access to better-managed data
• Promotes integrated view of organization’s
operations
• Reduces the probability of inconsistent data (same
data stored in different places with possibility of
different values)

24
DBMS Manages Interaction

Figure 1.2

25
Database Design
• Importance of Good Design
– Poor design results in unwanted data redundancy
(unnecessary duplication of data)
– Poor design generates errors leading to decisions
based on incorrect data
• Practical Approach
– Focus on principles and concepts of database
design
– Importance of logical design

26
Historical Roots of Database
• First computer applications focused on
clerical tasks (e.g., preparing bills)
• Requests for information (e.g., how many bills
were not paid this month) quickly followed
• File systems developed to address needs
– Data organized according to expected use
– Data Processing (DP) specialists
computerized manual file systems
27
File Terminology
• Data
– Raw Facts
• Field
– Group of characters with specific meaning
• Record
– Logically connected fields that describe a
person, place, or thing
• File
– Collection of related records
28
Simple File System

Figure 1.5

29
File System Disadvantages
• File System Data Management
– Requires extensive programming in third-
generation language (3GL)
– Time consuming
– Makes ad hoc queries impossible
– Leads to islands of information

30
File System Critique (cont’d.)
• Data Dependence
– Change in file’s data characteristics requires
modification of data access programs
– Must tell program what to do and how
– Makes file systems cumbersome from
programming and data management views
• Structural Dependence
– Change in file structure requires modification
of related programs
31
File System Critique (cont’d.)
• Field Definitions and Naming Conventions
– Flexible record definition anticipates reporting
requirements
– Selection of proper field names important
– Attention to length of field names
– Use of unique record identifiers

32
File System Critique (cont’d.)
• Data Redundancy
– Different and conflicting versions of same data
– Results of uncontrolled data redundancy
• Data anomalies
– Modification
– Insertion
– Deletion
• Data inconsistency
– Lack of data integrity
33
Database Systems
• Database consists of logically related data
stored in a single repository
• Provides advantages over file system
management approach
– Eliminates inconsistency, data anomalies, data
dependency, and structural dependency
problems
– Stores data structures, relationships, and
access paths in addition to application data
34
Database vs. File Systems
Figure 1.6

35
Database System Environment

Figure 1.7

36
Database System Types

Databases can be differentiated by
different factors including:
• Single-user vs. Multiuser Database
– Desktop
– Workgroup
– Enterprise
• Centralized vs. Distributed
• Use
– Production or transactional
– Decision support or data warehouse
37
DBMS Functions
• Data dictionary management
• Data storage management
• Data transformation and presentation
• Security management
• Multiuser access control
• Backup and recovery management
• Data integrity management
• Database language and application
programming interfaces
• Database communication interfaces
38
Database Models
• Collection of logical constructs used to
represent data structure and relationships
within the database
– Conceptual models: logical nature of data
representation
– Implementation models: emphasis on how the
data are represented in the database

39
Implementation Database Models

– Hierarchical (first), example: IMS


– Network (next), example: IDMS
– Relational (next), examples: Oracle, DB2
– Object Oriented (latest)

40
Hierarchical Database Model
• Logically represented by an upside down
tree
– Each parent can have many children
– Each child has only one parent
Figure 1.8

41
Hierarchical Database Model
• Advantages
– Conceptual simplicity
– Database security and integrity
– Data independence
– Efficiency
• Disadvantages
– Complex implementation
– Difficult to manage and lack of standards
– Lacks structural independence
– Applications programming and use complexity
– Implementation limitations
42
Network Database Model
• Each record can have multiple parents
– Composed of sets
– Each set has owner record and member
record
– Member may have several owners

Figure 1.10
43
Network Database Model
• Advantages
– Conceptual simplicity
– Handles more relationship types
– Data access flexibility
– Promotes database integrity
– Data independence
– Conformance to standards
• Disadvantages
– System complexity
– Lack of structural independence
44
Relational Database Model

• Most widely used model today


• Perceived by user as a collection of tables
for data storage
• Tables are a series of row/column
intersections
• Tables related by sharing common entity
characteristic(s)

45
Relational Database Model (cont’d.)

Figure 1.11

46
Relational Database Model
• Advantages
– Structural independence
– Improved conceptual simplicity
– Easier database design, implementation,
management, and use
– Ad hoc query capability with SQL
– Powerful database management system

47
Relational Database Model
• Disadvantages
– Substantial hardware and system software
overhead
– Poor design and implementation is made
easy
– May promote “islands of information”
problems

48
Entity Relationship Database Model
• Complements the relational data model
concepts
• Represented in an entity relationship
diagram (ERD)
• Based on entities, attributes, and
relationships

Figure 1.13

49
Entity Relationship Database Model
• Advantages
– Exceptional conceptual simplicity
– Visual representation
– Effective communication tool
– Integrated with the relational database model
• Disadvantages
– Limited constraint representation
– Limited relationship representation
– No data manipulation language
– Loss of information content

50
Design Principle Introduction

• Designers, programmers, and end users see
data in different ways
• Different views of same data lead to designs
that do not reflect organization’s operation
• Data modeling reduces complexities of
database design
• Various degrees of data abstraction help
reconcile varying views of same data

51
Data Modeling and Data Models

• Data models
– Relatively simple representations of complex
real-world data structures
• Often graphical
• Model: an abstraction of a real-world object or
event
– Useful in understanding complexities of the real-
world environment
• Data modeling is iterative and progressive

52
The Importance of Data Models

• Facilitate interaction among the designer, the
applications programmer, and the end user
• End users have different views and needs for
data
• Data model organizes data for various users
• Data model is an abstraction
– Cannot draw required data out of the data model

53
Data Model Basic Building Blocks

• Entity: anything about which data are to be
collected and stored
• Attribute: a characteristic of an entity
• Relationship: describes an association among
entities
– One-to-many (1:M) relationship
– Many-to-many (M:N or M:M) relationship
– One-to-one (1:1) relationship
• Constraint: a restriction placed on the data
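
The four building blocks map fairly directly onto SQL. The sketch below assumes SQLite and uses hypothetical PROFESSOR and COURSE entities: each table is an entity, each column an attribute, the foreign key carries a 1:M relationship, and the `CHECK` clause is a constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE PROFESSOR (               -- entity
        prof_id INTEGER PRIMARY KEY,       -- attribute (identifier)
        name    TEXT NOT NULL              -- attribute
    );
    CREATE TABLE COURSE (                  -- entity
        cid     TEXT PRIMARY KEY,
        name    TEXT NOT NULL,
        credits INTEGER CHECK (credits BETWEEN 1 AND 6),         -- constraint
        prof_id INTEGER NOT NULL REFERENCES PROFESSOR(prof_id)   -- 1:M relationship
    );
    INSERT INTO PROFESSOR VALUES (1, 'Wilson');
    INSERT INTO COURSE VALUES ('CSE444', 'Databases', 4, 1);
    """
)


def rejected(sql, params):
    """Return True if the DBMS refuses the statement."""
    try:
        with conn:
            conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


insert_course = "INSERT INTO COURSE VALUES (?, ?, ?, ?)"
bad_credits = rejected(insert_course, ("CSE999", "Bogus", 40, 1))     # breaks the CHECK
bad_professor = rejected(insert_course, ("CSE142", "Intro", 4, 99))   # no such professor
print(bad_credits, bad_professor)
```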
54
Business Rules
• Descriptions of policies, procedures, or principles
within a specific organization
– Apply to any organization that stores and uses data
to generate information
• Description of operations to create/enforce actions
within an organization’s environment
– Must be in writing and kept up to date
– Must be easy to understand and widely
disseminated
• Describe characteristics of data as viewed by the
company
55
Discovering Business Rules

• Sources of business rules:


– Company managers
– Policy makers
– Department managers
– Written documentation
• Procedures
• Standards
• Operations manuals
– Direct interviews with end users
56
Discovering Business Rules (cont’d.)

• Standardize company’s view of data


• Communications tool between users and
designers
• Allow designer to understand the nature, role,
and scope of data
• Allow designer to understand business
processes
• Allow designer to develop appropriate
relationship participation rules and constraints
57
Translating Business Rules into Data
Model Components
• Nouns translate into entities
• Verbs translate into relationships among
entities
• Relationships are bidirectional
• Two questions to identify the relationship type:
– How many instances of B are related to one
instance of A?
– How many instances of A are related to one
instance of B?
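
The two questions can be answered mechanically for a set of sample pairs. The function below is a hypothetical helper, not part of any library, and it only reports what the sample data shows; the actual classification still comes from the business rules.

```python
from collections import Counter


def classify(pairs):
    """Classify the relationship A:B from observed (a, b) instance pairs."""
    pairs = set(pairs)
    b_per_a = Counter(a for a, _ in pairs)  # how many B for each A?
    a_per_b = Counter(b for _, b in pairs)  # how many A for each B?
    many_b = max(b_per_a.values(), default=0) > 1
    many_a = max(a_per_b.values(), default=0) > 1
    if many_a and many_b:
        return "M:N"
    if many_b:
        return "1:M"
    if many_a:
        return "M:1"
    return "1:1"


# One professor teaches several classes; each class has one professor.
teaches = classify([("prof1", "class1"), ("prof1", "class2")])
# Students take many classes and classes have many students.
takes = classify([("s1", "class1"), ("s1", "class2"), ("s2", "class1")])
print(teaches, takes)
```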
58
Naming Conventions

• Naming occurs during translation of business
rules to data model components
• Names should make the object unique and
distinguishable from other objects
• Names should also be descriptive of objects in
the environment and be familiar to users
• Proper naming:
– Facilitates communication between parties
– Promotes self-documentation
59
The Evolution of Data Models

60
Hierarchical and Network Models

• The hierarchical model


– Developed in the 1960s to manage large
amounts of data for manufacturing projects
– Basic logical structure is represented by an
upside-down “tree”
– Structure contains levels or segments

61
Hierarchical and Network Models
(cont’d.)
• Network model
– Created to represent complex data relationships
more effectively than the hierarchical model
– Improves database performance
– Imposes a database standard
– Resembles hierarchical model
• Record may have more than one parent

62
Hierarchical and Network Models
(cont’d.)
– Collection of records in 1:M relationships
– Set composed of two record types:
• Owner
• Member
• Network model concepts still used today:
– Schema
• Conceptual organization of entire database as
viewed by the database administrator
– Subschema
• Database portion “seen” by the application programs

63
Hierarchical and Network Models
(cont’d.)
– Data manipulation language (DML)
• Defines the environment in which data can be
managed
– Data definition language (DDL)
• Enables the administrator to define the schema
components

64
The Relational Model

• Developed by E.F. Codd (IBM) in 1970


• Table (relations)
– Matrix consisting of row/column intersections
– Each row in a relation is called a tuple
• Relational models were considered impractical
in 1970
• Model was conceptually simple at expense of
computer overhead

65
The Relational Model (cont’d.)

• Relational database management system (RDBMS)
– Performs same functions provided by
hierarchical model
– Hides complexity from the user
• Relational diagram
– Representation of entities, attributes, and
relationships
• Relational table stores collection of related
entities
66
68
The Relational Model (cont’d.)

• SQL-based relational database application
involves three parts:
– End-user interface
• Allows end user to interact with the data
– Set of tables stored in the database
• Each table is independent from another
• Rows in different tables are related based on
common values in common attributes
– SQL “engine”
• Executes all queries

69
The Entity Relationship Model

• Widely accepted standard for data modeling


• Introduced by Chen in 1976
• Graphical representation of entities and their
relationships in a database structure
• Entity relationship diagram (ERD)
– Uses graphic representations to model database
components
– Entity is mapped to a relational table

70
The Entity Relationship Model (cont’d.)

• Entity instance (or occurrence) is row in table


• Entity set is collection of like entities
• Connectivity labels types of relationships
• Relationships are expressed using Chen
notation
– Relationships are represented by a diamond
– Relationship name is written inside the diamond
• Crow’s Foot notation used as design standard
in this book
71
72
The Object-Oriented (OO) Model

• Data and relationships are contained in a single
structure known as an object
• OODM (object-oriented data model) is the basis
for OODBMS
– Semantic data model
• An object:
– Contains operations
– Is self-contained: a basic building-block for
autonomous structures
– Is an abstraction of a real-world entity
73
The Object-Oriented (OO) Model
(cont’d.)
• Attributes describe the properties of an object
• Objects that share similar characteristics are
grouped in classes
• Classes are organized in a class hierarchy
• Inheritance: object inherits methods and
attributes of parent class
• UML based on OO concepts that describe
diagrams and symbols
– Used to graphically model a system
74
Object/Relational and XML

• Extended relational data model (ERDM)


– Semantic data model developed in response to
increasing complexity of applications
– Includes many of OO model’s best features
– Often described as an object/relational database
management system (O/RDBMS)
– Primarily geared to business applications

76
Object/Relational and XML (cont’d.)

• The Internet revolution created the potential to
exchange critical business information
• In this environment, Extensible Markup
Language (XML) emerged as the de facto
standard
• Current databases support XML
– XML: the standard protocol for data exchange
among systems and Internet services

77
Data Models: A Summary

• Common characteristics:
– Conceptual simplicity with semantic
completeness
– Represent the real world as closely as possible
– Real-world transformations must comply with
consistency and integrity characteristics
• Each new data model addressed the
shortcomings of previous models
• Some models better suited for some tasks

79
80
Degrees of Data Abstraction

• Database designer starts with abstracted view,
then adds details
• ANSI Standards Planning and Requirements
Committee (SPARC)
– Defined a framework for data modeling based
on degrees of data abstraction (1970s):
• External
• Conceptual
• Internal

81
The External Model

• End users’ view of the data environment


• ER diagrams represent external views
• External schema: specific representation of an
external view
– Entities
– Relationships
– Processes
– Constraints

82
83
The External Model (cont’d.)

• Easy to identify specific data required to
support each business unit’s operations
• Facilitates designer’s job by providing feedback
about the model’s adequacy
• Ensures security constraints in database design
• Simplifies application program development

84
The Conceptual Model

• Represents global view of the entire database


• All external views integrated into single global
view: conceptual schema
• ER model most widely used
• ERD graphically represents the conceptual
schema

85
86
The Conceptual Model (cont’d.)

• Provides a relatively easily understood macro
level view of data environment
• Independent of both software and hardware
– Does not depend on the DBMS software used to
implement the model
– Does not depend on the hardware used in the
implementation of the model
– Changes in hardware or software do not affect
database design at the conceptual level

87
The Internal Model

• Representation of the database as “seen” by
the DBMS
– Maps the conceptual model to the DBMS
• Internal schema depicts a specific
representation of an internal model
• Depends on specific database software
– Change in DBMS software requires internal
model be changed
• Logical independence: change internal model
without affecting conceptual model
88
89
The Physical Model

• Operates at lowest level of abstraction


– Describes the way data are saved on storage
media such as disks or tapes
• Requires the definition of physical storage and
data access methods
• Relational model aimed at logical level
– Does not require physical-level details
• Physical independence: changes in physical
model do not affect internal model
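
One way to see physical independence is to change how the data is accessed on disk and confirm that the query and its answer stay the same. The sketch assumes SQLite; the table, rows, and index name are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Students (ssn TEXT PRIMARY KEY, name TEXT, category TEXT);
    INSERT INTO Students VALUES ('123-45-6789', 'Charles', 'undergrad'),
                                ('234-56-7890', 'Dan',     'grad');
    """
)

query = "SELECT name FROM Students WHERE category = 'grad' ORDER BY name"

before = conn.execute(query).fetchall()

# Physical-level change: add an access path. The query text is untouched.
conn.execute("CREATE INDEX idx_students_category ON Students(category)")

after = conn.execute(query).fetchall()
print(before, after)
```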
90
91
Summary

• A data model is an abstraction of a complex
real-world data environment
• Basic data modeling components:
– Entities
– Attributes
– Relationships
– Constraints
• Business rules identify and define basic
modeling components
92
Summary (cont’d.)

• Hierarchical model
– Set of one-to-many (1:M) relationships between
a parent and its children segments
• Network data model
– Uses sets to represent 1:M relationships
between record types
• Relational model
– Current database implementation standard
– ER model is a tool for data modeling
• Complements relational model
93
Summary (cont’d.)
• Object-oriented data model: object is basic
modeling structure
• Relational model adopted object-oriented
extensions: extended relational data model
(ERDM)
• OO data models depicted using UML
• Data-modeling requirements are a function of
different data views and abstraction levels
– Three abstraction levels: external, conceptual,
and internal

94
Object-Oriented Database Model
• Objects or abstractions of real-world entities
are stored
– Attributes describe properties
– Collection of similar objects is a class
• Methods represent real world actions of classes
• Classes are organized in a class hierarchy
– Inheritance is ability of object to inherit
attributes and methods of classes above it

95
OO Data Model
• Advantages
– Adds semantic content
– Visual presentation includes semantic content
– Database integrity
– Both structural and data independence
• Disadvantages
– Lack of OODM standards
– Complex navigational data access
– Steep learning curve
– High system overhead slows transactions
96
Database Models and the Internet
• Characteristics of “Internet age” databases
– Flexible, efficient, and secure Internet access
– Easily used, developed, and supported
– Supports complex data types and
relationships
– Seamless interfaces with multiple data
sources and structures
– Simplicity of conceptual database model
– Many database design, implementation, and
application development tools
– Powerful DBMS GUIs make the DBA’s job easier
97
What is a model?

• A model is a simplified way to describe or
explain a complex reality
• A model helps people communicate and work
simply yet effectively when talking about and
manipulating complex real-world phenomena

98
Importance of Data Models

99
Obsolete models:
Hierarchical and network models

100
The Relational Model
• Uses key concepts from mathematical relations (tables)
– “Relational” in “relational model” means “tables” (mathematical relations),
not “relationships”
• Table (relations)
– Intersections of
• rows (various data types) and
• columns (same data type)
• Relations have well defined methods (queries) for combining their
data members
– Selecting (reading) and joining (combining) data is defined based on
mathematical principles
• Relational database management system (RDBMS)
– Relations were originally too advanced for 1970s computing power
– As computing power increased, simplicity of the model prevailed

101
The Entity Relationship Model
• Enhancement of the relational model
– Relations (tables) become entities
– Very detailed specification of relationships and their properties
• Entity relationship diagram (ERD)
– Uses graphic representations to model database components
• Many variations for notation exist
– In this class, we use the Crow’s Foot notation

102
Summary of
Data models
• A data model is an abstract way of thinking
about how data is organized
• Although the relational model has become the
dominant data model, it cannot solve all
database challenges
• The Object-Oriented Data Model is useful for
complex data coupled with object-oriented
programming

103
Objectives

• In these slides, students will learn:


– The main characteristics of entity relationship
components
– How relationships between entities are defined,
refined, and incorporated into the database
design process
– How ERD components affect database design
and implementation
– That real-world database design often requires
the reconciliation of conflicting goals
104
The Entity Relationship Model (ERM)

• ER model forms the basis of an ER diagram


• ERD represents conceptual database as
viewed by end user
• ERDs depict database’s main components:
– Entities
– Attributes
– Relationships

105
Entities

• Refers to entity set and not to single entity
occurrence
• Corresponds to table and not to row in
relational environment
• In Chen and Crow’s Foot models, entity is
represented by rectangle with entity’s name
• The entity name, a noun, is written in capital
letters

106
Attributes

• Characteristics of entities
• Chen notation: attributes represented by ovals
connected to entity rectangle with a line
– Each oval contains the name of attribute it
represents
• Crow’s Foot notation: attributes written in
attribute box below entity rectangle

107
108
Attributes (cont’d.)

• Required attribute: must have a value


• Optional attribute: may be left empty
• Domain: set of possible values for an attribute
– Attributes may share a domain
• Identifiers: one or more attributes that uniquely
identify each entity instance
• Composite identifier: primary key composed of
more than one attribute

109
110
Attributes (cont’d.)

• Composite attribute can be subdivided


• Simple attribute cannot be subdivided
• Single-value attribute can have only a single
value
• Multivalued attributes can have many values

111
112
Attributes (cont’d.)

• M:N relationships and multivalued attributes
should not be implemented
– Create several new attributes for each of the
original multivalued attributes’ components
– Create new entity composed of original
multivalued attributes’ components
• Derived attribute: value may be calculated from
other attributes
– Need not be physically stored within database
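
A sketch of both ideas, assuming SQLite and hypothetical CAR and INVOICE_LINE entities. The multivalued attribute (car color by section) becomes a new entity with one row per value, and the derived attribute (line total) is computed by a view instead of being stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CAR (car_vin TEXT PRIMARY KEY, car_model TEXT);

    -- Multivalued attribute moved into its own entity: one row per color.
    CREATE TABLE CAR_COLOR (
        car_vin TEXT NOT NULL REFERENCES CAR(car_vin),
        section TEXT NOT NULL,
        color   TEXT NOT NULL,
        PRIMARY KEY (car_vin, section)
    );

    CREATE TABLE INVOICE_LINE (
        line_id    INTEGER PRIMARY KEY,
        units      INTEGER NOT NULL,
        unit_price REAL NOT NULL
    );

    -- Derived attribute: line_total is calculated on read, never stored.
    CREATE VIEW INVOICE_LINE_V AS
        SELECT line_id, units, unit_price, units * unit_price AS line_total
        FROM INVOICE_LINE;

    INSERT INTO CAR VALUES ('VIN001', 'Sedan');
    INSERT INTO CAR_COLOR VALUES ('VIN001', 'body', 'red'),
                                 ('VIN001', 'top',  'black');
    INSERT INTO INVOICE_LINE VALUES (1, 3, 2.5);
    """
)

colors = [
    row[0]
    for row in conn.execute(
        "SELECT color FROM CAR_COLOR WHERE car_vin = 'VIN001' ORDER BY color"
    )
]
line_total = conn.execute(
    "SELECT line_total FROM INVOICE_LINE_V WHERE line_id = 1"
).fetchone()[0]
print(colors, line_total)
```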

113
114
Relationships

• Association between entities


• Participants are entities that participate in a
relationship
• Relationships between entities always operate
in both directions
• Relationship can be classified as 1:1, 1:M, or M:N
• Relationship classification is difficult to establish
if only one side of the relationship is known

115
Connectivity and Cardinality

• Connectivity
– Describes the relationship classification
• Cardinality
– Expresses minimum and maximum number of
entity occurrences associated with one
occurrence of related entity
• Established by very concise statements known
as business rules
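
The minimum and maximum actually present in the data can be measured with a grouped count. This is a sketch assuming SQLite and hypothetical PROFESSOR and CLASS tables; the measured pair describes the sample rows, while the allowed cardinality still comes from the business rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE PROFESSOR (prof_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE CLASS (
        class_id INTEGER PRIMARY KEY,
        prof_id  INTEGER REFERENCES PROFESSOR(prof_id)
    );
    INSERT INTO PROFESSOR VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO CLASS VALUES (10, 1), (11, 1), (12, 1), (13, 2);
    """
)

# LEFT JOIN keeps professors with no classes, so the minimum can be 0.
observed = conn.execute(
    """
    SELECT MIN(n), MAX(n)
    FROM (
        SELECT P.prof_id, COUNT(C.class_id) AS n
        FROM PROFESSOR P LEFT JOIN CLASS C ON C.prof_id = P.prof_id
        GROUP BY P.prof_id
    )
    """
).fetchone()

# Hypothetical business rule: a professor teaches between 0 and 4 classes.
rule_min, rule_max = 0, 4
within_rule = rule_min <= observed[0] and observed[1] <= rule_max
print(observed, within_rule)
```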

116
117
Existence Dependence

• Existence dependence
– Entity exists in database only when it is
associated with another related entity
occurrence
• Existence independence
– Entity can exist apart from one or more related
entities
– Sometimes such an entity is referred to as a
strong or regular entity

118
Relationship Strength

• Weak (non-identifying) relationships


– Exists if PK of related entity does not contain PK
component of parent entity
• Strong (identifying) relationships
– Exists when PK of related entity contains PK
component of parent entity
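
The difference shows up in the primary key of the related table. The sketch assumes SQLite and a hypothetical COURSE and CLASS pair modeled both ways; the helper compares key columns by name, which only works when parent and child use the same column names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE COURSE (crs_code TEXT PRIMARY KEY);

    -- Weak (non-identifying): CLASS has its own key; crs_code is only a foreign key.
    CREATE TABLE CLASS_WEAK (
        class_code INTEGER PRIMARY KEY,
        crs_code   TEXT NOT NULL REFERENCES COURSE(crs_code)
    );

    -- Strong (identifying): the parent's key is part of the child's key.
    CREATE TABLE CLASS_STRONG (
        crs_code      TEXT NOT NULL REFERENCES COURSE(crs_code),
        class_section INTEGER NOT NULL,
        PRIMARY KEY (crs_code, class_section)
    );
    """
)


def pk_columns(table):
    """Return the primary-key column names of a table, in key order."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # The last field is the column's position in the primary key (0 = not in key).
    return [row[1] for row in sorted(rows, key=lambda r: r[5]) if row[5] > 0]


def relationship_strength(child, parent):
    shared = set(pk_columns(child)) & set(pk_columns(parent))
    return "strong" if shared else "weak"


weak_result = relationship_strength("CLASS_WEAK", "COURSE")
strong_result = relationship_strength("CLASS_STRONG", "COURSE")
print(weak_result, strong_result)
```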

119
120
121
Weak Entities

• Weak entity meets two conditions


– Existence-dependent
– Primary key partially or totally derived from
parent entity in relationship
• Database designer determines whether an
entity is weak based on business rules

122
123
124
Relationship Participation

• Optional participation
– One entity occurrence does not require
corresponding entity occurrence in particular
relationship
• Mandatory participation
– One entity occurrence requires corresponding
entity occurrence in particular relationship
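
In a relational implementation, participation on the foreign-key side is commonly expressed through nullability. The sketch assumes SQLite and hypothetical tables in which a CLASS must belong to a COURSE (mandatory) but may not yet have a PROFESSOR (optional).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE COURSE    (crs_code TEXT PRIMARY KEY);
    CREATE TABLE PROFESSOR (prof_id INTEGER PRIMARY KEY);
    CREATE TABLE CLASS (
        class_id INTEGER PRIMARY KEY,
        crs_code TEXT NOT NULL REFERENCES COURSE(crs_code),  -- mandatory
        prof_id  INTEGER REFERENCES PROFESSOR(prof_id)       -- optional
    );
    INSERT INTO COURSE VALUES ('CSE444');
    """
)


def can_insert(class_id, crs_code, prof_id):
    """Return True if the DBMS accepts the new CLASS row."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO CLASS VALUES (?, ?, ?)", (class_id, crs_code, prof_id)
            )
        return True
    except sqlite3.IntegrityError:
        return False


without_professor = can_insert(1, "CSE444", None)  # optional side left empty
without_course = can_insert(2, None, None)         # mandatory side left empty
print(without_professor, without_course)
```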

125
126
127
Relationship Degree

• Indicates number of entities or participants
associated with a relationship
• Unary relationship
– Association is maintained within single entity
• Binary relationship
– Two entities are associated
• Ternary relationship
– Three entities are associated

128
129
130
Recursive Relationships

• Relationship can exist between occurrences of
the same entity set
– Naturally found within unary relationship
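
A common example is "EMPLOYEE manages EMPLOYEE". The sketch assumes SQLite and a hypothetical EMPLOYEE table whose `manager_id` column points back at the same table; a self-join then pairs each employee with a manager.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE EMPLOYEE (
        emp_id     INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        manager_id INTEGER REFERENCES EMPLOYEE(emp_id)  -- same entity set
    );
    INSERT INTO EMPLOYEE VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
    """
)

# The table is joined to itself under two aliases: E (employee) and M (manager).
reports = conn.execute(
    """
    SELECT E.name, M.name
    FROM EMPLOYEE E JOIN EMPLOYEE M ON E.manager_id = M.emp_id
    ORDER BY E.name
    """
).fetchall()
print(reports)
```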

131
132
133
Associative (Composite) Entities

• Also known as bridge entities


• Used to implement M:N relationships
• Composed of primary keys of each of the
entities to be connected
• May also contain additional attributes that play
no role in connective process
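
A minimal sketch of a bridge entity, using Python's built-in sqlite3 module. The STUDENT, CLASS_OFFERING, and ENROLL names and columns are hypothetical. ENROLL's primary key is composed of the two parent keys, and `enroll_grade` is an additional attribute that plays no role in the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (stu_num INTEGER PRIMARY KEY, stu_name TEXT NOT NULL);
CREATE TABLE class_offering (class_code INTEGER PRIMARY KEY, title TEXT NOT NULL);

-- Bridge entity: turns the M:N relationship into two 1:M relationships.
CREATE TABLE enroll (
    stu_num      INTEGER NOT NULL REFERENCES student(stu_num),
    class_code   INTEGER NOT NULL REFERENCES class_offering(class_code),
    enroll_grade TEXT,
    PRIMARY KEY (stu_num, class_code)
);

INSERT INTO student VALUES (1, 'Ann');
INSERT INTO class_offering VALUES (10, 'Databases'), (11, 'Networks');
""")

def enroll(stu_num, class_code):
    """Try to enroll a student; return False if a constraint rejects the row."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO enroll (stu_num, class_code) VALUES (?, ?)",
                (stu_num, class_code),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout a second `enroll(1, 10)` call is rejected by the composite primary key, so each (student, class) combination can appear only once.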

Developing an ER Diagram
• Database design is an iterative process
– Create detailed narrative of organization’s
description of operations
– Identify business rules based on description of
operations
– Identify main entities and relationships from
business rules
– Develop initial ERD
– Identify attributes and primary keys that adequately
describe entities
– Revise and review ERD
Summary (cont’d.)

• Connectivities and cardinalities are based on
business rules
• M:N relationship is valid at conceptual level
– Must be mapped to a set of 1:M relationships
• ERDs may be based on many different ERMs
• UML class diagrams are used to represent the
static data structures in a data model
• Database designers are often forced to make
design compromises
Database Design Challenges:
Conflicting Goals
• Database designers must make design
compromises
– Conflicting goals: design standards, processing
speed, information requirements
• Important to meet logical requirements and
design conventions
• Design is of little value unless it delivers all
specified query and reporting requirements
• Some design and implementation problems do
not yield “clean” solutions
Summary

• Entity relationship (ER) model
– Uses ERD to represent conceptual database as
viewed by end user
– ERM’s main components:
• Entities
• Relationships
• Attributes
– Includes connectivity and cardinality notations

152
Chapter Outline
• Overview of Database Design Process
• Example Database Application (COMPANY)
• ER Model Concepts
– Entities and Attributes
– Entity Types, Value Sets, and Key Attributes
– Relationships and Relationship Types
– Weak Entity Types
– Roles and Attributes in Relationship Types
• ER Diagrams - Notation
• ER Diagram for COMPANY Schema
• Alternative Notations – UML class diagrams, others

154
Overview of Database Design Process

• Two main activities:
– Database design
– Applications design
• Focus in this course on database design
– To design the conceptual schema for a database
application
• Applications design focuses on the programs
and interfaces that access the database
– Generally considered part of software
engineering
155
Overview of Database Design Process

156
Example COMPANY Database
• We need to create a database schema design
based on the following (simplified) requirements
of the COMPANY Database:
– The company is organized into DEPARTMENTs.
Each department has a name, number and an
employee who manages the department. We keep
track of the start date of the department manager.
A department may have several locations.
– Each department controls a number of PROJECTs.
Each project has a unique name, unique number
and is located at a single location.

157
Example COMPANY Database
(Contd.)
– We store each EMPLOYEE’s social security
number, address, salary, sex, and birthdate.
• Each employee works for one department but
may work on several projects.
• We keep track of the number of hours per week
that an employee currently works on each project.
• We also keep track of the direct supervisor of
each employee.
– Each employee may have a number of
DEPENDENTs.
• For each dependent, we keep track of their name,
sex, birthdate, and relationship to the employee.
158
ER Model Concepts
• Entities and Attributes
– Entities are specific objects or things in the mini-world that are represented in
the database.
• For example the EMPLOYEE John Smith, the
Research DEPARTMENT, the ProductX
PROJECT
– Attributes are properties used to describe an entity.
• For example an EMPLOYEE entity may have the
attributes Name, SSN, Address, Sex, BirthDate
– A specific entity will have a value for each of its attributes.
• For example a specific employee entity may have
Name='John Smith', SSN='123456789',
Address='731, Fondren, Houston, TX', Sex='M',
BirthDate='09-JAN-55'
– Each attribute has a value set (or data type) associated with it – e.g. integer,
string, subrange, enumerated type, …
159
Types of Attributes (1)
• Simple
– Each entity has a single atomic value for the attribute. For example, SSN or Sex.
• Composite
– The attribute may be composed of several components. For example:
• Address(Apt#, House#, Street, City, State, ZipCode, Country), or
• Name(FirstName, MiddleName, LastName).
• Composition may form a hierarchy where some
components are themselves composite.
• Multi-valued
– An entity may have multiple values for that attribute. For example, Color of a CAR or
PreviousDegrees of a STUDENT.
• Denoted as {Color} or {PreviousDegrees}.

160
Types of Attributes (2)

• In general, composite and multi-valued
attributes may be nested arbitrarily to any
number of levels, although this is rare.
– For example, PreviousDegrees of a STUDENT
is a composite multi-valued attribute denoted by
{PreviousDegrees (College, Year, Degree,
Field)}
– Multiple PreviousDegrees values can exist
– Each has four subcomponent attributes:
• College, Year, Degree, Field
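
As a rough illustration in Python, a composite attribute can be modeled as a small record type and a multi-valued attribute as a list. The class names and sample values below are assumptions for illustration only.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Name:
    """Composite attribute: Name(FirstName, MiddleName, LastName)."""
    first: str
    middle: str
    last: str

@dataclass(frozen=True)
class PreviousDegree:
    """One value of {PreviousDegrees (College, Year, Degree, Field)}."""
    college: str
    year: int
    degree: str
    field_of_study: str

@dataclass
class Student:
    name: Name
    # Multi-valued: a student may hold zero, one, or many previous degrees.
    previous_degrees: list[PreviousDegree] = field(default_factory=list)

s = Student(Name("Ann", "B", "Lee"))
s.previous_degrees.append(PreviousDegree("State College", 2001, "BSc", "Mathematics"))
s.previous_degrees.append(PreviousDegree("City University", 2003, "MSc", "Statistics"))
```

Each element of `previous_degrees` carries the four subcomponents, which mirrors the nesting of a composite attribute inside a multi-valued one.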
161
Example of a composite attribute

162
Entity Types and Key Attributes (1)
• Entities with the same basic attributes are grouped or
typed into an entity type.
– For example, the entity type EMPLOYEE and PROJECT.
• An attribute of an entity type for which each entity must
have a unique value is called a key attribute of the
entity type.
– For example, SSN of EMPLOYEE.

163
Entity Types and Key Attributes (2)

• A key attribute may be composite.
– VehicleTagNumber is a key of the CAR entity type with
components (Number, State).
• An entity type may have more than one key.
– The CAR entity type may have two keys:
• VehicleIdentificationNumber (popularly called
VIN)
• VehicleTagNumber (Number, State), aka license
plate number.
• Each key is underlined
164
Displaying an Entity type
• In ER diagrams, an entity type is displayed in a
rectangular box
• Attributes are displayed in ovals
– Each attribute is connected to its entity type
– Components of a composite attribute are
connected to the oval representing the
composite attribute
– Each key attribute is underlined
– Multivalued attributes displayed in double ovals
• See CAR example on next slide
165
Entity Type CAR with two keys and a corresponding Entity
Set

166
Entity Set

• Each entity type will have a collection of entities
stored in the database
– Called the entity set
• Previous slide shows three CAR entity
instances in the entity set for CAR
• Same name (CAR) used to refer to both the
entity type and the entity set
• Entity set is the current state of the entities of
that type that are stored in the database
167
Initial Design of Entity Types for the
COMPANY Database Schema

• Based on the requirements, we can identify four
initial entity types in the COMPANY database:
– DEPARTMENT
– PROJECT
– EMPLOYEE
– DEPENDENT
• Their initial design is shown on the following
slide
• The initial attributes shown are derived from the
requirements description
168
Initial Design of Entity Types:
EMPLOYEE, DEPARTMENT, PROJECT, DEPENDENT

169
Refining the initial design by introducing relationships

• The initial design is typically not complete
• Some aspects in the requirements will be
represented as relationships
• ER model has three main concepts:
– Entities (and their entity types and entity sets)
– Attributes (simple, composite, multivalued)
– Relationships (and their relationship types and
relationship sets)
• We introduce relationship concepts next
170
Relationships and Relationship Types (1)

• A relationship relates two or more distinct entities with a specific
meaning.
– For example, EMPLOYEE John Smith works on the ProductX PROJECT, or
EMPLOYEE Franklin Wong manages the Research DEPARTMENT.
• Relationships of the same type are grouped or typed into a relationship
type.
– For example, the WORKS_ON relationship type in which EMPLOYEEs and
PROJECTs participate, or the MANAGES relationship type in which EMPLOYEEs
and DEPARTMENTs participate.
• The degree of a relationship type is the number of participating entity
types.
– Both MANAGES and WORKS_ON are binary relationships.

171
Relationship instances of the WORKS_FOR N:1
relationship between EMPLOYEE and DEPARTMENT

172
Relationship instances of the M:N WORKS_ON
relationship between EMPLOYEE and PROJECT

173
Relationship type vs. relationship set (1)

• Relationship Type:
– Is the schema description of a relationship
– Identifies the relationship name and the
participating entity types
– Also identifies certain relationship constraints
• Relationship Set:
– The current set of relationship instances
represented in the database
– The current state of a relationship type

174
Relationship type vs. relationship set (2)

• Previous figures displayed the relationship sets
• Each instance in the set relates individual
participating entities – one from each
participating entity type
• In ER diagrams, we represent the relationship
type as follows:
– Diamond-shaped box is used to display a
relationship type
– Connected to the participating entity types via
straight lines
175
Refining the COMPANY database schema by introducing
relationships

• By examining the requirements, six relationship types are identified
• All are binary relationships (degree 2)
• Listed below with their participating entity types:
– WORKS_FOR (between EMPLOYEE, DEPARTMENT)
– MANAGES (also between EMPLOYEE, DEPARTMENT)
– CONTROLS (between DEPARTMENT, PROJECT)
– WORKS_ON (between EMPLOYEE, PROJECT)
– SUPERVISION (between EMPLOYEE (as subordinate), EMPLOYEE (as
supervisor))
– DEPENDENTS_OF (between EMPLOYEE, DEPENDENT)

176
ER DIAGRAM – Relationship Types are:
WORKS_FOR, MANAGES, WORKS_ON, CONTROLS, SUPERVISION, DEPENDENTS_OF

177
Discussion on Relationship Types
• In the refined design, some attributes from the initial entity types are
refined into relationships:
– Manager of DEPARTMENT -> MANAGES
– Works_on of EMPLOYEE -> WORKS_ON
– Department of EMPLOYEE -> WORKS_FOR
– etc
• In general, more than one relationship type can exist between the same
participating entity types
– MANAGES and WORKS_FOR are distinct relationship types between
EMPLOYEE and DEPARTMENT
– Different meanings and different relationship instances.

178
Recursive Relationship Type
• A relationship type in which the same entity type participates more
than once, in distinct roles
• Example: the SUPERVISION relationship
• EMPLOYEE participates twice in two distinct roles:
– supervisor (or boss) role
– supervisee (or subordinate) role
• Each relationship instance relates two distinct EMPLOYEE entities:
– One employee in supervisor role
– One employee in supervisee role
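
One common way to implement such a relationship in a relational table is a foreign key that refers back to the same table. The sketch below uses Python's built-in sqlite3 module. The `super_ssn` column, the third employee, and all SSNs other than John Smith's are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (
    ssn       TEXT PRIMARY KEY,
    name      TEXT NOT NULL,
    super_ssn TEXT REFERENCES employee(ssn)  -- supervisor role; NULL if none
);
INSERT INTO employee VALUES ('888665555', 'James Borg', NULL);
INSERT INTO employee VALUES ('333445555', 'Franklin Wong', '888665555');
INSERT INTO employee VALUES ('123456789', 'John Smith', '333445555');
""")

# Join the table to itself: one alias per role.
pairs = conn.execute("""
    SELECT sub.name AS supervisee, sup.name AS supervisor
    FROM employee AS sub
    JOIN employee AS sup ON sub.super_ssn = sup.ssn
    ORDER BY sub.name
""").fetchall()
```

The two table aliases play the part of the role names: `sub` is the supervisee and `sup` is the supervisor.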

179
Weak Entity Types
• An entity type that does not have a key attribute of its own
• A weak entity must participate in an identifying relationship type with an owner or
identifying entity type
• Entities are identified by the combination of:
– A partial key of the weak entity type
– The particular entity they are related to in the identifying entity type
• Example:
– A DEPENDENT entity is identified by the dependent’s first name, and the specific
EMPLOYEE with whom the dependent is related
– Name of DEPENDENT is the partial key
– DEPENDENT is a weak entity type
– EMPLOYEE is its identifying entity type via the identifying relationship type
DEPENDENTS_OF
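
A minimal sketch of this weak entity type, using Python's built-in sqlite3 module. The column names, the second employee's SSN, and the sample dependents are assumptions. The dependent's key combines the owner's key with the partial key, and deleting the owner removes its dependents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, name TEXT NOT NULL);

CREATE TABLE dependent (
    essn           TEXT NOT NULL REFERENCES employee(ssn) ON DELETE CASCADE,
    dependent_name TEXT NOT NULL,          -- partial key
    relationship   TEXT,
    PRIMARY KEY (essn, dependent_name)     -- owner key + partial key
);

INSERT INTO employee VALUES ('123456789', 'John Smith'), ('333445555', 'Franklin Wong');
-- The same first name may occur under different employees.
INSERT INTO dependent VALUES ('123456789', 'Alice', 'Daughter'),
                             ('333445555', 'Alice', 'Daughter');
""")

# Removing the owner entity removes its existence-dependent entities.
conn.execute("DELETE FROM employee WHERE ssn = '123456789'")
conn.commit()
remaining = conn.execute("SELECT essn, dependent_name FROM dependent").fetchall()
```

Only the dependent of the remaining employee is left in `remaining`.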

180
Constraints on Relationships
• Constraints on Relationship Types
– (Also known as ratio constraints)
– Cardinality Ratio (specifies maximum participation)

• One-to-one (1:1)
• One-to-many (1:N) or Many-to-one (N:1)
• Many-to-many (M:N)
– Existence Dependency Constraint (specifies minimum participation) (also
called participation constraint)
• zero (optional participation, not existence-
dependent)
• one or more (mandatory participation, existence-
dependent)
181
Many-to-one (N:1) Relationship

182
Many-to-many (M:N) Relationship

183
Displaying a recursive relationship

• In a recursive relationship type:
– Both participations are of the same entity type in different roles.
– For example, SUPERVISION relationships between
EMPLOYEE (in role of supervisor or boss) and (another)
EMPLOYEE (in role of subordinate or worker).
• In the following figure, the first role participation is labeled
with 1 and the second role participation is labeled
with 2.
• In an ER diagram, role names need to be displayed to
distinguish participations.

184
A Recursive Relationship: SUPERVISION

185
Recursive Relationship Type is: SUPERVISION
(participation role names are shown)

186
Attributes of Relationship types
• A relationship type can have attributes:
– For example, HoursPerWeek of WORKS_ON
– Its value for each relationship instance describes
the number of hours per week that an
EMPLOYEE works on a PROJECT.
• A value of HoursPerWeek depends on a
particular (employee, project) combination
– Most relationship attributes are used with M:N
relationships
• In 1:N relationships, they can be transferred to
the entity type on the N-side of the relationship
187
Example Attribute of a Relationship
Type:
Hours of WORKS_ON

188
Notation for Constraints on
Relationships
• Cardinality ratio (of a binary relationship): 1:1,
1:N, N:1, or M:N
– Shown by placing appropriate numbers on the
relationship edges.
• Participation constraint (on each participating
entity type): total (called existence dependency)
or partial.
– Total shown by double line, partial by single line.
• NOTE: These are easy to specify for Binary
Relationship Types.

189
Alternative (min, max) notation for
relationship structural constraints:
• Specified on each participation of an entity type E in a relationship type R
• Specifies that each entity e in E participates in at least min and at most max relationship
instances in R
• Default(no constraint): min=0, max=n (signifying no limit)
• Must have min ≤ max, min ≥ 0, max ≥ 1
• Derived from the knowledge of mini-world constraints
• Examples:
– A department has exactly one manager and an employee can manage at most one
department.
• Specify (0,1) for participation of EMPLOYEE in MANAGES
• Specify (1,1) for participation of DEPARTMENT in MANAGES
– An employee can work for exactly one department but a department can have any
number of employees.
• Specify (1,1) for participation of EMPLOYEE in WORKS_FOR
• Specify (0,n) for participation of DEPARTMENT in WORKS_FOR
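
The meaning of a (min, max) pair can be checked mechanically against a relationship set. The helper below is a hypothetical sketch, not part of any tool. It counts how often each entity appears in the relationship instances and reports those outside the allowed range. The sample entities are made up.

```python
from collections import Counter

def violations(entities, instances, role, lo, hi=None):
    """Return the entities that break a (lo, hi) participation constraint.

    instances: list of tuples, one per relationship instance
    role:      index of this entity type's position within each tuple
    hi=None:   stands for 'n' (no upper limit)
    """
    counts = Counter(inst[role] for inst in instances)
    return [
        e for e in entities
        if counts[e] < lo or (hi is not None and counts[e] > hi)
    ]

employees = ["e1", "e2", "e3"]
departments = ["d1", "d2"]

# (EMPLOYEE, DEPARTMENT) pairs
manages = [("e1", "d1"), ("e2", "d2")]
works_for = [("e1", "d1"), ("e2", "d2"), ("e3", "d1")]

emp_in_manages = violations(employees, manages, role=0, lo=0, hi=1)       # (0,1)
dept_in_manages = violations(departments, manages, role=1, lo=1, hi=1)    # (1,1)
emp_in_works_for = violations(employees, works_for, role=0, lo=1, hi=1)   # (1,1)
dept_in_works_for = violations(departments, works_for, role=1, lo=0)      # (0,n)
```

All four lists are empty for this sample data. Dropping `("e2", "d2")` from `manages` would leave department `d2` without a manager and make it show up as a violation of (1,1).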

190
The (min,max) notation for relationship
constraints

Read the (min, max) numbers next to the entity
type, looking away from the entity type

191
COMPANY ER Schema Diagram using (min,
max) notation

192
Alternative diagrammatic notation

• ER diagrams are one popular notation for
displaying database schemas
• Many other notations exist in the literature and
in various database design and modeling tools
• Appendix A illustrates some of the alternative
notations that have been used
• UML class diagrams are representative of
another way of displaying ER concepts that is
used in several commercial design tools

193
Summary of notation for ER diagrams

194
UML class diagrams

• Represent classes (similar to entity types) as large rounded boxes with
three sections:
– Top section includes entity type (class) name
– Second section includes attributes
– Third section includes class operations (operations are not in basic ER model)
• Relationships (called associations) represented as lines connecting the
classes
– Other UML terminology also differs from ER terminology
• Used in database design and object-oriented software design
• UML has many other types of diagrams for software design (see Chapter
12)

195
UML class diagram for COMPANY database schema

196
Other alternative diagrammatic notations

197
Relationships of Higher Degree

• Relationship types of degree 2 are called binary


• Relationship types of degree 3 are called
ternary and of degree n are called n-ary
• In general, an n-ary relationship is not
equivalent to n binary relationships
• Constraints are harder to specify for higher-
degree relationships (n > 2) than for binary
relationships

198
Discussion of n-ary relationships (n > 2)

• In general, 3 binary relationships can represent different information than a
single ternary relationship (see Figure 3.17a and b on next slide)
• If needed, the binary and n-ary relationships can all be included in the
schema design (see Figure 3.17a and b, where all relationships convey
different meanings)
• In some cases, a ternary relationship can be represented as a weak entity
if the data model allows a weak entity type to have multiple identifying
relationships (and hence multiple owner entity types) (see Figure 3.17c)

199
Example of a ternary relationship

200
Discussion of n-ary relationships (n > 2)

• If a particular binary relationship can be derived
from a higher-degree relationship at all times,
then it is redundant
• For example, the TAUGHT_DURING binary
relationship in Figure 3.18 (see next slide) can
be derived from the ternary relationship
OFFERS (based on the meaning of the
relationships)

201
Another example of a ternary relationship

202
Displaying constraints on higher-degree relationships

• The (min, max) constraints can be displayed on the edges – however, they
do not fully describe the constraints
• Displaying a 1, M, or N indicates additional constraints
– An M or N indicates no constraint
– A 1 indicates that an entity can participate in at most one relationship instance
that has a particular combination of the other participating entities
• In general, both (min, max) and 1, M, or N are needed to describe fully the
constraints

203
Data Modeling Tools
• A number of popular tools cover conceptual modeling and mapping
into relational schema design.
– Examples: ERWin, S-Designer (Enterprise Application Suite), ER-Studio, etc.
• POSITIVES:
– Serve as documentation of application requirements; easy user interface,
mostly with graphics editor support
• NEGATIVES:
– Most tools lack a proper distinct notation for relationships with relationship
attributes
– Mostly represent a relational design in a diagrammatic form rather than a
conceptual ER-based design
(See Chapter 12 for details)

204
Some of the Currently Available Automated Database
Design Tools
COMPANY | TOOL | FUNCTIONALITY
Embarcadero Technologies | ER Studio | Database modeling in ER and IDEF1X
Embarcadero Technologies | DB Artisan | Database administration, space and security management
Oracle | Developer 2000/Designer 2000 | Database modeling, application development
Popkin Software | System Architect 2001 | Data modeling, object modeling, process modeling, structured analysis/design
Platinum (Computer Associates) | Enterprise Modeling Suite: Erwin, BPWin, Paradigm Plus | Data, process, and business component modeling
Persistence Inc. | PowerTier | Mapping from O-O to relational model
Rational (IBM) | Rational Rose | UML modeling & application generation in C++/JAVA
Resolution Ltd. | Xcase | Conceptual modeling up to code maintenance
Sybase | Enterprise Application Suite | Data modeling, business logic modeling
Visio | Visio Enterprise | Data modeling, design/reengineering Visual Basic/C++

205
Extended Entity-Relationship (EER)
Model (in next chapter)

• The entity relationship model in its original form
did not support the specialization and
generalization abstractions
• Next chapter illustrates how the ER model can
be extended with
– Type-subtype and set-subset relationships
– Specialization/Generalization Hierarchies
– Notation to display them in EER diagrams

206
Summary

• ER Model Concepts: Entities, attributes,
relationships
• Constraints in the ER model
• Using ER in step-by-step conceptual schema
design for the COMPANY database
• ER Diagrams - Notation
• Alternative Notations – UML class diagrams,
others

207
Entity Integrity:
Selecting Primary Keys

• Primary key is the most important
characteristic of an entity
– Single attribute or some combination of
attributes
• Primary keys and foreign keys work
together to implement relationships
• Properly selecting primary key has direct
bearing on efficiency and effectiveness

208
Natural Keys and Primary Keys
• Natural key is a real-world identifier used to uniquely
identify real-world objects
– Familiar to end users and forms part of their day-to-day
business vocabulary
• Generally, data modeler uses natural identifier as
primary key of entity being modeled
• May instead use composite primary key or surrogate
key
– Surrogate key - a PK created to simplify the
identification of entity instances
• Has no meaning, exists only to distinguish one entity
from another (e.g., Autonumber)

209
Primary Key Guidelines

• Attribute that uniquely identifies entity instances
in an entity set
– Could also be combination of attributes
• Main function is to uniquely identify an entity
instance or row within a table
• Guarantee entity integrity, not to “describe”
the entity
• Primary keys and foreign keys implement
relationships among entities
– Behind the scenes, hidden from user

When to Use Composite
Primary Keys
• Composite primary keys useful in two
cases:
– As identifiers of composite entities
• In which each primary key combination is
allowed once in M:N relationship
– As identifiers of weak entities
• In which weak entity has a strong identifying
relationship with the parent entity
• Automatically provides benefit of ensuring
that there cannot be duplicate values

212
Composite PK of ENROLL ensures a student cannot register
for the same class twice
213
When to Use Composite
Primary Keys
• When used as identifiers of weak entities
normally used to represent:
– Real-world object that is existence-dependent on
another real-world object
– Real-world object that is represented in data
model as two separate entities in strong
identifying relationship
• Dependent entity exists only when it is related to
parent entity
– EMPLOYEE and DEPENDENT – latter uses a
composite PK containing employee id
– LINE exists only as part of INVOICE

214
When To Use Surrogate Primary Keys

• Especially helpful when there is:
– No natural key
– Selected candidate key has embedded semantic
contents
– Selected candidate key is too long or
cumbersome

215
When To Use Surrogate
Primary Keys
• If you use surrogate key:
– Ensure that candidate key of entity
in question performs properly
– Use “unique index” and “not null”
constraints

216
When To Use Surrogate
Primary Keys

• A catering hall has a number of rooms it rents for small parties
– What should be the PK? DATE,TIME_START,ROOM
– What if, in addition to the room, additional equipment were to be used
from the RESOURCE table
– Composite key for linking table would be DATE,TIME_START,ROOM,
RSC_ID
• Quite lengthy
• If that became FK in another entity, it would be quite complex to maintain
– Instead use numeric, single attribute surrogate key
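
A sketch of the catering-hall case with Python's built-in sqlite3 module. The EVENT table and its column names are assumptions. A numeric surrogate key becomes the PK, while the natural candidate key is still protected by "not null" and a unique constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE event (
    event_id   INTEGER PRIMARY KEY,        -- surrogate key, assigned by SQLite
    event_date TEXT NOT NULL,
    time_start TEXT NOT NULL,
    room       TEXT NOT NULL,
    UNIQUE (event_date, time_start, room)  -- natural candidate key still enforced
);
""")

def book(event_date, time_start, room):
    """Insert a booking; return its surrogate key, or None if the slot is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO event (event_date, time_start, room) VALUES (?, ?, ?)",
                (event_date, time_start, room),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        return None
```

A linking table for equipment would then need only `event_id` as its foreign key instead of the three-part natural key.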

217
Design Cases:
Learning Flexible Database Design
• Data modeling and design requires skills
acquired through experience
• Experience acquired through practice
• Four special design cases that highlight:
– Importance of flexible design
– Proper identification of primary keys
– Placement of foreign keys

218
Design Case 1: Implementing 1:1
Relationships
• Foreign keys work with primary keys to properly
implement relationships in relational model
• Put primary key of the “one” side on the “many”
side as foreign key
– Primary key: parent entity
– Foreign key: dependent entity

219
Design Case 1: Implementing
1:1 Relationships
• In 1:1 relationship, there are two
options:
– Place a foreign key in both entities
(not recommended)
– Place a foreign key in one of the
entities
• Primary key of one of the two
entities appears as foreign key of
other
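
The single-foreign-key option can be sketched with Python's built-in sqlite3 module. The PROFESSOR/DEPARTMENT tables and their columns are hypothetical. Adding a unique constraint to the foreign key keeps the relationship 1:1 instead of 1:M.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE professor (emp_num INTEGER PRIMARY KEY, prof_name TEXT NOT NULL);

CREATE TABLE department (
    dept_code     TEXT PRIMARY KEY,
    -- FK placed in one entity only; UNIQUE means a professor chairs at most one department.
    chair_emp_num INTEGER UNIQUE REFERENCES professor(emp_num)
);

INSERT INTO professor VALUES (1, 'Lee'), (2, 'Ortiz');
INSERT INTO department VALUES ('CIS', 1);
""")

def assign_chair(dept_code, emp_num):
    """Create a department with the given chair; False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO department VALUES (?, ?)", (dept_code, emp_num))
        return True
    except sqlite3.IntegrityError:
        return False
```

Here professor 1 already chairs CIS, so trying to make the same professor chair a second department fails.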

Design Case 2: Maintaining History of
Time-Variant Data
• Normally, existing attribute values are replaced
with new value without regard to previous value
• Time-variant data:
– Values change over time
– Must keep a history of data changes
• Keeping history of time-variant data equivalent to
having a multivalued attribute in your entity
• Must create new entity in 1:M relationships with
original entity
• New entity contains new value, date of change
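
A minimal sketch of a history entity, using Python's built-in sqlite3 module. The salary example and all names are assumptions chosen for illustration. Each change adds a row to the history table instead of overwriting the old value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE employee (emp_num INTEGER PRIMARY KEY, emp_name TEXT NOT NULL);

-- 1:M from EMPLOYEE: one row per salary change.
CREATE TABLE salary_hist (
    emp_num        INTEGER NOT NULL REFERENCES employee(emp_num),
    sal_start_date TEXT NOT NULL,   -- ISO dates sort correctly as text
    salary         REAL NOT NULL,
    PRIMARY KEY (emp_num, sal_start_date)
);

INSERT INTO employee VALUES (1, 'Ann');
INSERT INTO salary_hist VALUES (1, '2020-01-01', 50000), (1, '2022-07-01', 56000);
""")

def salary_on(emp_num, date):
    """Return the salary in effect on the given ISO date, or None if none yet."""
    row = conn.execute(
        """SELECT salary FROM salary_hist
           WHERE emp_num = ? AND sal_start_date <= ?
           ORDER BY sal_start_date DESC LIMIT 1""",
        (emp_num, date),
    ).fetchone()
    return row[0] if row else None
```

Questions such as "what was the value on a given date" can then be answered from the history rows.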

Design Case 3: Fan Traps

• Design trap occurs when relationship is
improperly or incompletely identified
– Represented in a way not consistent with the
real world
• Most common design trap is known as fan trap
• Fan trap occurs when one entity is in two 1:M
relationships to other entities
– Produces an association among other entities
not expressed in the model

Design Case 4:
Redundant Relationships
• Redundancy is seldom a good thing in database
environment
• Occurs when there are multiple relationship paths
between related entities
• Main concern is that redundant relationships
remain consistent across model
• Some designs use redundant relationships to
simplify the design
• In the following example, the relationship between
DIVISION and PLAYER is not needed as all
information can be obtained through TEAM

Portion of Tiny College ERD

231
Tiny College - New Requirement

• Tiny College wants to keep track of the history of all
administrative appointments (date of appointment and
date of termination).
• The Tiny College chancellor may want to know how
many deans worked in the College of Business
between January 1, 1960 and January 1, 2010 or who
the dean of the College of Education was in 1990.
Given that information, create the complete ERD
containing all primary keys, foreign keys, and main
attributes.

232
Tiny College ERD

233
Matilda Wilson
Database Fundamentals

Normalization of Database Tables

234
Objectives
• In this chapter, students will learn:
– What normalization is and what role it plays in
the database design process
– About the normal forms 1NF, 2NF, 3NF, BCNF,
and 4NF
– How normal forms can be transformed from
lower normal forms to higher normal forms
– That normalization and ER modeling are used
concurrently to produce a good database design
– That some situations require denormalization to
generate information efficiently
235
Database Tables and Normalization

• Normalization
– Process for evaluating and correcting table
structures to minimize data redundancies
• Reduces data anomalies
– Series of stages called normal forms:
• First normal form (1NF)
• Second normal form (2NF)
• Third normal form (3NF)

236
Database Tables and Normalization

• Normalization (continued)
– 2NF is better than 1NF; 3NF is better than 2NF
– For most business database design purposes,
3NF is as high as needed in normalization
– Highest level of normalization is not always most
desirable
• Denormalization produces a lower normal form
– Increased performance but greater data
redundancy

237
The Need for Normalization
• Example: company that manages building projects
(Figure 6.1)
– Each project has its own project number, name,
assigned employees, etc.
– Each employee has an employee number, name, job
class
– Charges its clients by billing hours spent on each
contract
– Hourly billing rate is dependent on employee’s position
– Total charge is a derived attribute and not stored in the
table
– Periodically, report is generated that contains
information such as displayed in Table 6.1
The Need for Normalization
• Structure of data set in Figure 6.1 does not handle
data very well
• Table structure appears to work; report is generated
with ease
• Report may yield different results depending on
what data anomaly has occurred
– Employee can be assigned to more than one project
but each project includes only a single occurrence of
any one employee
• Relational database environment is suited to help
designer avoid data integrity problems
241
The Need for Normalization
• PROJECT_NUM, either a PK or part of a PK, contains
NULLS
• JOB_CLASS values could be abbreviated differently
• Each time an employee is assigned to a project, all
employee information is duplicated
• Update anomalies – Modifying JOB_CLASS for employee
105 requires alterations in two records
• Insertion anomalies – to insert a new employee who has not
been assigned to a project requires a phantom project
• Deletion anomalies – If a project has only one employee
associated with it and that employee leaves, a phantom
employee must be created
242
The Normalization Process
• Each table represents a single subject
• No data item will be unnecessarily stored in more
than one table
• All nonprime attributes in a table are dependent
on the primary key
• Each table is void of insertion, update, and
deletion anomalies

243
The Normalization Process (cont’d.)
• Objective of normalization is to ensure that all
tables are in at least 3NF
• Higher forms are not likely to be encountered in
business environment
• Normalization works one relation at a time
• Progressively breaks table into new set of
relations based on identified dependencies

The Normalization Process (cont’d.)
• Partial dependency
– Exists when there is a functional dependence in
which the determinant is only part of the primary key
– If (A,B) → (C,D), B → C, and (A,B) is the PK
• B → C is a partial dependency because only part of the
PK, B, is needed to determine the value of C
• Transitive dependency
– Exists when there are functional dependencies such
that X → Y, Y → Z, and X is the primary key
• X → Z is a transitive dependency because X determines
the value of Z via Y
• The existence of a functional dependence among
nonprime attributes is a sign of transitive dependency
246
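
Whether a candidate dependency is contradicted by sample data can be tested with a few lines of Python. The helper and the rows below are illustrative assumptions. A dependency that holds in a sample is only a hint, since real dependencies come from business rules.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows: list of dicts; lhs and rhs: lists of attribute names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a determinant must match every later one.
        if seen.setdefault(key, val) != val:
            return False
    return True

cols = ["PROJ_NUM", "EMP_NUM", "PROJ_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS"]
data = [
    (15, 101, "Evergreen", "Engineer", 85.0, 10.0),
    (15, 103, "Evergreen", "Analyst", 60.0, 5.0),
    (18, 101, "Amber Wave", "Engineer", 85.0, 7.5),
    (18, 104, "Amber Wave", "Analyst", 60.0, 3.0),
]
rows = [dict(zip(cols, r)) for r in data]

full = holds(rows, ["PROJ_NUM", "EMP_NUM"], ["HOURS"])   # depends on the whole PK
partial = holds(rows, ["PROJ_NUM"], ["PROJ_NAME"])       # part of the PK suffices
nonprime = holds(rows, ["JOB_CLASS"], ["CHG_HOUR"])      # between nonprime attributes
not_fd = holds(rows, ["PROJ_NUM"], ["HOURS"])            # contradicted by the data
```

With (PROJ_NUM, EMP_NUM) as the PK, `partial` corresponds to a partial dependency, and `nonprime` to the dependency among nonprime attributes that signals a transitive dependency.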
Conversion to First Normal Form

• Repeating group
– Group of multiple entries of same type can exist
for any single key attribute occurrence
• Relational table must not contain repeating
groups
• Normalizing table structure will reduce data
redundancies
• Normalization is three-step procedure

247
Conversion to First Normal Form
(cont’d.)
• Step 1: Eliminate the Repeating Groups
– Eliminate nulls: each repeating group attribute
contains an appropriate data value
• Step 2: Identify the Primary Key
– Must uniquely identify attribute value
– New key must be composed
• Step 3: Identify All Dependencies
– Dependencies are depicted with a diagram

Conversion to First Normal Form
(cont’d.)

• Dependency diagram:
– Depicts all dependencies found within given
table structure
– Helpful in getting bird’s-eye view of all
relationships among table’s attributes
– Makes it less likely that you will overlook an
important dependency

251
Conversion to First Normal Form
• First normal form describes tabular format:
– All key attributes are defined
– No repeating groups in the table
– All attributes are dependent on primary key
• All relational tables satisfy 1NF requirements
• Some tables contain partial dependencies
– Dependencies are based on part of the primary
key
– Should be used with caution

252
Conversion to Second Normal Form
• Conversion to 2NF occurs only when the 1NF has a composite key
– If the 1NF key is a single attribute, then the table is automatically in
2NF
• Step 1: Make New Tables to Eliminate Partial Dependencies
– For each component of the PK that acts as a determinant in a partial
dependency, create a new table with a copy of that component as the
PK
– These components also remain in the original table in order to serve
as FKs to the new tables
– Write each key component on a separate line; then write the original
composite key on the last line. Each component will become the key
in a new table
PROJ_NUM
EMP_NUM
PROJ_NUM EMP_NUM
253

Conversion to Second Normal Form
Step 2: Reassign Corresponding Dependent Attributes
– The dependencies for the original key components are found
by examining the arrows below the dependency diagram in
Fig 6.3
– The attributes in a partial dependency are removed from the
original table and placed in the new table with the
dependency’s determinant
– Any attributes that are not dependent in a partial
dependency remain in the original table
– At this point, most anomalies have been eliminated
PROJECT(PROJ_NUM, PROJ_NAME)
EMPLOYEE(EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGNMENT(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
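
As a rough sketch of how these three 2NF tables might be declared, the following uses Python's built-in `sqlite3` module. The column types, the sample rows, and the `build_2nf_database` helper are assumptions; the slides give only the table and attribute names.

```python
import sqlite3

SCHEMA_2NF = """
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT NOT NULL,
    JOB_CLASS TEXT,
    CHG_HOUR  REAL          -- still transitively dependent on JOB_CLASS
);
CREATE TABLE ASSIGNMENT (
    PROJ_NUM     INTEGER NOT NULL REFERENCES PROJECT (PROJ_NUM),
    EMP_NUM      INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
    ASSIGN_HOURS REAL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)   -- the original composite key
);
"""


def build_2nf_database():
    """Create the three 2NF tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA_2NF)
    return conn


conn = build_2nf_database()
# Sample rows are invented for the sketch.
conn.execute("INSERT INTO PROJECT VALUES (1, 'Sample project')")
conn.execute("INSERT INTO EMPLOYEE VALUES (10, 'Sample employee', 'Analyst', 50.0)")
conn.execute("INSERT INTO ASSIGNMENT VALUES (1, 10, 12.5)")
print(conn.execute("SELECT * FROM ASSIGNMENT").fetchall())
```

Each component of the composite key appears twice: as the PK of its own table and as a FK inside ASSIGNMENT.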
254
255
Conversion to Second Normal Form

• Table is in second normal form (2NF) when:
– It is in 1NF and
– It includes no partial dependencies:
• No attribute is dependent on only a portion of the
primary key

256
Conversion to Third Normal Form

• Step 1: Make New Tables to Eliminate
Transitive Dependencies
– For every transitive dependency, write its
determinant as PK for new table (JOB_CLASS)
• Determinant: any attribute whose value
determines other values within a row
– The determinant should remain in the original
table to serve as a FK

257
Conversion to Third Normal Form
• Step 2: Reassign Corresponding Dependent
Attributes
– Identify attributes dependent on each determinant
identified in Step 1
• Identify dependency
– Name table to reflect its contents and function
PROJECT(PROJ_NUM, PROJ_NAME)
ASSIGNMENT(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
EMPLOYEE(EMP_NUM, EMP_NAME, JOB_CLASS)
JOB(JOB_CLASS, CHG_HOUR)
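
One practical effect of moving CHG_HOUR into JOB can be sketched with `sqlite3`: a rate change becomes a single-row update. The column types, job classes, and employee names below are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JOB (
    JOB_CLASS TEXT PRIMARY KEY,
    CHG_HOUR  REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT NOT NULL,
    JOB_CLASS TEXT REFERENCES JOB (JOB_CLASS)
);
INSERT INTO JOB VALUES ('Analyst', 50.0), ('Programmer', 40.0);
INSERT INTO EMPLOYEE VALUES (1, 'Employee A', 'Analyst'),
                            (2, 'Employee B', 'Analyst'),
                            (3, 'Employee C', 'Programmer');
""")

# One UPDATE changes the rate seen by every employee in that job class.
conn.execute("UPDATE JOB SET CHG_HOUR = 55.0 WHERE JOB_CLASS = 'Analyst'")

rates = conn.execute("""
    SELECT e.EMP_NUM, j.CHG_HOUR
    FROM EMPLOYEE AS e
    JOIN JOB AS j ON j.JOB_CLASS = e.JOB_CLASS
    ORDER BY e.EMP_NUM
""").fetchall()
print(rates)
```

In the 2NF version of EMPLOYEE the same change would have to touch every Analyst row, which is where update anomalies come from.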

258
259
Conversion to Third Normal Form
• A table is in third normal form (3NF)
when both of the following are true:
– It is in 2NF
– It contains no transitive dependencies

260
Conversion to Third Normal Form
1NF -> 2NF – remove partial dependencies
2NF -> 3NF – remove transitive dependencies
• In both cases, the answer is to create a new
table
– The determinant of the problem dependency
remains in the original table and is placed as
the PK of the new table
– The dependents of the problem dependency
are removed from the original table and
placed as nonprime attributes in the new table
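
The rule above is mechanical enough to sketch as a small function. The `decompose` helper and its list-of-names representation are assumptions made for illustration; it handles one problem dependency at a time.

```python
def decompose(table, determinant, dependents):
    """Split `table` (a list of attribute names) around one problem dependency.

    Returns (reduced_original, new_table). The determinant stays in the
    original table as a FK and becomes the PK of the new table; the
    dependents move to the new table as nonprime attributes.
    """
    determinant = list(determinant)
    dependents = list(dependents)
    reduced_original = [attr for attr in table if attr not in dependents]
    new_table = determinant + dependents
    return reduced_original, new_table


# 2NF -> 3NF step for EMPLOYEE, removing JOB_CLASS -> CHG_HOUR.
employee_2nf = ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"]
employee_3nf, job = decompose(employee_2nf, ["JOB_CLASS"], ["CHG_HOUR"])
print(employee_3nf)
print(job)
```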
261
Improving the Design
• Table structures should be cleaned up to eliminate
initial partial and transitive dependencies
• Normalization cannot, by itself, be relied on to
make good designs
• Valuable because it helps eliminate data
redundancies
• If a table has multiple candidate keys and one is a
composite key, there can be partial dependencies
even when the PK is a single attribute
– Resolve in 3NF as a transitive dependency

262
Improving the Design (cont’d.)
• Issues to address, in order, to produce a good
normalized set of tables:
– Evaluate PK Assignments
• Use JOB_CODE as PK for JOB table rather than
JOB_CLASS to avoid data-entry errors when
used as a FK in EMPLOYEE (e.g., “DB Designer”
vs. “Database Designer”)
• JOB(JOB_CODE, JOB_CLASS, CHG_HOUR)
• Why is JOB_CLASS -> CHG_HOUR not a
transitive dependency? (Because JOB_CLASS is
a candidate key)

263
Improving the Design (cont’d.)
– Evaluate Naming Conventions
• CHG_HOUR should be JOB_CHG_HOUR
• JOB_DESCRIPTION is a better name than
JOB_CLASS
– Refine Attribute Atomicity
• Atomic attribute – one that cannot be further
subdivided
– EMP_NAME is not atomic (it can be split into
EMP_LNAME, EMP_FNAME, and EMP_INITIAL)
– Identify New Attributes
• YTD gross salary, social security payments, hire
date

264
Improving the Design (cont’d.)
– Identify New Relationships
• To track the manager of each project, put
EMP_NUM as a FK in PROJECT
– Refine Primary Keys as Required for Data
Granularity
• What does ASSIGN_HOURS represent ? Yearly
total hours, weekly, daily?
• If multiple daily entries are needed for the same project
and employee number, use a surrogate key ASSIGN_NUM
to avoid duplicate values of the composite PK
(EMP_NUM, PROJ_NUM, ASSIGN_DATE)
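
A minimal sketch of the surrogate-key idea, using `sqlite3`: two entries for the same employee, project, and date can coexist because ASSIGN_NUM, not the natural composite key, identifies each row. Column types and the sample entries are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE ASSIGNMENT (
        ASSIGN_NUM   INTEGER PRIMARY KEY,   -- surrogate key, assigned by SQLite
        ASSIGN_DATE  TEXT NOT NULL,
        PROJ_NUM     INTEGER NOT NULL,
        EMP_NUM      INTEGER NOT NULL,
        ASSIGN_HOURS REAL NOT NULL
    )
""")

# Two entries on the same day for the same project and employee.
entries = [
    ("2024-01-15", 1, 10, 3.0),
    ("2024-01-15", 1, 10, 2.5),
]
conn.executemany(
    "INSERT INTO ASSIGNMENT (ASSIGN_DATE, PROJ_NUM, EMP_NUM, ASSIGN_HOURS) "
    "VALUES (?, ?, ?, ?)",
    entries,
)

keys = [row[0] for row in conn.execute(
    "SELECT ASSIGN_NUM FROM ASSIGNMENT ORDER BY ASSIGN_NUM")]
daily_total = conn.execute(
    "SELECT SUM(ASSIGN_HOURS) FROM ASSIGNMENT "
    "WHERE ASSIGN_DATE = '2024-01-15' AND PROJ_NUM = 1 AND EMP_NUM = 10"
).fetchone()[0]
print(keys, daily_total)
```

The trade-off is that the database no longer rejects an accidental duplicate of the same entry, so that check has to happen elsewhere.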

265
Improving the Design (cont’d.)
– Maintain Historical Accuracy
• An employee’s job charge could change over the
lifetime of a project. In order to reconstruct the
charges to a project, another field with the job
charge and date active is required
– Evaluate Using Derived Attributes
• Store rather than derive the charge if it will speed up
reporting
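
The two points above can be sketched together: each assignment keeps a copy of the rate that was active when the work was done, and the charge is derived from it. The attribute names ASSIGN_CHG_HOUR and ASSIGN_CHARGE come from the final schema later in these slides; the `Assignment` class, the rates, and the dates are assumptions.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Assignment:
    assign_date: date
    proj_num: int
    emp_num: int
    assign_hours: float
    assign_chg_hour: float  # rate copied from JOB when the entry is created

    @property
    def assign_charge(self) -> float:
        """Derived value: hours worked times the rate in effect that day."""
        return self.assign_hours * self.assign_chg_hour


job_rate = {"Analyst": 50.0}
first = Assignment(date(2024, 1, 15), 1, 10, 4.0, job_rate["Analyst"])

job_rate["Analyst"] = 60.0  # the job charge changes later in the project
second = Assignment(date(2024, 6, 1), 1, 10, 2.0, job_rate["Analyst"])

print(first.assign_charge, second.assign_charge)
```

Here the charge is computed on demand; storing it as a column instead trades some redundancy for faster reporting, which is the choice the slide describes.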

266
267
268
Surrogate Key Considerations

• When primary key is considered to be
unsuitable, designers use surrogate keys
• Data entries in Table 6.4 are inappropriate
because they duplicate existing records
– Yet they violate neither entity nor referential
integrity

269
Higher-Level Normal Forms

• Tables in 3NF perform suitably in business
transactional databases
• Higher-order normal forms are useful on
occasion
• Two special cases of 3NF:
– Boyce-Codd normal form (BCNF)
– Fourth normal form (4NF)

270
The Boyce-Codd Normal Form

• Table is in BCNF when every determinant in the
table is a candidate key
– A candidate key has the same characteristics as a
primary key but, for some reason, was not chosen
to be the primary key
• When table contains only one candidate key,
the 3NF and the BCNF are equivalent
• BCNF can be violated only when table contains
more than one candidate key

271
The Boyce-Codd Normal Form

• Most designers consider the BCNF as a special
case of 3NF
• Table is in 3NF when it is in 2NF and there are
no transitive dependencies
• Table can be in 3NF and fail to meet BCNF when:
– It contains no partial or transitive
dependencies, but
– A nonkey attribute is the determinant of a key
attribute
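
A BCNF check can be sketched with the standard attribute-closure computation. The table R(A, B, C, D) and its dependencies are hypothetical, chosen to show a nonkey attribute determining a key attribute; the helper names are assumptions. The check tests whether each determinant is a superkey, which is equivalent to the candidate-key wording for dependencies with minimal determinants.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by `attributes` under `dependencies`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for determinant, dependents in dependencies:
            if set(determinant) <= result and not set(dependents) <= result:
                result |= set(dependents)
                changed = True
    return result


def bcnf_violations(all_attributes, dependencies):
    """List determinants whose closure does not cover the whole table."""
    all_attributes = set(all_attributes)
    return [
        sorted(determinant)
        for determinant, _ in dependencies
        if closure(determinant, dependencies) != all_attributes
    ]


# Hypothetical table R(A, B, C, D) with primary key (A, B).
attributes = {"A", "B", "C", "D"}
dependencies = [
    ({"A", "B"}, {"C", "D"}),  # full dependency on the primary key
    ({"C"}, {"B"}),            # nonkey attribute C determines key attribute B
]
print(bcnf_violations(attributes, dependencies))
```

The table has no partial or transitive dependency, so it is in 3NF, yet C is reported because it is a determinant without being a candidate key.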

272
273
274
275
Fourth Normal Form (4NF)

• Table is in fourth normal form (4NF) when both
of the following are true:
– It is in 3NF
– It has no multiple sets of multivalued dependencies
• 4NF is largely academic if tables conform to
the following two rules:
– All attributes are dependent on the primary key
but independent of each other
– No row contains two or more multivalued facts
about an entity
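
The second rule can be illustrated with a small sketch. The rows below are hypothetical: each one mixes two independent multivalued facts about an employee (a volunteer organization and an assigned project), so every combination has to be stored. Splitting them into two tables removes the redundancy without losing information.

```python
# Hypothetical rows: (EMP_NUM, ORG_NAME, PROJ_NUM). The organization and the
# project are independent of each other, so all combinations appear.
mixed_rows = [
    (10, "Org A", 1),
    (10, "Org A", 3),
    (10, "Org B", 1),
    (10, "Org B", 3),
]

# 4NF decomposition: one table per multivalued fact.
emp_org = sorted({(emp, org) for emp, org, _ in mixed_rows})
emp_proj = sorted({(emp, proj) for emp, _, proj in mixed_rows})

# Joining the two tables on EMP_NUM reproduces the original rows.
rejoined = sorted(
    (emp, org, proj)
    for emp, org in emp_org
    for emp2, proj in emp_proj
    if emp == emp2
)

print(emp_org)
print(emp_proj)
print(rejoined == sorted(mixed_rows))
```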
276
277
278
Normalization and Database Design

• Normalization should be part of the design
process
• Make sure that proposed entities meet required
normal form before table structures are created
• Many real-world databases have been
improperly designed or burdened with
anomalies
• You may be asked to redesign and modify
existing databases
279
Normalization and Database Design

• ER diagram
– Identify relevant entities, their attributes, and
their relationships
– Identify additional entities and attributes
• Normalization procedures
– Focus on characteristics of specific entities
– Micro view of entities within ER diagram
• Difficult to separate normalization process from
ER modeling process
280
Normalization and Database Design
• Given the following business rules:
– The company manages many projects
– Each project requires the services of many employees
– An employee may be assigned to several projects
– Some employees are not assigned to a project and perform
non-project related duties. Some employees are part of a
labor pool and shared by all project teams
– Each employee has a single primary job classification, which
determines the hourly billing rate
– Many employees can have the same job classification.

281
Normalization and Database Design
• We initially define the following entities
PROJECT(PROJ_NUM, PROJ_NAME)
EMPLOYEE(EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_DESCRIPTION, JOB_CHG_HOUR)
• PROJECT is in 3NF and needs no modification
• EMPLOYEE contains a transitive dependency so we now have
PROJECT(PROJ_NUM, PROJ_NAME)
EMPLOYEE(EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_CODE)
JOB(JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)

282
Normalization and Database Design

283
Normalization and Database Design
• To represent the M:N relationship between EMPLOYEE and
PROJECT, we could try two 1:M relationships
• An employee can be assigned to many projects
• Each project can have many employees assigned to it

284
Normalization and Database Design
• As this M:N relationship cannot be implemented directly, we include
the ASSIGNMENT entity to track the assignment of employees to projects

285
Normalization and Database Design
• ASSIGN_HOURS is assigned to ASSIGNMENT
• A “manages” relationship is added in order to keep detailed
information about each project’s manager
• Some additional attributes are added to maintain additional
information

PROJECT(PROJ_NUM, PROJ_NAME, EMP_NUM)
EMPLOYEE(EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL,
EMP_HIREDATE, JOB_CODE)
JOB(JOB_CODE, JOB_DESCRIPTION, JOB_CHG_HOUR)
ASSIGNMENT(ASSIGN_NUM, ASSIGN_DATE, PROJ_NUM, EMP_NUM,
ASSIGN_HOURS, ASSIGN_CHG_HOUR, ASSIGN_CHARGE)

286
287
Denormalization

• Creation of normalized relations is an important
database design goal
• Processing requirements should also be a goal
• If tables are decomposed to conform to
normalization requirements:
– Number of database tables expands

288
Denormalization (cont’d.)
• Joining the larger number of tables reduces system
speed
• Conflicts are often resolved through compromises
that may include denormalization
• Defects of unnormalized tables:
– Data updates are less efficient because tables are
larger
– Indexing is more cumbersome as there are more
fields per table
– No simple strategies for creating virtual tables known
as views
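
For contrast with the last point, a view over normalized tables is straightforward to define. The sketch below uses `sqlite3`; the view name EMPLOYEE_RATE, the column types, and the sample rows are assumptions, and only a subset of the EMPLOYEE attributes is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JOB (
    JOB_CODE        INTEGER PRIMARY KEY,
    JOB_DESCRIPTION TEXT NOT NULL,
    JOB_CHG_HOUR    REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_LNAME TEXT NOT NULL,
    JOB_CODE  INTEGER REFERENCES JOB (JOB_CODE)
);
-- The view presents a denormalized picture without storing it.
CREATE VIEW EMPLOYEE_RATE AS
    SELECT e.EMP_NUM, e.EMP_LNAME, j.JOB_DESCRIPTION, j.JOB_CHG_HOUR
    FROM EMPLOYEE AS e
    JOIN JOB AS j ON j.JOB_CODE = e.JOB_CODE;
INSERT INTO JOB VALUES (500, 'Programmer', 40.0);
INSERT INTO EMPLOYEE VALUES (1, 'Lastname', 500);
""")

rows = conn.execute("SELECT * FROM EMPLOYEE_RATE").fetchall()
print(rows)
```

The join still runs each time the view is queried, so this keeps the design normalized but does not remove the join cost that motivates denormalization.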
289
Denormalization

290
Denormalization
• In order to generate the report below, a temporary
denormalized table is used since the last four
semesters of each faculty member could be
different due to sabbatical, leave, start date, etc

291
Denormalization
• EVALDATA is the master data table, which is normalized
• FACHIST is created via a series of queries in order to
produce the desired report

292
Data-Modeling Checklist

• Data modeling translates specific real-world
environment into data model
– Represents real-world data, users, processes,
interactions
• Data-modeling checklist helps ensure that data-
modeling tasks are successfully performed
• Based on concepts and tools learned in Part II

293
294
