Database Management System - CS3492 - Question Bank and Important 2 Marks Questions With Answer
Department of CSE
Vision of Institution
To build Jeppiaar Engineering College as an Institution of Academic Excellence in Technical education and
Management education and to become a World Class University.
Mission of Institution
M3: To equip students with values, ethics and life skills needed to enrich their lives and enable them to meaningfully contribute to the progress of society.
M4: To prepare students for higher studies and lifelong learning, enrich them with the practical and entrepreneurial skills necessary to excel as future professionals and contribute to the Nation's economy.
Vision of Department
To emerge as a globally prominent department, developing ethical computer professionals, innovators and entrepreneurs with academic excellence through quality education and research.
Mission of Department
M1: To create computer professionals with an ability to identify and formulate engineering problems and also to provide innovative solutions through an effective teaching-learning process.
M2: To strengthen the core competence in computer science and engineering and to create an ability to interact effectively with industries.
M3: To produce engineers with good professional skills, ethical values and life skills for the betterment of the society.
PSO2: To interpret real-time problems with analytical skills and to arrive at cost-effective and optimal solutions using advanced tools and techniques.
3. Network Model – The network model is the same as the hierarchical model except that it has a graph-like structure rather than a tree-based structure. Unlike the hierarchical model, this model allows each record to have more than one parent record.
Physical Data Models – These models describe data at the lowest level of abstraction.
Three Schema Architecture
The goal of the three schema architecture is to separate the user applications and the physical database. The schemas
can be defined at the following levels:
1. The internal level – has an internal schema which describes the physical storage structure of the database.
Uses a physical data model and describes the complete details of data storage and access paths for the
database.
2. The conceptual level – has a conceptual schema which describes the structure of the database for users. It
hides the details of the physical storage structures, and concentrates on describing entities, data types,
relationships, user operations and constraints. Usually a representational data model is used to describe the
conceptual schema.
3. The External or View level – includes external schemas or user views. Each external schema describes the part of the database that a particular user group is interested in and hides the rest of the database from that user group. It is represented using the representational data model.
The three schema architecture is used to visualize the schema levels in a database. The three schemas are only descriptions of data; the only data that actually exists is at the physical level.
COMPONENTS OF DBMS
Database Users
Users are differentiated by the way they expect to interact with the system
• Application programmers
• Sophisticated users
• Naïve users
• Database Administrator
• Specialized users, etc.
Application programmers:
Professionals who write application programs and using these application programs they interact with
the database system
Sophisticated users:
These users interact with the database system without writing programs; instead, they submit queries to retrieve the information.
Specialized users:
Users who write specialized database applications to interact with the database system.
Naïve users:
Interact with the database system by invoking application programs that have been written previously by application programmers.
Eg: people accessing the database over the web
Database Administrator:
Coordinates all the activities of the database system; the database administrator has a good understanding of
the enterprise’s information resources and needs.
Schema definition
Access method definition
Schema and physical organization modification
Granting user authority to access the database
Monitoring performance
Storage Manager
The storage manager includes the following components/modules:
Authorization Manager
Transaction Manager
File Manager
Buffer Manager
The storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
The storage manager is responsible for the following tasks:
interaction with the file manager
efficient storing, retrieving and updating of data
Authorization Manager
Checks whether the user is an authorized person or not
Tests the satisfaction of integrity constraints
Transaction Manager
Responsible for concurrent transaction execution. It ensures that the database remains in a consistent state despite system failures.
EVOLUTION OF RDBMS
Before the acceptance of Codd’s Relational Model, database management systems were just ad hoc collections of data designed to solve a particular type of problem, later extended to serve more general purposes. This led to complex
systems, which were difficult to understand, install, maintain and use. These database systems were plagued with the
following problems:
• They required large budgets and staffs of people with special skills that were in short supply.
• Database administration staff and application developers required prior preparation to access these database systems.
• End-user access to the data was rarely provided.
• These database systems did not support the implementation of business logic as a DBMS responsibility.
Hence, the objective of developing a relational model was to address each and every one of the shortcomings that
plagued those systems that existed at the end of the 1960s decade, and make DBMS products more widely appealing
to all kinds of users.
The existing relational database management systems offer powerful, yet simple solutions for a wide variety of
commercial and scientific application problems. Almost every industry uses relational systems to store, update and
retrieve data for operational, transaction, as well as decision support systems.
RELATIONAL DATABASE
A relational database is a database system in which the database is organized and accessed according to the
relationships between data items without the need for any consideration of physical orientation and relationship.
Relationships between data items are expressed by means of tables.
It is a tool, which can help you store, manage and disseminate information of various kinds. It is a collection of
objects, tables, queries, forms, reports, and macros, all stored in a computer program all of which are inter-related.
It is a method of structuring data in the form of records, so that relations between different entities and attributes can
be used for data access and transformation.
RELATIONAL DATABASE MANAGEMENT SYSTEM
A Relational Database Management System (RDBMS) is a system, which allows us to perceive data as tables (and
nothing but tables), and operators necessary to manipulate that data are at the user’s disposal.
Features of an RDBMS
The features of a relational database are as follows:
The ability to create multiple relations (tables) and enter data into them
An interactive query language
Retrieval of information stored in more than one table
Provides a Catalog or Dictionary, which itself consists of tables (called system tables)
• Table: A table is a collection of rows and columns in which data is stored. It is also called a relation or an entity.
• Row: Rows represent collection of data required for a particular entity. In order to identify each row as
unique there should be a unique identifier called the primary key, which allows no duplicate rows. For
example in a library every member is unique and hence is given a membership number, which uniquely
identifies each member. A row is also called a record or a tuple.
• Column: Columns represent characteristics or attributes of an entity. Each attribute maps onto a column of a
table. Hence, a column is also known as an attribute.
• Relationship: Relationships represent a logical link between two tables. A relationship is depicted by a
foreign key column.
• Degree: number of attributes
• Cardinality: number of tuples
• An attribute of an entity has a particular value. The set of possible values that a given attribute can have is called its domain.
KEYS AND THEIR USE
Key: An attribute or set of attributes whose values uniquely identify each entity in an entity set is called a key for
that entity set.
Super Key: If we add additional attributes to a key, the resulting combination would still uniquely identify an
instance of the entity set. Such augmented keys are called super keys.
Primary Key: It is a minimal super key.
It is a unique identifier for the table (a column or a column combination with the property that at any given time no
two rows of the table contain the same value in that column or column combination).
Foreign Key: A foreign key is a field (or collection of fields) in one table that uniquely identifies a row of another
table. In simpler words, the foreign key is defined in a second table, but it refers to the primary key in the first table.
Candidate Key: There may be two or more attributes or combinations of attributes that uniquely identify an
instance of an entity set. These attributes or combinations of attributes are called candidate keys.
Secondary Key: A secondary key is an attribute or combination of attributes that may not be a candidate key, but
that classifies the entity set on a particular characteristic. Any key consisting of a single attribute is called a simple
key, while that consisting of a combination of attributes is called a composite key.
Referential Integrity
Referential Integrity can be defined as an integrity constraint that specifies that the value (or existence) of an
attribute in one relation depend on the value (or existence) of an attribute in the same or another relation. Referential
integrity in a relational database is consistency between coupled tables. It is usually enforced by the combination of
a primary key and a foreign key. For referential integrity to hold, any field in a table that is declared a foreign key
can contain only values from a parent table's primary key field. For instance, deleting a record that contains a value
referred to by a foreign key in another table would break referential integrity.
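As an illustration, here is a minimal SQL sketch (using hypothetical Department and Employee tables that are not part of the text) of how a primary key / foreign key pair enforces referential integrity:

-- Hypothetical tables for illustration only.
CREATE TABLE Department (
    dept_id   INT PRIMARY KEY,      -- parent table's primary key
    dept_name VARCHAR(30)
);

CREATE TABLE Employee (
    emp_id   INT PRIMARY KEY,
    emp_name VARCHAR(30),
    dept_id  INT,
    FOREIGN KEY (dept_id) REFERENCES Department(dept_id)   -- referential integrity
);

-- Inserting an Employee row with a dept_id that does not exist in Department,
-- or deleting a Department row that is still referenced by some Employee row,
-- violates the constraint and is rejected by the DBMS.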
Relational Model
Relational data model is the primary data model, which is used widely around the world for data storage and
processing. This model is simple and it has all the properties and capabilities required to process data with storage
efficiency.
Concepts
Tables − In the relational data model, relations are saved in the format of tables. This format stores the relation among entities. A table has rows and columns, where rows represent records and columns represent the attributes.
Tuple − A single row of a table, which contains a single record for that relation is called a tuple.
Relation instance − A finite set of tuples in the relational database system represents relation instance. Relation
instances do not have duplicate tuples.
Relation schema − A relation schema describes the relation name (table name), attributes, and their names.
Relation key − Each row has one or more attributes, known as relation key, which can identify the row in the
relation (table) uniquely.
Attribute domain − Every attribute has some pre-defined value scope, known as attribute domain.
Constraints
Every relation has some conditions that must hold for it to be a valid relation. These conditions are
called Relational Integrity Constraints. There are three main integrity constraints −
Key constraints
Domain constraints
Referential integrity constraints
Key Constraints
There must be at least one minimal subset of attributes in the relation which can identify a tuple uniquely. This minimal subset of attributes is called a key for that relation. If there is more than one such minimal subset, these are called candidate keys.
Key constraints enforce that −
in a relation with a key attribute, no two tuples can have identical values for key attributes.
a key attribute cannot have NULL values.
Key constraints are also referred to as Entity Constraints.
Domain Constraints
Attributes have specific values in real-world scenarios. For example, age can only be a positive integer. Similar constraints are employed on the attributes of a relation. Every attribute is bound to have a specific range of values. For example, age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.
Referential integrity Constraints
Referential integrity constraints work on the concept of Foreign Keys. A foreign key is a key attribute of a relation that can be referred to in another relation.
Referential integrity constraint states that if a relation refers to a key attribute of a different or same relation, then
that key element must exist.
Relational database systems are expected to be equipped with a query language that can assist its users to query the
database instances. There are two kinds of query languages − relational algebra and relational calculus.
Relational Algebra
Relational algebra is a procedural query language, which takes instances of relations as input and yields instances
of relations as output. It uses operators to perform queries. An operator can be either unary or binary. They accept
relations as their input and yield relations as their output. Relational algebra is performed recursively on a relation
and intermediate results are also considered relations.
The fundamental operations of relational algebra are as follows −
Select
Project
Union
Set difference
Cartesian product
Rename
We will discuss all these operations in the following sections.
Select Operation (σ)
It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r)
Where σ stands for the selection predicate and r stands for the relation. p is a propositional logic formula which may use connectors like and, or, and not. These terms may use relational operators like − =, ≠, ≥, <, >, ≤.
For example −
σsubject = "database"(Books)
Output − Selects tuples from books where subject is 'database'.
σsubject = "database" and price = "450"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450.
σsubject = "database" and price = "450" or year > "2010"(Books)
Output − Selects tuples from books where subject is 'database' and 'price' is 450 or those books published after
2010.
Project Operation (∏)
It projects column(s) that satisfy a given predicate.
Notation − ∏A1, A2, ..., An (r)
Where A1, A2, ..., An are attribute names of relation r.
Duplicate rows are automatically eliminated, as relation is a set.
For example −
∏subject, author (Books)
Selects and projects columns named as subject and author from the relation Books.
Union Operation (∪)
It performs binary union between two given relations and is defined as −
r ∪ s = { t | t ∈ r or t ∈ s}
Notation − r ∪ s
Where r and s are either database relations or relation result set (temporary relation).
For a union operation to be valid, the following conditions must hold −
r, and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have either written a book or an article or both.
Set Difference (−)
The result of set difference operation is tuples, which are present in one relation but are not in the second relation.
Notation − r − s
Finds all the tuples that are present in r but not in s.
∏ author (Books) − ∏ author (Articles)
Output − Provides the name of authors who have written books but not articles.
Cartesian Product (Χ)
Combines information of two different relations into one.
Notation − r Χ s
Where r and s are relations and their output will be defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
σauthor = 'tutorialspoint'(Books Χ Articles)
Output − Yields a relation, which shows all the books and articles written by tutorialspoint.
Rename Operation (ρ)
The results of relational algebra are also relations but without any name. The rename operation allows us to rename the output relation. The 'rename' operation is denoted by the lowercase Greek letter rho (ρ).
Notation − ρ x (E)
Where the result of expression E is saved with name of x.
Additional operations are −
Set intersection
Assignment
Natural join
SQL FUNDAMENTALS:
SQL is a standard computer language for accessing and manipulating databases.
What is SQL?
SQL stands for Structured Query Language
SQL allows you to access a database
SQL is an ANSI standard computer language
SQL can execute queries against a database
SQL can retrieve data from a database
SQL can insert new records in a database
SQL can delete records from a database
SQL can update records in a database
SQL is easy to learn
SQL Queries
With SQL, we can query a database and have a result set returned.
A query like this:
SELECT LastName FROM Persons
Gives a result set like this:
LastName
Hansen
Svendson
Pettersen
Note: Some database systems require a semicolon at the end of the SQL statement. We don't use the semicolon in these examples.
SELECT * FROM Persons
Result
LastName FirstName Address City
Hansen Ola Timoteivn 10 Sandnes
Svendson Tove Borgvn 23 Sandnes
Pettersen Kari Storgt 20 Stavanger
"Orders" table
Company OrderNumber
Sega 3412
W3Schools 2312
Trio 4678
W3Schools 6798
Result
Company
Sega
W3Schools
Trio
W3Schools
Note that "W3Schools" is listed twice in the result-set.
To select only DIFFERENT values from the column named "Company" we use a SELECT DISTINCT statement
like this:
SELECT DISTINCT Company FROM Orders
Result:
Company
Sega
W3Schools
Trio
Now "W3Schools" is listed only once in the result-set.
The WHERE clause is used to specify a selection criterion.
Using Quotes
Note that we have used single quotes around the conditional values in the examples.
SQL uses single quotes around text values (most database systems will also accept double quotes). Numeric values
should not be enclosed in quotes.
For text values:
This is correct:
SELECT * FROM Persons WHERE FirstName='Tove'
This is wrong:
SELECT * FROM Persons WHERE FirstName=Tove
For numeric values:
This is correct:
SELECT * FROM Persons WHERE Year>1965
This is wrong:
SELECT * FROM Persons WHERE Year>'1965'
SELECT column FROM table
WHERE column LIKE pattern
A "%" sign can be used to define wildcards (missing letters in the pattern) both before and after the pattern.
Using LIKE
The following SQL statement will return persons with first names that start with an 'O':
SELECT * FROM Persons
WHERE FirstName LIKE 'O%'
The following SQL statement will return persons with first names that end with an 'a':
SELECT * FROM Persons
WHERE FirstName LIKE '%a'
The following SQL statement will return persons with first names that contain the pattern 'la':
SELECT * FROM Persons
WHERE FirstName LIKE '%la%'
The INSERT INTO Statement
The INSERT INTO statement is used to insert new rows into a table.
Syntax
INSERT INTO table_name
VALUES (value1, value2, ... )
You can also specify the columns for which you want to insert data:
INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ... )
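For example, an illustrative statement against the Persons table used earlier (the values simply repeat one of the sample rows):

INSERT INTO Persons (LastName, FirstName, Address, City)
VALUES ('Pettersen', 'Kari', 'Storgt 20', 'Stavanger')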
The UPDATE Statement
The UPDATE statement is used to modify the data in a table.
Syntax
UPDATE table_name
SET column_name = new_value
WHERE column_name = some_value
Person:
LastName FirstName Address City
Nilsen Fred Kirkegt 56 Stavanger
Rasmussen Storgt 67
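For example, the second table below could be produced from the first with a statement such as (one possible form):

UPDATE Person
SET FirstName = 'Nina', Address = 'Stien 12', City = 'Stavanger'
WHERE LastName = 'Rasmussen'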
Person:
LastName FirstName Address City
Nilsen Fred Kirkegt 56 Stavanger
Rasmussen Nina Stien 12 Stavanger
Delete
Drop
Delete a Row
"Nina Rasmussen" is going to be deleted:
DELETE FROM Person WHERE LastName = 'Rasmussen'
Result
LastName FirstName Address City
Nilsen Fred Kirkegt 56 Stavanger
Sort the Rows
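The ORDER BY clause is used to sort the rows. A brief illustrative example, assuming the Persons table used earlier and sorting the persons by last name:

SELECT LastName, FirstName FROM Persons
ORDER BY LastName

To sort in reverse order, add the DESC keyword, e.g. ORDER BY LastName DESC.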
Aggregate functions in SQL Server
Function Description
AVG(column) Returns the average value of a column
BINARY_CHECKSUM Returns a binary checksum value computed over a row or a list of expressions
CHECKSUM Returns a checksum value computed over a row or a list of expressions
CHECKSUM_AGG Returns the checksum of the values in a group
COUNT(column) Returns the number of rows (without a NULL value) of a column
COUNT(*) Returns the number of selected rows
COUNT(DISTINCT column) Returns the number of distinct results
FIRST(column) Returns the value of the first record in a specified field (not supported in SQL Server 2000)
LAST(column) Returns the value of the last record in a specified field (not supported in SQL Server 2000)
MAX(column) Returns the highest value of a column
MIN(column) Returns the lowest value of a column
STDEV(column) Returns the statistical standard deviation of the values in a column
STDEVP(column) Returns the population standard deviation of the values in a column
SUM(column) Returns the total sum of a column
VAR(column) Returns the statistical variance of the values in a column
VARP(column) Returns the population variance of the values in a column
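For example, a short illustrative query against the "Sales" table used in the GROUP BY example below:

SELECT AVG(Amount), MAX(Amount), COUNT(*) FROM Sales
-- for the sample data this returns 5700, 7100 and 3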
Scalar functions
Scalar functions operate against a single value, and return a single value based on the input value.
Useful Scalar Functions in MS Access
Function Description
UCASE(c) Converts a field to upper case
LCASE(c) Converts a field to lower case
MID(c,start[,end]) Extracts characters from a text field
LEN(c) Returns the length of a text field
INSTR(c,char) Returns the numeric position of a named character within a text field
LEFT(c,number_of_char) Returns the requested left part of a text field
RIGHT(c,number_of_char) Returns the requested right part of a text field
ROUND(c,decimals) Rounds a numeric field to the number of decimals specified
MOD(x,y) Returns the remainder of a division operation
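For example, a short illustrative query combining a few of these functions on the Persons table (MS Access style syntax is assumed here):

SELECT UCASE(LastName) AS UpperName,
       LEN(Address) AS AddressLength,
       LEFT(City, 3) AS CityPrefix
FROM Persons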
GROUP BY Example
This "Sales" Table:
Company Amount
W3Schools 5500
IBM 4500
W3Schools 7100
And This SQL:
SELECT Company, SUM(Amount) FROM Sales
Returns this result:
Company SUM(Amount)
W3Schools 17100
IBM 17100
W3Schools 17100
The above code is invalid because the column returned is not part of an aggregate. A GROUP BY clause will solve
this problem:
SELECT Company,SUM(Amount) FROM Sales
GROUP BY Company
HAVING…
HAVING... was added to SQL because the WHERE keyword could not be used against aggregate functions (like
SUM), and without HAVING... it would be impossible to test for result conditions.
The syntax for the HAVING clause is:
SELECT column,SUM(column) FROM table
GROUP BY column
HAVING SUM(column) condition value
This "Sales" Table:
Company Amount
W3Schools 5500
IBM 4500
W3Schools 7100
This SQL:
SELECT Company,SUM(Amount) FROM Sales
GROUP BY Company
HAVING SUM(Amount)>10000
Returns this result
Company SUM(Amount)
W3Schools 12600
EMBEDDED SQL
Embedded SQL is a method of inserting inline SQL statements or queries into the code of a programming language,
which is known as a host language. Because the host language cannot parse SQL, the inserted SQL is parsed by an
embedded SQL preprocessor.
Embedded SQL is a robust and convenient method of combining the computing power of a programming language
with SQL's specialized data management and manipulation capabilities.
Structure of embedded SQL
Structure of embedded SQL defines step by step process of establishing a connection with DB and executing the
code in the DB within the high level language.
Connection to DB
This is the first step while writing a query in high level languages. First, a connection to the DB that we are accessing
needs to be established. This can be done using the keyword CONNECT. But it has to be preceded by 'EXEC SQL' to
indicate that it is a SQL statement.
EXEC SQL CONNECT db_name;
EXEC SQL CONNECT HR_USER; //connects to DB HR_USER
Once a connection is established with the DB, we can perform DB transactions. Since these DB transactions depend on the values and variables of the host language, the query is written and executed based on those values. Similarly, the results of a DB query will be returned to the host language and captured by the variables of the host language. Hence we need to declare variables to pass values to the query and to get values back from the query. There are two types of variables used in the host language.
Host variable: These are variables of the host language used to pass values to the query as well as to capture the values returned by the query. Since SQL depends on the host language, we have to use variables of the host language, and such variables are known as host variables. These host variables should be declared within the SQL area or within SQL code; that means the compiler should be able to differentiate them from normal C variables. Hence we have to declare host variables within a BEGIN DECLARE and END DECLARE section. Again, this declare block should be enclosed within EXEC SQL and ';'.
EXEC SQL BEGIN DECLARE SECTION;
int STD_ID;
char STD_NAME [15];
char ADDRESS[20];
EXEC SQL END DECLARE SECTION;
We can note here that the variables are written inside the BEGIN and END block of the SQL, but they are declared using C code. It does not use SQL code to declare the variables. Why? This is because they are host variables – variables of the C language. Hence we cannot use SQL syntax to declare them. The host language supports almost all the datatypes: int, char, long, float, double, pointer, array, string, structures, etc.
When host variables are used in a SQL query, they should be preceded by a colon – ':' – to indicate that they are host variables. Hence when the pre-compiler compiles the SQL code, it substitutes the value of the host variable and compiles.
EXEC SQL SELECT * FROM STUDENT WHERE STUDENT_ID =:STD_ID;
The following code is a simple embedded SQL program, written in C. The program illustrates many, but
not all, of the embedded SQL techniques. The program prompts the user for an order number, retrieves the
customer number, salesperson, and status of the order, and displays the retrieved information on the screen.
int main() {
EXEC SQL INCLUDE SQLCA;
EXEC SQL BEGIN DECLARE SECTION;
int OrderID; /* Order number (from user) */
int CustID; /* Retrieved customer ID */
char SalesPerson[10]; /* Retrieved salesperson name */
char Status[6]; /* Retrieved order status */
EXEC SQL END DECLARE SECTION;
query_error:
printf ("SQL error: %ld\n", sqlca->sqlcode);
exit();
bad_number:
printf ("Invalid order number.\n");
exit();
}
DYNAMIC SQL
The main disadvantage of embedded SQL is that it supports only static SQL. If we need to build up queries at run time, then we can use dynamic SQL. That means if the query changes according to user input, it is always better to use dynamic SQL. As mentioned above, the query is different when the user enters the student name alone and when the user enters both the student name and address. If we use embedded SQL, we cannot implement this requirement in the code. In such a case dynamic SQL helps the user to build the query depending on the values entered, without the user having to know which query is being executed. It can also be used when we do not know which SQL statement (Insert, Delete, Update or Select) needs to be used, when the number of host variables is unknown, when the datatypes of host variables are unknown, or when direct references to DB objects like tables, views and indexes are required.
This makes the user's requirement simple and easy to express, but it may make the query lengthier and more complex. That means, depending upon user inputs, the query may grow or shrink, making the code flexible enough to handle all the possibilities. In embedded SQL, the compiler knows the query in advance and the pre-compiler compiles the SQL code much before C compiles the code for execution. Hence embedded SQL is faster in execution. But in the case of dynamic SQL, queries are created, compiled and executed only at run time. This makes dynamic SQL a little more complex and time consuming.
Since query needs to be prepared at run time, in addition to the structures discussed in embedded SQL, we have
three more clauses in dynamic SQL. These are mainly used to build the query and execute them at run time.
PREPARE
Since dynamic SQL builds a query at run time, as a first step we need to capture all the inputs from the user. It will
be stored in a string variable. Depending on the inputs received from the user, string variable is appended with inputs
and SQL keywords. These SQL like string statements are then converted into SQL query. This is done by using
PREPARE statement.
For example, below is a small snippet from dynamic SQL. Here sql_stmt is a character variable which holds inputs from the user along with SQL commands. But it cannot be considered an SQL query as it is still a string value. It needs to be converted into a proper SQL query, which is done in the last line using the PREPARE statement. Here sql_query is also a string variable, but it holds the string as an SQL query.
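A minimal illustrative sketch of this idea, reusing the STUDENT table and the sql_stmt / sql_query names used in this section (the exact statement text is only an example):

EXEC SQL BEGIN DECLARE SECTION;
char sql_stmt[200];                          /* statement text built at run time */
EXEC SQL END DECLARE SECTION;

/* build the statement text from user input at run time */
strcpy(sql_stmt, "DELETE FROM STUDENT WHERE STD_ID = 100");

EXEC SQL PREPARE sql_query FROM :sql_stmt;   /* convert the string into an SQL query */
EXEC SQL EXECUTE sql_query;                  /* compile and execute it in the DB */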
EXECUTE
This statement is used to compile and execute the SQL statements prepared in DB.
EXEC SQL EXECUTE sql_query;
EXECUTE IMMEDIATE
This statement is used to prepare SQL statement as well as execute the SQL statements in DB. It performs the task
of PREPARE and EXECUTE in a single line.
EXEC SQL EXECUTE IMMEDIATE :sql_stmt;
This form of dynamic SQL will not have any SELECT queries or host variables. But it can be any other SQL statement like insert, delete, update, grant, etc. When we use insert/delete/update in this form, we cannot use host variables: all the input values are hardcoded. Hence such SQL statements can be directly executed using EXECUTE IMMEDIATE rather than using PREPARE and then EXECUTE.
EXEC SQL EXECUTE IMMEDIATE 'GRANT SELECT ON STUDENT TO Faculty';
EXEC SQL EXECUTE IMMEDIATE 'DELETE FROM STUDENT WHERE STD_ID = 100';
EXEC SQL EXECUTE IMMEDIATE 'UPDATE STUDENT SET ADDRESS = ''Troy'' WHERE STD_ID = 100';
ER Model: Relationships
When an entity is related to another entity, they are said to have a relationship. For example, a Class entity is related to a Student entity, because students study in classes; hence this is a relationship.
Depending upon the number of entities involved, a degree is assigned to relationships.
For example, if 2 entities are involved, it is said to be Binary relationship, if 3 entities are involved, it is said to
be Ternary relationship, and so on.
Weak Entity
A weak Entity is represented using double rectangular boxes. It is generally connected to another entity.
ER Diagram: Entity
An Entity can be any object, place, person or class. In ER Diagram, an entity is represented using rectangles.
Consider an example of an Organisation- Employee, Manager, Department, Product and many more can be taken
as entities in an Organisation.
The above example describes that one student can enroll only for one course and a course will also have only one
Student. This is not what you will usually see in real-world relationships.
One to Many Relationship
The below example showcases this relationship, which means that 1 student can opt for many courses, but a course
can only have 1 student. Sounds weird! This is how it is.
Many to Many Relationship
The above diagram represents that one student can enroll for more than one course, and a course can have more than one student enrolled in it.
For example, in the diagram above, we have three related entities, Company, Product and Sector. To understand the
relationship better or to define rules around the model, we should relate two entities and then derive the third one.
A Company produces many Products/ each product is produced by exactly one company.
A Company operates in only one Sector / each sector has many companies operating in it.
Considering the above two rules or relationships, we see that although the complete relationship involves three entities, we are looking at two entities at a time.
The Enhanced ER Model
As the complexity of data increased in the late 1980s, it became more and more difficult to use the traditional ER
Model for database modelling. Hence some improvements or enhancements were made to the existing ER Model to
make it able to handle the complex applications better.
Hence, as part of the Enhanced ER Model, along with other improvements, three new concepts were added to the
existing ER Model, they were:
1. Generalization
2. Specialization
3. Aggregation
Generalization
Generalization is a bottom-up approach in which two lower level entities combine to form a higher level entity. In
generalization, the higher level entity can also combine with other lower level entities to make further higher level
entity.
It's more like Superclass and Subclass system, but the only difference is the approach, which is bottom-up. Hence,
entities are combined to form a more generalised entity; in other words, sub-classes are combined to form a super-class.
For example, Saving and Current account types entities can be generalised and an entity with name Account can be
created, which covers both.
Specialization
Specialization is the opposite of Generalization. It is a top-down approach in which one higher level entity can be broken down into two or more lower level entities. In specialization, it is also possible that a higher level entity has no lower-level entity sets at all.
Aggregation
Aggregation is a process in which the relationship between two entities is treated as a single entity.
In the diagram above, the relationship between Center and Course, together, is acting as an entity, which is in a relationship with another entity, Visitor. Now in the real world, if a Visitor or a Student visits a Coaching Center, he/she will never enquire about the center only or just about the course; rather he/she will enquire about both.
It is very convenient to design the database using the ER Model by creating an ER diagram and later on converting
it into relational model to design your tables.
Not all ER Model constraints and components can be directly transformed into the relational model, but an approximate schema can be derived.
Below are a few examples of converting ER diagrams into a relational model schema, and hence creating tables in an RDBMS.
Entity becomes Table
An entity in the ER Model is changed into a table, or we can say that for every entity in the ER model, a table is created in the Relational Model.
The attributes of the entity get converted to columns of the table.
And the primary key specified for the entity in the ER model will become the primary key for the table in the relational model.
For example, for the below ER Diagram in ER Model,
A table with name Student will be created in relational model, which will have 4
columns, id, name, age, address and id will be the primary key for this table.
Table:Student
As discussed above, an entity gets mapped to a table, hence we will create a table for Teacher and a table for Student with all the attributes converted into columns.
Now, an additional table will be created for the relationship, for example StudentTeacher or give it any name you
like. This table will hold the primary key for both Student and Teacher, in a tuple to describe the relationship,
which teacher teaches which student.
If there are additional attributes related to this relationship, then they become the columns for this table, like subject name.
Also proper foreign key constraints must be set for all the tables.
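As an illustration, here is a minimal SQL sketch of the mapping just described (the Student columns follow the example above; the Teacher columns and all data types are assumptions):

CREATE TABLE Student (
    id      INT PRIMARY KEY,
    name    VARCHAR(30),
    age     INT,
    address VARCHAR(50)
);

CREATE TABLE Teacher (
    teacher_id INT PRIMARY KEY,
    name       VARCHAR(30)
);

-- One table for the relationship, holding the primary keys of both entities
-- plus any attribute of the relationship itself (e.g. subject name).
CREATE TABLE StudentTeacher (
    id           INT,
    teacher_id   INT,
    subject_name VARCHAR(30),
    PRIMARY KEY (id, teacher_id),
    FOREIGN KEY (id) REFERENCES Student(id),
    FOREIGN KEY (teacher_id) REFERENCES Teacher(teacher_id)
);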
Functional Dependency
The functional dependency is a relationship that exists between two attributes. It typically exists between the
primary key and non-key attribute within a table.
X → Y
The left side of FD is known as a determinant, the right side of the production is known as a dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.
Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table because if we know the Emp_Id, we can tell the employee name associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
Types of Functional dependency
Normalization of Database
Database Normalization is a technique of organizing the data in the database. Normalization is a systematic approach of decomposing tables to eliminate data redundancy (repetition) and undesirable characteristics like Insertion, Update and Deletion anomalies. It is a multi-step process that puts data into tabular form, removing duplicated data from the relation tables.
Normalization is used mainly for two purposes: eliminating redundant (useless) data, and ensuring data dependencies make sense, i.e. data is logically stored.
In the table above, we have data of 4 Computer Sci. students. As we can see, data for the fields branch, hod(Head
of Department) and office_tel is repeated for the students who are in the same branch in the college, this is Data
Redundancy.
Insertion Anomaly
Suppose for a new admission, until and unless a student opts for a branch, data of the student cannot be inserted, or
else we will have to set the branch information as NULL.
Also, if we have to insert data of 100 students of same branch, then the branch information will be repeated for all
those 100 students.
These scenarios are nothing but Insertion anomalies.
Updation Anomaly
What if Mr. X leaves the college? or is no longer the HOD of computer science department? In that case all the
student records will have to be updated, and if by mistake we miss any record, it will lead to data inconsistency.
This is Updation anomaly.
Deletion Anomaly
In our Student table, two different kinds of information are kept together: Student information and Branch information.
Hence, at the end of the academic year, if student records are deleted, we will also lose the branch information.
This is Deletion anomaly.
Normalization Rule
Normalization rules are divided into the following normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
5. Fourth Normal Form
6. Fifth Normal Form
First Normal Form (1NF)
For a table to be in the First Normal Form, it should follow the following 4 rules:
1. It should only have single(atomic) valued attributes/columns.
2. Values stored in a column should be of the same domain
3. All the columns in a table should have unique names.
4. And the order in which data is stored, does not matter.
Rules for First Normal Form
The first normal form expects you to follow a few simple rules while designing your database, and they are:
Rule 1: Single Valued Attributes
Each column of your table should be single valued which means they should not contain multiple values. We
will explain this with help of an example later, let's see the other rules for now.
Rule 2: Attribute Domain should not change
This is more of a "Common Sense" rule. In each column the values stored must be of the same kind or type.
For example: if you have a column dob to save the dates of birth of a set of people, then you cannot save 'names' of some of them in that column along with the 'date of birth' of others. It should hold only 'date of birth' for all the records/rows.
Rule 3: Unique name for Attributes/Columns
This rule expects that each column in a table should have a unique name. This is to avoid confusion at the time
of retrieving data or performing any other operation on the stored data.
If one or more columns have the same name, then the DBMS system will be left confused.
Rule 4: Order doesn't matter
This rule says that the order in which you store the data in your table doesn't matter.
EXAMPLE
Create a table to store student data which will have student's roll no., their name and the name of subjects they
have opted for.
Here is the table, with some sample data added to it.
The table already satisfies 3 rules out of the 4 rules, as all our column names are unique, we have stored data in the order we wanted to, and we have not inter-mixed different types of data in columns.
But out of the 3 different students in our table, 2 have opted for more than 1 subject. And we have stored the
subject names in a single column. But as per the 1st Normal form each column must contain atomic value.
It's very simple, because all we have to do is break the values into atomic values.
Here is our updated table and it now satisfies the First Normal Form.
roll_no name subject
101 Akon OS
101 Akon CN
103 Ckon Java
102 Bkon C
102 Bkon C++
By doing so, although a few values get repeated, the values for the subject column are now atomic for each record/row. With the First Normal Form, data redundancy increases, as there will be columns with the same data in multiple rows, but each row as a whole will be unique.
Second Normal Form (2NF)
For a table to be in the Second Normal Form,
1. It should be in the First Normal form.
2. And, it should not have Partial Dependency.
Dependency
Let's take an example of a Student table with columns student_id, name, reg_no(registration
number), branch and address(student's home address).
student_id name reg_no branch address
In this table, student_id is the primary key and will be unique for every row, hence we can use student_id to fetch any row of data from this table.
Even in a case where student names are the same, if we know the student_id we can easily fetch the correct record.
student_id name reg_no branch address
10 Akon 07-WY CSE Kerala
11 Akon 08-WY IT Gujarat
Hence we can say a Primary Key for a table is the column or a group of columns(composite key) which can
uniquely identify each record in the table.
I can ask for the branch name of the student with student_id 10, and I can get it. Similarly, if I ask for the name of the student with student_id 10 or 11, I will get it. So all I need is student_id, and every other column depends on it, or can be fetched using it. This is Dependency and we also call it Functional Dependency.
Partial Dependency
Now that we know what dependency is, we are in a better state to understand what partial dependency is.
For a simple table like Student, a single column like student_id can uniquely identify all the records in a table.
But this is not true all the time. So now let's extend our example to see if more than one column together can act
as a primary key.
Let's create another table for Subject, which will have subject_id and subject_name fields and subject_id will
be the primary key.
subject_id subject_name
1 Java
2 C++
3 Php
Now we have a Student table with student information and another table Subject for storing subject
information.
Let's create another table Score, to store the marks obtained by students in the respective subjects. We will also
be saving name of the teacher who teaches that subject along with marks.
score_id student_id subject_id marks teacher
1 10 1 70 Java Teacher
2 10 2 75 C++ Teacher
3 11 1 80 Java Teacher
In the Score table we are saving the student_id to know which student's marks these are, and the subject_id to know which subject the marks are for.
Together, student_id + subject_id forms a Candidate Key which can be the Primary key.
If I ask you to get the marks of the student with student_id 10, can you get them from this table? No, because you don't know for which subject. And if I give you the subject_id, you would not know for which student. Hence we need student_id + subject_id to uniquely identify any row.
But where is Partial Dependency?
Now if you look at the Score table, we have a column named teacher which is only dependent on the subject,
for Java it's Java Teacher and for C++ it's C++ Teacher & so on.
Now as we just discussed that the primary key for this table is a composition of two columns which
is student_id & subject_id but the teacher's name only depends on subject, hence the subject_id, and has
nothing to do with student_id.
This is Partial Dependency, where an attribute in a table depends on only a part of the primary key and not on
the whole key.
How to remove Partial Dependency?
There can be many different solutions for this, but our objective is to remove the teacher's name from the Score table.
The simplest solution is to remove the teacher column from the Score table and add it to the Subject table. Hence, the
Subject table will become:
subject_id subject_name teacher
1 Java Java Teacher
2 C++ C++ Teacher
3 Php Php Teacher
And our Score table is now in the second normal form, with no partial dependency.
score_id student_id subject_id marks
1 10 1 70
2 10 2 75
3 11 1 80
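A hedged SQL sketch of this decomposition (data types are assumptions; the keys follow the discussion above):

-- teacher now lives with the attribute it actually depends on: the subject
CREATE TABLE Subject (
    subject_id   INT PRIMARY KEY,
    subject_name VARCHAR(30),
    teacher      VARCHAR(30)
);

-- every non-key column of Score now depends on the whole key (student_id + subject_id)
CREATE TABLE Score (
    score_id   INT,
    student_id INT,
    subject_id INT,
    marks      INT,
    PRIMARY KEY (student_id, subject_id),
    FOREIGN KEY (subject_id) REFERENCES Subject(subject_id)
);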
Student Table
student_id name reg_no branch address
10 Akon 07-WY CSE Kerala
11 Akon 08-WY IT Gujarat
12 Bkon 09-WY IT Rajasthan
Subject Table
subject_id subject_name teacher
1 Java Java Teacher
2 C++ C++ Teacher
3 Php Php Teacher
Score Table
In the Score table, we need to store some more information, which is the exam name and total marks, so let's
add 2 more columns to the Score table.
score_id student_id subject_id marks
1 10 1 70
2 10 2 75
3 11 1 80
Transitive Dependency
With exam_name and total_marks added to our Score table, it saves more data now. Primary key for the Score
table is a composite key, which means it's made up of two attributes or columns → student_id + subject_id.
The new column exam_name depends on both student and subject. For example, a mechanical engineering
student will have Workshop exam but a computer science student won't. And for some subjects you have
Practical exams and for some you don't. So we can say that exam_name is dependent on
both student_id and subject_id.
And what about our second new column total_marks? Does it depend on our Score table's primary key?
Well, the column total_marks depends on exam_name as with exam type the total score changes. For example,
practicals are of less marks while theory exams are of more marks.
But, exam_name is just another column in the score table. It is not a primary key or even a part of the primary
key, and total_marks depends on it.
This is Transitive Dependency: when a non-prime attribute depends on other non-prime attributes rather than on the prime attributes or the primary key.
How to remove Transitive Dependency
Again the solution is very simple. Take out the columns exam_name and total_marks from Score table and put
them in an Exam table and use the exam_id wherever required.
Score Table: In 3rd Normal Form
score_id student_id subject_id marks exam_id
Well, in the table above student_id, subject together form the primary key, because
using student_id and subject, we can find all the columns of the table.
One more important point to note here is, one professor teaches only one subject, but one subject may have
two different professors.
Hence, there is a dependency between subject and professor here, where subject depends on the professor
name.
This table satisfies the 1st Normal form because all the values are atomic, column names are unique and all the
values stored in a particular column are of same domain.
This table also satisfies the 2nd Normal Form as there is no Partial Dependency.
And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal Form.
But this table is not in Boyce-Codd Normal Form.
Why this table is not in BCNF?
In the table above, student_id and subject form the primary key, which means the subject column is a prime attribute.
But, there is one more dependency, professor → subject.
And while subject is a prime attribute, professor is a non-prime attribute, which is not allowed by BCNF.
How to satisfy BCNF?
To make this relation(table) satisfy BCNF, we will decompose this table into two tables, student table
and professor table.
Below we have the structure for both the tables.
Student Table
student_id p_id
101 1
101 2
Professor Table
p_id professor subject
1 P.Java Java
2 P.Cpp C++
And now, this relation satisfies Boyce-Codd Normal Form.
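A hedged SQL sketch of this decomposition (data types are assumptions):

CREATE TABLE Professor (
    p_id      INT PRIMARY KEY,
    professor VARCHAR(30),
    subject   VARCHAR(30)      -- subject now depends only on the professor
);

CREATE TABLE Student (
    student_id INT,
    p_id       INT,
    PRIMARY KEY (student_id, p_id),
    FOREIGN KEY (p_id) REFERENCES Professor(p_id)
);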
Fourth Normal Form (4NF)
A table is said to be in the Fourth Normal Form when,
1. It is in the Boyce-Codd Normal Form.
2. And, it doesn't have Multi-Valued Dependency.
Multi-valued Dependency
A table is said to have multi-valued dependency, if the following conditions are true,
1. For a dependency A → B, if for a single value of A, multiple values of B exist, then the table may have a multi-valued dependency.
2. Also, a table should have at-least 3 columns for it to have a multi-valued dependency.
3. And, for a relation R(A,B,C), if there is a multi-valued dependency between, A and B, then B and C
should be independent of each other.
If all these conditions are true for any relation(table), it is said to have multi-valued dependency.
Example
Below we have a college enrolment table with columns s_id, course and hobby.
s_id course hobby
1 Science Cricket
1 Maths Hockey
2 C# Cricket
2 Php Hockey
From the table above, student with s_id 1 has opted for two courses, Science and Maths, and has two
hobbies, Cricket and Hockey.
You must be thinking what problem this can lead to, right?
Well, the two records for the student with s_id 1 will give rise to two more records, as shown below, because for one student two hobbies exist; hence along with both the courses these hobbies should be specified.
s_id course hobby
1 Science Cricket
1 Maths Hockey
1 Science Hockey
1 Maths Cricket
And, in the table above, there is no relationship between the columns course and hobby. They are independent
of each other.
So there is a multi-valued dependency, which leads to unnecessary repetition of data and other anomalies as well.
How to satisfy 4th Normal Form?
To make the above relation satisfy the 4th normal form, we can decompose the table into 2 tables.
CourseOpted Table
s_id course
1 Science
1 Maths
2 C#
2 Php
Hobbies Table,
s_id hobby
1 Cricket
1 Hockey
2 Cricket
2 Hockey
Now this relation satisfies the fourth normal form.
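A hedged SQL sketch of this decomposition (data types are assumptions):

CREATE TABLE CourseOpted (
    s_id   INT,
    course VARCHAR(30),
    PRIMARY KEY (s_id, course)
);

CREATE TABLE Hobbies (
    s_id  INT,
    hobby VARCHAR(30),
    PRIMARY KEY (s_id, hobby)
);

-- Courses and hobbies are stored independently, so no spurious
-- (course, hobby) combinations need to be repeated.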
A table can also have functional dependency along with multi-valued dependency. In that case, the
functionally dependent columns are moved in a separate table and the multi-valued dependent columns are
moved to separate tables.
The above table can be decomposed into the following three tables; therefore it is not in 5NF:
<EmployeeSkills>
EmpName EmpSkills
Tom Networking
Harry Web Development
Katie Programming
<EmployeeJob>
EmpName EmpJob
Tom EJ001
Harry EJ002
Katie EJ002
<JobSkills>
EmpSkills EmpJob
Networking EJ001
Web Development EJ002
Programming EJ002
The above relations have join dependency, so they are not in 5NF. That would mean that a join relation of the
above three relations is equal to our original relation <Employee>.
In the above table, Rose takes both Mathematics and Physics classes for Semester 1, but she does not take a Physics class for Semester 2. In this case, a combination of all these 3 fields is required to identify valid data. Imagine we want to add a new class, Semester3, but do not know the Subject and who will be taking that subject. We would simply be inserting a new entry with Class as Semester3 and leaving Lecturer and Subject as NULL. As we discussed above, it is not good to have such entries. Moreover, since all three columns together act as a primary key, we cannot leave the other two columns blank!
Hence we have to decompose the table in such a way that it satisfies all the rules up to 4NF and, when we join the tables using keys, it yields the correct records. Here, we can represent each lecturer's subject area and their classes in a better way. We can divide the above table into three: (SUBJECT, LECTURER), (LECTURER, CLASS), (SUBJECT, CLASS).
Now, each of these combinations is in three different tables. If we need to identify who is teaching which subject to which semester, we need to join the keys of each table and get the result.
For example, to find who teaches Physics to Semester 1, we would select Physics and Semester1 from table 3 above, join with table 1 using Subject to filter out the lecturer names, and then join with table 2 using Lecturer to get the correct lecturer name. That is, we joined the key columns of each table to get the correct data. Hence there is no loss of data and no spurious data, satisfying the 5NF condition.
1. TRANSACTION CONCEPTS
What is Transaction?
A set of logically related operations is known as a transaction. The main operations of a transaction are:
Read(A): Read operations Read(A) or R(A) reads the value of A from the database and stores it in a buffer
in main memory.
Write (A): Write operation Write(A) or W(A) writes the value back to the database from buffer.
Let us take a debit transaction from an account which consists of following operations:
1.R(A);
2.A=A-1000;
3.W(A);
Assume A’s value before starting of transaction is 5000.
The first operation reads the value of A from database and stores it in a buffer.
Second operation will decrease its value by 1000. So buffer will contain 4000.
Third operation will write the value from buffer to database. So A’s final value will be 4000.
However, the transaction may fail after executing some of its operations. The failure can be caused by hardware faults, software errors, power failure, etc. For example, if the debit transaction discussed above fails after executing operation 2, the value of A will remain 5000 in the database, which is not acceptable to the bank.
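A minimal Python sketch of this behaviour (the buffer dictionary and the fail_after_step flag are purely illustrative names, not part of any real DBMS API):

# Sketch of the debit transaction: R(A); A = A - 1000; W(A)
database = {"A": 5000}

def debit(db, amount, fail_after_step=None):
    buffer = {}
    buffer["A"] = db["A"]             # step 1: R(A) -> buffer in main memory
    if fail_after_step == 1:
        return                        # crash: database still holds 5000
    buffer["A"] -= amount             # step 2: A = A - 1000 (buffer only)
    if fail_after_step == 2:
        return                        # crash: update never reaches the database
    db["A"] = buffer["A"]             # step 3: W(A) -> database

debit(database, 1000)                 # normal run
print(database["A"])                  # 4000
debit(database, 1000, fail_after_step=2)
print(database["A"])                  # still 4000: the failed run had no effect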
States of Transactions
Active State – A transaction enters the active state when its execution begins. During this state read or write operations can be performed.
Partially Committed State – A transaction enters the partially committed state after its final operation has been executed.
Committed State – When a transaction reaches the committed state, it has completed its execution successfully, and all of its changes are recorded in the database permanently.
Failed State – A transaction is considered failed when any one of the checks fails or if the transaction is aborted while it is in the active state.
Terminated State – A transaction reaches the terminated state when it leaves the system and cannot be restarted.
2. ACID PROPERTIES
A transaction is a single logical unit of work which accesses and possibly modifies the contents of a
database. Transactions access data using read and write operations.
In order to maintain consistency in a database before and after a transaction, certain properties are followed. These are called the ACID properties.
Atomicity
By this, we mean that either the entire transaction takes place at once or doesn’t happen at all. There is
no midway i.e. transactions do not occur partially. Each transaction is considered as one unit and either
runs to completion or is not executed at all. It involves following two operations.
—Abort: If a transaction aborts, changes made to database are not visible.
—Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the ‘All or nothing rule’.
Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X to
account Y.
If the transaction fails after completion of T1 but before completion of T2 (say, after write(X) but before write(Y)), then the amount has been deducted from X but not added to Y. This results in an inconsistent database state. Therefore, the transaction must be executed in its entirety to ensure the correctness of the database state.
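A rough Python sketch of the all-or-nothing idea (the rollback here is simulated by restoring a saved copy; real DBMSs use logs, and the function and variable names below are only illustrative):

# Sketch: atomic transfer of 100 from X to Y with a simulated failure.
db = {"X": 500, "Y": 200}

def transfer(db, amount, fail_midway=False):
    before = dict(db)                 # saved state used to undo on failure
    try:
        db["X"] -= amount             # T1: debit X
        if fail_midway:
            raise RuntimeError("crash between write(X) and write(Y)")
        db["Y"] += amount             # T2: credit Y
    except RuntimeError:
        db.update(before)             # abort: roll the changes back

transfer(db, 100)                     # commit case
print(db)                             # {'X': 400, 'Y': 300}
transfer(db, 100, fail_midway=True)   # abort case
print(db)                             # unchanged: {'X': 400, 'Y': 300}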
Consistency
This means that integrity constraints must be maintained so that the database is consistent before and
after the transaction. It refers to correctness of a database.
Referring to the example above,
The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, the database is consistent. An inconsistency would occur if T1 completed but T2 failed, leaving T incomplete.
Isolation
This property ensures that multiple transactions can occur concurrently without leading to
inconsistency of database state. Transactions occur independently without interference. Changes occurring
in a particular transaction will not be visible to any other transaction until that particular change in that
transaction is written to memory or has been committed. This property ensures that executing transactions concurrently results in a state that is equivalent to the state that would have been obtained had they been executed serially in some order.
Let X= 500, Y = 500.
Consider two transactions T and T”.
Suppose T has been executed till Read(Y) and then T'' starts. As a result, interleaving of operations takes place, due to which T'' reads the correct value of X but an incorrect value of Y, and the sum computed by T'' (X + Y = 50,000 + 500 = 50,500) is thus not consistent with the sum at the end of transaction T (X + Y = 50,000 + 450 = 50,450).
This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take place in isolation, and changes should become visible only after they have been written to main memory or committed.
Durability:
This property ensures that once the transaction has completed execution, the updates and modifications
to the database are stored in and written to disk and they persist even if system failure occurs. These updates
now become permanent and are stored in a non-volatile memory. The effects of the transaction, thus, are
never lost.
3. SCHEDULES
1. Serial Schedules
Schedules in which the transactions are executed non-interleaved, i.e. in which no transaction starts until the currently running transaction has ended, are called serial schedules.
Example: Consider the following schedule involving two transactions T1 and T2.
T1        T2
R(A)
W(A)
R(B)
W(B)
          R(A)
          R(B)
where R(A) denotes that a read operation is performed on data item A.
This is a serial schedule, since the transactions are performed serially in the order T1 —> T2.
2. Complete Schedules
Schedules in which the last operation of each transaction is either abort (or) commit are called
complete schedules.
Example: Consider the following schedule involving three transactions T1, T2 and T3.
T1        T2        T3
R(A)
W(A)
          R(B)
          W(B)
commit
          commit
                    abort
This is a complete schedule, since the last operation performed under every transaction is either "commit" or "abort".
3. Recoverable Schedules
Schedules in which transactions commit only after all transactions whose changes they read commit
are called recoverable schedules. In other words, if some transaction Tj is reading value updated or written
by some other transaction Ti, then the commit of Tj must occur after the commit of Ti.
Example – Consider the following schedule involving two transactions T1 and T2.
T1        T2
R(A)
W(A)
          W(A)
          R(A)
commit
          commit
This is a recoverable schedule since T1 commits before T2, which makes the value read by T2 correct.
4. Cascadeless Schedules –
Also called Avoids cascading aborts/rollbacks (ACA). Schedules in which transactions read values
only after all transactions whose changes they are going to read commit are called cascadeless schedules.
Avoids that a single transaction abort leads to a series of transaction rollbacks. A strategy to prevent
cascading aborts is to disallow a transaction from reading uncommitted changes from another transaction
in the same schedule.
In other words, if some transaction Tj wants to read value updated or written by some other transaction Ti,
then the commit of Tj must read it after the commit of Ti.
Example: Consider the following schedule involving two transactions T1 and T2.
T1        T2
R(A)
W(A)
          W(A)
commit
          R(A)
          commit
This schedule is cascadeless, since the updated value of A is read by T2 only after the updating transaction, i.e. T1, commits.
5. Strict Schedules
A schedule is strict if for any two transactions Ti, Tj, if a write operation of Ti precedes a conflicting
operation of Tj (either read or write), then the commit or abort event of Ti also precedes that conflicting
operation of Tj.
In other words, Tj can read or write updated or written value of Ti only after Ti commits/aborts.
Example: Consider the following schedule involving two transactions T1 and T2.
T1        T2
R(A)
          R(A)
W(A)
commit
          W(A)
          R(A)
          commit
This is a strict schedule, since T2 writes and reads A, which was written by T1, only after the commit of T1.
Note – It can be seen that:
1. Cascadeless schedules are stricter than recoverable schedules, i.e. they are a subset of recoverable schedules.
2. Strict schedules are stricter than cascadeless schedules, i.e. they are a subset of cascadeless schedules.
3. Serial schedules satisfy the constraints of recoverable, cascadeless and strict schedules, and hence are a subset of strict schedules.
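These schedule classes can be checked programmatically. Below is a rough Python sketch (the encoding of a schedule as (transaction, operation, item) tuples and the function name are my own illustration) that tests the recoverability condition: a transaction that reads from another transaction must commit after it.

# Sketch: check whether a schedule is recoverable.
# schedule: list of (txn, op, item) with op in {"R", "W", "C"} (item is None for commit).
def is_recoverable(schedule):
    commit_pos = {t: i for i, (t, op, _) in enumerate(schedule) if op == "C"}
    last_writer = {}                 # item -> transaction that last wrote it
    reads_from = set()               # (reader, writer) pairs
    for t, op, item in schedule:
        if op == "W":
            last_writer[item] = t
        elif op == "R" and last_writer.get(item) not in (None, t):
            reads_from.add((t, last_writer[item]))
    for reader, writer in reads_from:
        r, w = commit_pos.get(reader), commit_pos.get(writer)
        if r is not None and (w is None or r < w):
            return False             # reader committed before (or without) the writer
    return True

# The recoverable example above: T1 commits before T2.
s = [("T1","R","A"), ("T1","W","A"), ("T2","W","A"), ("T2","R","A"),
     ("T1","C",None), ("T2","C",None)]
print(is_recoverable(s))             # True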
4. SERIALIZABILITY
When multiple transactions run concurrently, there is a possibility that the database may be left in an inconsistent state. Serializability is the concept that lets us check which schedules preserve consistency: a serializable schedule is one that always leaves the database in a consistent state.
1. Conflict Serializability
A schedule is conflict serializable if it can be transformed into a serial schedule by swapping adjacent non-conflicting operations. Two operations conflict if they belong to different transactions, operate on the same data item, and at least one of them is a write.
Example: Consider the following schedule involving two transactions T1 and T2.
T1        T2
          R(A)
R(B)
          W(B)
W(A)
To convert this schedule into the serial schedule T1 -> T2, we would have to swap the R(A) operation of transaction T2 with the W(A) operation of transaction T1. However, we cannot swap these two operations because they are conflicting operations; thus the given schedule is not conflict serializable.
Let’s take another example:
T1        T2
-----     ------
R(A)
          R(A)
          R(B)
          W(B)
R(B)
W(A)
Let’s swap non-conflicting operations:
After swapping R(A) of T1 and R(A) of T2 we get:
T1        T2
-----     ------
          R(A)
R(A)
          R(B)
          W(B)
R(B)
W(A)
After swapping R(A) of T1 and R(B) of T2 we get:
T1        T2
-----     ------
          R(A)
          R(B)
R(A)
          W(B)
R(B)
W(A)
After swapping R(A) of T1 and W(B) of T2 we get:
T1        T2
-----     ------
          R(A)
          R(B)
          W(B)
R(A)
R(B)
W(A)
We finally got a serial schedule (T2 -> T1) after swapping all the non-conflicting operations, so we can say that the given schedule is conflict serializable.
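A more systematic way to run this test, not spelled out in the notes above, is the standard precedence-graph method: add an edge Ti -> Tj for every conflicting pair in which Ti's operation comes first, and declare the schedule conflict serializable exactly when the graph has no cycle. A small Python sketch (the schedule encoding as (transaction, operation, item) tuples is my own illustration):

# Sketch: test conflict serializability with a precedence graph.
def is_conflict_serializable(schedule):          # ops: ("T1", "R", "A"), ...
    edges = set()
    txns = {t for t, _, _ in schedule}
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))              # conflicting pair: Ti must precede Tj

    def has_cycle(node, path, done):
        path.add(node)
        for a, b in edges:
            if a == node:
                if b in path or (b not in done and has_cycle(b, path, done)):
                    return True
        path.discard(node)
        done.add(node)
        return False

    return not any(has_cycle(t, set(), set()) for t in txns)

# The example above: T1 = R(A), R(B), W(A); T2 = R(A), R(B), W(B).
s = [("T1","R","A"), ("T2","R","A"), ("T2","R","B"), ("T2","W","B"),
     ("T1","R","B"), ("T1","W","A")]
print(is_conflict_serializable(s))               # True (equivalent to T2 -> T1)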
2. View Serializability
View Serializability is the process of finding out whether a given schedule is view serializable or not.
To check whether a given schedule is view serializable, we need to check whether it is view equivalent to its serial schedule. Let's take an example to understand what this means.
Given Schedule:
T1        T2
-----     ------
R(X)
W(X)
          R(X)
          W(X)
R(Y)
W(Y)
          R(Y)
          W(Y)
Serial Schedule of the above given schedule:
As we know, in a serial schedule a transaction only starts when the currently running transaction has finished. So the serial schedule of the above given schedule would look like this:
T1        T2
-----     ------
R(X)
W(X)
R(Y)
W(Y)
          R(X)
          W(X)
          R(Y)
          W(Y)
If we can prove that the given schedule is View Equivalent to its serial schedule then the given schedule is
called view Serializable.
Testing for Serializability
5. CONCURRENCY CONTROL
With concurrency control, multiple transactions can be executed simultaneously. This may affect the results of the transactions, so it is highly important to control the order of execution of those transactions.
6. NEED FOR CONCURRENCY
Problems of concurrency control
Several problems can occur when concurrent transactions are executed in an uncontrolled manner.
Following are the three problems in concurrency control.
Lost updates
Dirty read
Unrepeatable read
1. Lost update problem
When two transactions that access the same database items interleave their operations in a way that makes the value of some database item incorrect, the lost update problem occurs.
If two transactions T1 and T2 read a record and then update it, the effect of the first update will be overwritten by the second update.
Example:
Here,
o At time t2, Transaction-X reads A's value.
o At time t3, Transaction-Y reads A's value.
o At time t4, Transaction-X writes A's value on the basis of the value seen at time t2.
o At time t5, Transaction-Y writes A's value on the basis of the value seen at time t3.
o So at time t5, the update made by Transaction-X is lost, because Transaction-Y overwrites it without looking at its current value.
o This type of problem is known as the Lost Update Problem, as the update made by one transaction is lost.
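A tiny Python sketch of this interleaving (the starting value and the +10/+20 updates are only illustrative):

# Sketch: lost update. Both transactions read A = 100, then both write.
A = 100
tx_x = A          # t2: Transaction-X reads A
tx_y = A          # t3: Transaction-Y reads A
A = tx_x + 10     # t4: Transaction-X writes A = 110
A = tx_y + 20     # t5: Transaction-Y writes A = 120, X's update is lost
print(A)          # 120, not 130: the +10 from Transaction-X has disappeared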
2. Dirty Read
o A dirty read occurs when one transaction updates an item of the database and then fails for some reason; the updated database item is accessed by another transaction before it is changed back to its original value.
o A transaction T1 updates a record which is read by T2. If T1 aborts, then T2 now holds values which have never formed part of the stable database.
Example:
o At time t4, Transaction-Y rolls back, so it changes A's value back to what it was prior to t1.
o Transaction-X now contains a value which has never become part of the stable database.
o This type of problem is known as the Dirty Read Problem, as one transaction reads a dirty value which has not been committed.
3. Inconsistent Retrievals Problem
o The Inconsistent Retrievals Problem is also known as the unrepeatable read. It occurs when a transaction calculates some summary function over a set of data while other transactions are updating that data.
o A transaction T1 reads a record and then does some other processing, during which transaction T2 updates the record. When T1 reads the record again, the new value will be inconsistent with the previous value.
Example:
Suppose two transactions operate on three accounts.
o Transaction-X is computing the sum of all balances while Transaction-Y is transferring an amount of 50 from Account-1 to Account-3.
o Here, Transaction-X produces a result of 550, which is incorrect. If we wrote this result to the database, the database would be left in an inconsistent state, because the actual sum is 600.
o Here, Transaction-X has seen an inconsistent state of the database.
7. LOCKING PROTOCOLS
Lock-Based Protocol
In this type of protocol, a transaction cannot read or write data until it acquires an appropriate lock on it.
There are two types of lock:
1. Shared lock:
o It is also known as a read-only lock. Under a shared lock, the data item can only be read by the transaction.
o It can be shared between transactions, because while a transaction holds a shared lock it cannot update the data item.
2. Exclusive lock:
o Under an exclusive lock, the data item can be both read and written by the transaction.
o This lock is exclusive: multiple transactions cannot modify the same data item simultaneously.
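A minimal sketch of the shared/exclusive compatibility rule (the LockManager class and its method names are my own illustration, not a standard API):

# Sketch: shared (S) vs exclusive (X) lock compatibility on a single data item.
class LockManager:
    def __init__(self):
        self.locks = {}                       # item -> {"mode": "S" or "X", "holders": set()}

    def acquire(self, txn, item, mode):
        entry = self.locks.get(item)
        if entry is None:                     # no lock held: grant immediately
            self.locks[item] = {"mode": mode, "holders": {txn}}
            return True
        if mode == "S" and entry["mode"] == "S":
            entry["holders"].add(txn)         # shared locks are compatible
            return True
        return False                          # any combination involving X conflicts

    def release(self, txn, item):
        entry = self.locks.get(item)
        if entry:
            entry["holders"].discard(txn)
            if not entry["holders"]:
                del self.locks[item]

lm = LockManager()
print(lm.acquire("T1", "A", "S"))   # True
print(lm.acquire("T2", "A", "S"))   # True  (both may read A)
print(lm.acquire("T3", "A", "X"))   # False (exclusive conflicts with shared)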
Two-Phase Locking (2-PL) requires each transaction to issue its lock and unlock requests in two phases: a growing phase, in which locks are acquired and none are released, followed by a shrinking phase, in which locks are released and no new locks are acquired. The point at which a transaction acquires its final lock is called its lock point. The following example shows how locking and unlocking work with 2-PL.
Transaction T1:
o Growing phase: from step 1-3
o Shrinking phase: from step 5-7
o Lock point: at 3
Transaction T2:
o Growing phase: from step 2-6
o Shrinking phase: from step 8-9
o Lock point: at 6
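A rough Python sketch of the 2PL rule itself (the encoding of a transaction's history as a list of "lock"/"unlock" actions is my own illustration): once a transaction has released any lock, it must not acquire another one.

# Sketch: verify that a sequence of lock/unlock actions obeys two-phase locking.
def follows_2pl(actions):                  # actions: list of ("lock" or "unlock", item)
    shrinking = False
    for kind, _item in actions:
        if kind == "unlock":
            shrinking = True               # the shrinking phase has begun
        elif shrinking:                    # a lock request after any unlock breaks 2PL
            return False
    return True

print(follows_2pl([("lock", "A"), ("lock", "B"), ("unlock", "A"), ("unlock", "B")]))  # True
print(follows_2pl([("lock", "A"), ("unlock", "A"), ("lock", "B")]))                   # False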
Types of Two Phase Locking (2PL)
1. Strict Two-phase locking (Strict-2PL)
o The first phase of Strict-2PL is the same as in 2PL: the transaction acquires the locks it needs and continues to execute normally.
o The difference between 2PL and Strict-2PL is that Strict-2PL does not release a lock immediately after using it.
o Strict-2PL waits until the whole transaction commits and then releases the locks it is still holding all at once (strictly speaking, Strict-2PL requires only the exclusive locks to be held until commit; holding every lock until commit is usually called rigorous 2PL).
o Strict-2PL therefore does not have a gradual shrinking phase of lock release.
8. TIMESTAMP ORDERING PROTOCOL
In the timestamp ordering protocol, each transaction Ti is assigned a timestamp TS(Ti) when it enters the system, and conflicting read and write operations are executed in timestamp order, where:
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the read time-stamp of data item X.
W_TS(X) denotes the write time-stamp of data item X.
Thomas write Rule
Thomas Write Rule provides the guarantee of serializability order for the protocol. It improves the
Basic Timestamp Ordering Algorithm.
The basic Thomas Write Rule checks, applied when a transaction T issues a write on data item X, are as follows:
o If TS(T) < R_TS(X), then transaction T is aborted and rolled back, and the operation is rejected.
o If TS(T) < W_TS(X), then the W(X) operation is not executed (the write is obsolete) and processing continues.
o If neither condition 1 nor condition 2 holds, then the write operation is executed and W_TS(X) is set to TS(T).
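A small Python sketch of these three cases (the dictionaries holding R_TS/W_TS and the returned strings are just illustrative):

# Sketch: Thomas Write Rule decision for a write by transaction T on item X.
def thomas_write(ts_t, item, r_ts, w_ts, db, value):
    if ts_t < r_ts.get(item, 0):
        return "abort"                # a younger transaction has already read X
    if ts_t < w_ts.get(item, 0):
        return "ignore"               # obsolete write: skip it, keep processing
    db[item] = value                  # otherwise perform the write
    w_ts[item] = ts_t
    return "write"

r_ts, w_ts, db = {"X": 12}, {"X": 10}, {"X": 0}
print(thomas_write(11, "X", r_ts, w_ts, db, 99))   # "abort"  (11 < R_TS = 12)
print(thomas_write(13, "X", r_ts, w_ts, db, 99))   # "write"  (and W_TS becomes 13)
print(thomas_write(12, "X", r_ts, w_ts, db, 77))   # "ignore" (12 < W_TS = 13 now)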
Problems
9. DEADLOCK
A deadlock is a condition wherein two or more tasks are waiting for each other in order to finish, but none of the tasks is willing to give up the resources that the other task needs. In this situation no task ever gets finished, and they remain in a waiting state forever.
Coffman conditions
Coffman stated four conditions for deadlock occurrence. A deadlock may occur if all of the following conditions hold true.
Mutual exclusion condition: There must be at least one resource that cannot be used by more than one process at a time.
Hold and wait condition: A process that is holding a resource can request additional resources that are being held by other processes in the system.
No preemption condition: A resource cannot be forcibly taken from a process; only the process holding it can release it.
Circular wait condition: One process is waiting for a resource that is being held by a second process, the second process is waiting for a third process, and so on, with the last process waiting for the first, thus forming a circular chain of waiting.
For example: In the student table, transaction T1 holds a lock on some rows and needs to update some
rows in the grade table. Simultaneously, transaction T2 holds locks on some rows in the grade table and needs
to update the rows in the Student table held by Transaction T1.
Now the main problem arises: Transaction T1 is waiting for T2 to release its lock and, similarly, transaction T2 is waiting for T1 to release its lock. All activity comes to a halt and remains at a standstill until the DBMS detects the deadlock and aborts one of the transactions.
Deadlock Avoidance
o When a database could become stuck in a deadlock state, it is better to avoid the deadlock rather than abort and restart the transactions afterwards. A deadlock avoidance mechanism detects a potential deadlock situation in advance; the wait-for graph is one such method, suitable mainly for smaller databases.
The wait-for graph for the above scenario is shown below:
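The idea behind the wait-for graph can be sketched in Python: add an edge Ti -> Tj whenever Ti waits for a lock held by Tj, and report a deadlock if the graph contains a cycle (the dictionary encoding of the graph below is my own illustration):

# Sketch: deadlock detection with a wait-for graph (edge = "waits for").
def has_deadlock(wait_for):                      # wait_for: dict txn -> set of txns
    def visit(node, path, seen):
        path.add(node)
        for nxt in wait_for.get(node, ()):       # follow every "waits for" edge
            if nxt in path or (nxt not in seen and visit(nxt, path, seen)):
                return True
        path.discard(node)
        seen.add(node)
        return False
    seen = set()
    return any(visit(t, set(), seen) for t in wait_for if t not in seen)

# T1 waits for T2 (rows in the grade table) and T2 waits for T1 (rows in student).
print(has_deadlock({"T1": {"T2"}, "T2": {"T1"}}))   # True
print(has_deadlock({"T1": {"T2"}, "T2": set()}))    # False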
Deadlock Prevention
o The deadlock prevention method is suitable for a large database. If the resources are allocated in such a way that a deadlock can never occur, then the deadlock is prevented.
I. Wait-Die scheme
In this scheme, if a transaction requests a resource which is already held with a conflicting lock by another transaction, then the DBMS simply compares the timestamps of the two transactions and allows only the older transaction to wait for the resource.
Let there be two transactions Ti and Tj, and let TS(T) denote the timestamp of a transaction T. If Tj holds the lock on a resource and Ti is requesting it, then the following actions are performed by the DBMS:
1. If TS(Ti) < TS(Tj), i.e. the requesting transaction Ti is older than Tj, then Ti is allowed to wait until the data item is available. That is, an older transaction waiting for a resource locked by a younger transaction is allowed to wait.
2. If TS(Ti) > TS(Tj), i.e. the requesting transaction Ti is younger than Tj, then Ti is killed ("dies") and restarted later with a random delay but with the same timestamp.
II. Wound-Wait scheme
o In the wound-wait scheme, if the older transaction requests a resource which is held by a younger transaction, then the older transaction forces the younger one to abort ("wounds" it) and release the resource. After a small delay, the younger transaction is restarted, but with the same timestamp.
o If a younger transaction requests a resource held by an older transaction, then the younger transaction is asked to wait until the older one releases it.
Both of these schemes take transaction age (timestamp) into consideration when deciding how to resolve a resource conflict, and thereby avoid deadlock. One famous deadlock avoidance algorithm from operating systems is the Banker's algorithm.
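The two schemes differ only in which transaction waits and which is rolled back; a compact Python sketch of the decision rules (lower timestamp means older; the function names are my own):

# Sketch: wait-die vs wound-wait decisions when `requester` asks for a
# resource currently held by `holder` (lower timestamp = older transaction).
def wait_die(ts_requester, ts_holder):
    # older requester waits; younger requester dies (is rolled back)
    return "wait" if ts_requester < ts_holder else "die (restart with same timestamp)"

def wound_wait(ts_requester, ts_holder):
    # older requester wounds (aborts) the younger holder; younger requester waits
    return "wound the holder" if ts_requester < ts_holder else "wait"

print(wait_die(5, 9))     # older asks younger  -> wait
print(wait_die(9, 5))     # younger asks older  -> die (restart with same timestamp)
print(wound_wait(5, 9))   # older asks younger  -> wound the holder
print(wound_wait(9, 5))   # younger asks older  -> wait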
RAID Level 0: Striping
Data is striped across the drives, but there is no parity checking of data. So if the data in one drive gets corrupted, all the data would be lost; thus RAID 0 does not support data recovery. Spanning is another term used with RAID level 0, because the logical disk spans all the physical drives. A RAID 0 implementation requires a minimum of 2 disks.
Advantages
I/O performance is greatly improved by spreading the I/O load across many channels & drives.
Best performance is achieved when data is striped across multiple controllers with only one drive per controller.
Disadvantages
It is not fault-tolerant, failure of one drive will result in all data in an array being lost
RAID Level 1: Mirroring (or shadowing)
Also known as disk mirroring, this configuration consists of at least two drives that duplicate the storage of data, so the data remains available if one drive fails.
RAID Level 2:
This configuration uses striping across disks, with some disks storing error checking and correcting (ECC)
information. It has no advantage over RAID 3 and is no longer used.
RAID Level 5:
RAID 5 uses striping as well as parity for redundancy. It is well suited for heavy read and low write
operations.
Block-Interleaved Distributed Parity; partitions data and parity among all N + 1 disks, rather than storing
data in N disks and parity in 1 disk.
RAID Level 6:
This technique is similar to RAID 5, but includes a second parity scheme that is distributed across the drives
in the array. The use of additional parity allows the array to continue to function even if two disks fail
simultaneously. However, this extra protection comes at a cost.
P+Q Redundancy scheme; similar to Level 5, but stores extra redundant information to guard against
multiple disk failures.
- Better reliability than Level 5 at a higher cost; not used as widely.
File Organization
The database is stored as a collection of files.
Each file is a sequence of records.
A record is a sequence of fields.
Classifications of records
– Fixed length record
– Variable length record
Fixed length record approach:
Assume record size is fixed each file has records of one particular type only different files are used
for different relations
Simple approach
- Record access is simple
Example pseudo code
type account = record
account_number char(10);
branch_name char(22);
balance numeric(8);
end
Total bytes 40 for a record
Two problems
- It is difficult to delete a record from this structure.
- Some records will cross block boundaries, i.e. part of the record is stored in one block and part in another; reading or writing such a record would require two block accesses.
Alternatives for reusing the free space left by a deleted record i:
– move records i + 1, ..., n to positions i, ..., n – 1
– do not move records, but link all free records on a free list
– move the final record into the deleted record's place.
Free Lists
Store the address of the first deleted record in the file header.
Use this first record to store the address of the second deleted record, and so on
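A small Python sketch of the free-list idea for fixed-length records (the header variable and the use of list indices in place of on-disk pointers are my own simplification):

# Sketch: free list over an array of fixed-length record slots.
# Each free slot stores the index of the next free slot; the file "header"
# stores the index of the first free slot (None means no free slot).
records = ["A-101", "A-102", "A-103", "A-104"]
free_head = None

def delete(i):
    global free_head
    records[i] = free_head        # reuse the slot to point at the next free slot
    free_head = i

def insert(rec):
    global free_head
    if free_head is None:
        records.append(rec)       # no free slot: grow the file
    else:
        i, free_head = free_head, records[free_head]
        records[i] = rec          # reuse the first free slot

delete(1); delete(3)              # free slots 1 and 3
insert("A-201")                   # goes into slot 3 (head of the free list)
insert("A-202")                   # goes into slot 1
print(records)                    # ['A-101', 'A-202', 'A-103', 'A-201']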
Variable-Length Records
Byte string representation
Attach an end-of-record (⊥) control character to the end of each record.
Difficulty with deletion.
Example record: 0 perryridge A-102 400 A-201 900 ⊥
Disadvantages
It is not easy to reuse the space formerly occupied by a deleted record.
There is, in general, no space for a record to grow longer.
Slotted Page Structure
Pointer Method
A variable-length record is represented by a list of fixed-length records, chained together via pointers. This can be used even if the maximum record length is not known.
Disadvantage of the pointer structure: space is wasted in all records except the first in a chain. The solution is to allow two kinds of block in the file:
Anchor block – contains the first records of chains.
Overflow block – contains records other than those that are the first records of chains.
An index entry consists of a search-key value and a pointer to the record.
Index files are typically much smaller than the original file.
Two basic kinds of indices:
– Ordered indices: search keys are stored in sorted order
– Hash indices: search keys are distributed uniformly across “buckets” and by using a “hash
function” the values are determined.
Ordered Indices
In an ordered index, index entries are stored sorted on the search key value.
Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of
the file.
Secondary index: an index whose search key specifies an order different from the sequential order of the
file.
Types of Ordered Indices
Dense index
Sparse index
Dense Index Files
Dense index — Index record appears for every search-key value in the file.
Multilevel Index
If primary index does not fit in memory, access becomes expensive.
To reduce number of disk accesses to index records, treat primary index kept on disk as a sequential file and
construct a sparse index on it.
– outer index – a sparse index of primary index
– inner index – the primary index file
If even outer index is too large to fit in main memory, yet another level of index can be created, and so on.
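A compact Python sketch of a sparse (outer) index over a sorted file: binary-search the index for the largest entry not exceeding the search key, then scan only the pointed-to block (the block contents and block size here are illustrative):

import bisect

# Sketch: sparse index over a sequentially ordered file of (key, record) pairs.
blocks = [[(10, "r10"), (20, "r20")],        # block 0
          [(30, "r30"), (40, "r40")],        # block 1
          [(50, "r50"), (60, "r60")]]        # block 2
sparse_index = [10, 30, 50]                  # first search key of each block

def lookup(key):
    b = bisect.bisect_right(sparse_index, key) - 1   # largest index entry <= key
    if b < 0:
        return None
    for k, rec in blocks[b]:                 # scan only the selected block
        if k == key:
            return rec
    return None

print(lookup(40))   # 'r40'
print(lookup(45))   # None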
Disadvantage of indexed-sequential files: performance degrades as file grows, since many overflow blocks
get created. Periodic reorganization of entire file is required.
Advantage of B+-tree index files: automatically reorganizes itself with small, local, changes, in the face of
insertions and deletions. Reorganization of entire file is not required to maintain performance.
Disadvantage of B+-trees: extra insertion and deletion overhead, space overhead.
Non-leaf nodes other than the root must have between ⌈n/2⌉ and n children, i.e. between 3 and 5 children for n = 5.
Root must have at least 2 children.
Observations about B+-trees
Since the inter-node connections are done by pointers, “logically” close blocks need not be “physically”
close.
The B+-tree contains a relatively small number of levels thus searches can be conducted efficiently.
Insertions and deletions to the main file can be handled efficiently.
Updates on B+-Trees: Insertion
Find the leaf node in which the search-key value would appear.
If the search-key value is already present in the leaf node, the record is added to the file and, if necessary, a pointer is inserted into the bucket.
If the search-key value is not present, then add the record to the main file and create a bucket if necessary. Then:
– If there is room in the leaf node, insert the (key-value, pointer) pair in the leaf node; otherwise, split the node.
Example: B+-Tree before and after insertion of “Clearview”
• The removal of the leaf node containing “Downtown” did not result in its parent having too few pointers, so the cascaded deletions stopped with the deleted leaf node’s parent.
Deletion of “Perryridge” from the result of the previous example
• The node with “Perryridge” becomes empty and is merged with its sibling.
• The root node is then left with only one child, so it is deleted and its child becomes the new root node.
B+-Tree File Organization
• The leaf nodes in a B+-tree file organization store records, instead of pointers.
• Since records are larger than pointers, the maximum number of records that can be stored in a leaf node is
less than the number of pointers in a nonleaf node.
• Leaf nodes are still required to be half full.
• Insertion and deletion are handled in the same way as insertion and deletion of entries in a B+-tree index.
HASHING
• Hashing is an effective technique to calculate the direct location of a data record on the disk without using
index structure.
• Hashing uses hash functions with search keys as parameters to generate the address of a data record.
Hash Organization
Bucket
A hash file stores data in bucket format. Bucket is considered a unit of storage. A bucket typically
stores one complete disk block, which in turn can store one or more records.
Hash Function
A hash function, h, is a mapping function that maps all the set of search-keys K to the address
where actual records are placed. It is a function from search keys to bucket addresses.
Worst hash function maps all search-key values to the same bucket.
An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from
the set of all possible values.
An ideal hash function is also random, so each bucket will have approximately the same number of records, regardless of the actual distribution of search-key values.
Types
• Static Hashing
• Dynamic Hashing
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if a mod-4 hash function is used, it can generate only 4 distinct values (0 to 3). The output address is always the same for a given key.
The number of buckets provided remains unchanged at all times.
Example of Hash File Organization
There are 10 buckets,
The hash function returns the sum of the binary representations of the characters modulo 10
– E.g. h(Perryridge) = 5 h(Round Hill) = 3 h(Brighton) = 3
Operation
Insertion − When a record is required to be entered using static hash, the hash function h computes the
bucket address for search key K, where the record will be stored.
Bucket address = h(K)
Search − When a record needs to be retrieved, the same hash function can be used to retrieve the address of
the bucket where the data is stored.
Delete − This is simply a search followed by a deletion operation.
Handling of Bucket Overflows
Bucket overflow can occur because of
– Insufficient buckets
– Skew in the distribution of records. This can occur because:
• multiple records have the same search-key value, or
• the chosen hash function produces a non-uniform distribution of key values.
Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.
Overflow chaining – the overflow buckets of a given bucket are chained together in a linked list.
Above scheme is called closed hashing.
– An alternative, called open hashing, which does not use overflow buckets, is not suitable for
database applications.
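A small Python sketch of static hashing with overflow chaining (closed hashing); the character-sum-mod-10 hash follows the example above, while the bucket capacity of 2 records is an arbitrary illustration:

# Sketch: static hash file with 10 buckets and overflow chaining.
NUM_BUCKETS, BUCKET_CAPACITY = 10, 2
buckets = [[] for _ in range(NUM_BUCKETS)]       # each bucket: list of chained pages

def h(key):                                      # sum of character codes modulo 10
    return sum(ord(c) for c in key) % NUM_BUCKETS

def insert(key, record):
    chain = buckets[h(key)]
    if not chain or len(chain[-1]) == BUCKET_CAPACITY:
        chain.append([])                         # allocate an overflow page
    chain[-1].append((key, record))

def search(key):
    return [rec for page in buckets[h(key)] for k, rec in page if k == key]

for name in ["Brighton", "Round Hill", "Perryridge", "Downtown", "Redwood"]:
    insert(name, {"branch": name})
print(search("Brighton"))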
Hash Indices
• Hashing can be used not only for file organization, but also for index-structure creation.
• A hash index organizes the search keys, with their associated record pointers, into a hash file structure.
• Hash indices are always secondary indices
– The number of buckets also changes dynamically due to coalescing and splitting of buckets.
General Extendable Hash
In this structure, i2 = i3 = i, whereas i1 = i – 1
Queries in an Extendable Hash Structure
To locate the bucket containing search-key value Kj, compute h(Kj) and use the first i high-order bits of the hash value as an index into the bucket address table; the pointer in that entry leads to the appropriate bucket.
Updates in Extendable Hash Structure
To insert a record with search-key value Kj:
– follow the same procedure as look-up and locate the bucket, say j.
– If there is room in bucket j, insert the record in the bucket.
– Otherwise the bucket must be split and the insertion re-attempted; overflow buckets are used instead in some cases.
To delete a key value,
– locate it in its bucket and remove it.
– The bucket itself can be removed if it becomes empty
– Coalescing of buckets can be done
– Decreasing bucket address table size is also possible
Benefits of extendable hashing:
– Hash performance does not degrade with growth of file
– Minimal space overhead
Disadvantages of extendable hashing
– Extra level of indirection to find the desired record
– The bucket address table may itself become very big.
– (OP3):σDNO=5 (EMPLOYEE)
– (OP4):σ DNO=5 AND SALARY>30000 AND SEX = ‘F’ (EMPLOYEE)
– (OP5):σESSN=‘123456789’ AND PNO=10 (WORKS_ON)
Many search methods can be used for simple selection: S1 through S6
S1: Linear Search (brute force) –full scan in Oracle’s terminology-
– Retrieves every record in the file and tests whether its attribute values satisfy the selection condition: an expensive approach.
– Cost: b/2 block accesses on average if the selection is on a key attribute (the scan can stop once the record is found); b otherwise.
S2: Binary Search
– If the selection condition involves an equality comparison on a key attribute on which the file is
ordered.
– σSSN=‘1234567’ (EMPLOYEE), SSN is the ordering attribute.
– Cost: log2b if key.
S3: Using a Primary Index (hash key)
– An equality comparison on a key attribute with a primary index (or hash key).
– This condition retrieves a single record (at most).
– Cost: primary index: bind/2 + 1 (hash key: 1 bucket if no collision).
S4: Using a primary index to retrieve multiple records
– Comparison condition is >, >=, <, or <= on a key field with a primary index
– σDNUMBER >5(DEPARTMENT)
– Use the index to find the record satisfying the corresponding equality condition (DNUMBER=5), then
retrieve all subsequent records in the (ordered) file.
– For the condition (DNUMBER <5), retrieve all the preceding records.
– Method used for range queries too(i.e. queries to retrieve records in certain range)
– Cost: bind/2 + ?. ‘?’ could be known if the number of duplicates known.
S5: Using a clustering index to retrieve multiple records
– If the selection condition involves an equality comparison on a non-key attribute
with a clustering index.
– σDNO=5(EMPLOYEE)
– Use the index to retrieve all the records satisfying the condition.
– Cost: log2(bind) + ?. ‘?’ could be known if the number of duplicates is known.
S6: Using a secondary (B+-tree) index on an equality comparison
– The method can be used to retrieve a single record if the indexing field is a key or to retrieve multiple
records if the indexing field is not a key.
– This can also be used for comparisons involving >, >=, <, or <=.
– Method used for range queries too.
– Cost to retrieve: for a key, height + 1; for a non-key, height + 1 (extra level) + ?; for a comparison, (height – 1) + ? + ?.
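As a rough illustration of how these costs compare, the sketch below (with made-up numbers for the file and index sizes) evaluates the formulas for S1, S2 and S6:

import math

# Sketch: rough block-access cost of simple selection methods.
b = 2000            # data blocks in EMPLOYEE (illustrative)
height = 3          # height of a secondary B+-tree index (illustrative)

cost_linear_key     = b / 2                      # S1: linear search, equality on a key
cost_linear_nonkey  = b                          # S1: linear search, non-key condition
cost_binary_key     = math.ceil(math.log2(b))    # S2: binary search on the ordering key
cost_btree_key      = height + 1                 # S6: secondary B+-tree, key equality

print(cost_linear_key, cost_linear_nonkey, cost_binary_key, cost_btree_key)
# 1000.0 2000 11 4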
Many search methods can be used for a complex selection that involves a conjunctive condition: S7 through S9.
– Conjunctive condition: several simple conditions connected with the AND logical connective.
– (OP4): σ DNO=5 AND SALARY>30000 AND SEX = ‘F’ (EMPLOYEE).
S7:Conjunctive selection using an individual index.
– If an attribute involved in any single simple condition in the conjunctive condition has an access path
that permits the use of one of the Methods S2 to S6, use that condition to retrieve the records.
– Then check whether each retrieved record satisfies the remaining simple conditions in the conjunctive
condition
S8:Conjunctive selection using a composite index:
– If two or more attributes are involved in equality conditions in the conjunctive condition and a
composite index (or hash structure) exists on the combined fields.
– Example: if an index has been created on the composite key (ESSN, PNO) of the WORKS_ON file, the index can be used directly to evaluate OP5.
3. Using commutativity and associativity of binary operations, rearrange the leaf nodes of the tree
4. Combine a CARTESIAN PRODUCT operation with a subsequent SELECT operation in the tree into a
JOIN operation, if the condition represents a join condition
5. Using the cascading of PROJECT and the commuting of PROJECT with other operations, break down and
move lists of projection attributes down the tree as far as possible by creating new PROJECT operations as
needed
6. Identify sub-trees that represent groups of operations that can be executed by a single algorithm
Example
Query
"Find the last names of employees born after 1957 who work on a project named ‘Aquarius’."
SQL
SELECT LNAME
FROM EMPLOYEE, WORKS_ON, PROJECT
WHERE PNAME=‘Aquarius’ AND PNUMBER=PNO AND ESSN=SSN AND BDATE > ‘1957-12-31’;