
Storing Data: Disks and Files

The document discusses how database management systems store data on disks and manage disk space and memory. DBMSs store information on hard disks in units called blocks or pages due to the large cost of disk access compared to memory. Main memory is used to store currently used data pages, while disks store the entire database and tapes can be used for archiving. The document outlines different disk, memory, file, and record organization techniques used in DBMSs to optimize data access and manage space.


Storing Data: Disks and Files

Chapter 9
"Yea, from the table of my memory I'll wipe away all trivial fond records." -- Shakespeare, Hamlet

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke

Disks and Files

DBMS stores information on (hard) disks. This has major implications for DBMS design!

READ: transfer data from disk to main memory (RAM).
WRITE: transfer data from RAM to disk.
Both are high-cost operations, relative to in-memory operations, so must be planned carefully!


Why Not Store Everything in Main Memory?

Costs too much. $1000 will buy you either 128MB of RAM or 7.5GB of disk today.
Main memory is volatile. We want data to be saved between runs. (Obviously!)
Typical storage hierarchy:
  Main memory (RAM) for currently used data.
  Disk for the main database (secondary storage).
  Tapes for archiving older versions of the data (tertiary storage).

Disks
Secondary storage device of choice.
Main advantage over tapes: random access vs. sequential.
Data is stored and retrieved in units called disk blocks or pages.
Unlike RAM, time to retrieve a disk page varies depending upon location on disk.

Therefore, relative placement of pages on disk has major impact on DBMS performance!

Components of a Disk
[Figure: components of a disk, with labels: disk head, spindle, tracks, sector, platters, arm assembly, arm movement]

The platters spin (say, 90 rps).
The arm assembly is moved in or out to position a head on a desired track. Tracks under heads make a cylinder (imaginary!).
Only one head reads/writes at any one time.
Block size is a multiple of sector size (which is fixed).


Accessing a Disk Page

Time to access (read/write) a disk block:

  seek time (moving arms to position disk head on track)
  rotational delay (waiting for block to rotate under head)
  transfer time (actually moving data to/from disk surface)

Seek time varies from about 1 to 20 msec.
Rotational delay varies from 0 to 10 msec.
Transfer rate is about 1 msec per 4KB page.

Seek time and rotational delay dominate.

Key to lower I/O cost: reduce seek/rotation delays! Hardware vs. software solutions?
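
A rough worked example may help show why seek time and rotational delay dominate. The sketch below adds up the three cost components for 100 pages read one at a time from random locations versus the same 100 pages read in one contiguous request. The 10 msec seek and 5 msec rotational delay are assumed mid-range values picked from the ranges above, and the function name is made up for illustration.

```python
def access_time_ms(seek_ms, rotational_ms, pages, transfer_ms_per_page=1.0):
    """Cost of one request: one seek, one rotational wait, then `pages` page transfers."""
    return seek_ms + rotational_ms + pages * transfer_ms_per_page

# Assumed values (not measurements): mid-range seek and rotational delay.
SEEK_MS = 10.0
ROT_MS = 5.0

# 100 pages at random locations: every page pays seek + rotational delay.
random_total = 100 * access_time_ms(SEEK_MS, ROT_MS, pages=1)

# 100 contiguous pages in one request: seek + rotational delay are paid once.
sequential_total = access_time_ms(SEEK_MS, ROT_MS, pages=100)

print(random_total, sequential_total)
```

Under these assumptions the random reads cost 1600 msec in total and the sequential request 115 msec, and almost all of the difference is seek and rotational delay.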

Arranging Pages on Disk

'Next' block concept:
  blocks on same track, followed by
  blocks on same cylinder, followed by
  blocks on adjacent cylinder

Blocks in a file should be arranged sequentially on disk (by 'next'), to minimize seek and rotational delay.
For a sequential scan, pre-fetching several pages at a time is a big win!


RAID
Disk Array: Arrangement of several disks that gives abstraction of a single, large disk.
Goals: Increase performance and reliability.
Two main techniques:
  Data striping: Data is partitioned; size of a partition is called the striping unit. Partitions are distributed over several disks.
  Redundancy: More disks => more failures. Redundant information allows reconstruction of data if a disk fails.
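
As an illustration of data striping, the sketch below maps a logical partition number to a disk and a position on that disk. It assumes a round-robin distribution (partition p goes to disk p mod D), which is one common choice; the slide itself only says that partitions are distributed over several disks. The function name `locate` is hypothetical.

```python
from typing import Tuple

def locate(partition: int, num_disks: int) -> Tuple[int, int]:
    """Map a logical partition to (disk index, partition index on that disk).

    Assumes round-robin striping; other distributions are possible.
    """
    return partition % num_disks, partition // num_disks

# Eight consecutive partitions spread over an assumed array of 4 disks.
layout = [locate(p, 4) for p in range(8)]
print(layout)
```

Consecutive partitions land on different disks, so a large request that spans several striping units can be served by several disks in parallel.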

RAID Levels
Level 0: No redundancy

Level 1: Mirrored (two identical copies)
  Each disk has a mirror image (check disk)
  Parallel reads, a write involves two disks.
  Maximum transfer rate = transfer rate of one disk

Level 0+1: Striping and Mirroring
  Parallel reads, a write involves two disks.
  Maximum transfer rate = aggregate bandwidth


RAID Levels (Contd.)

Level 3: Bit-Interleaved Parity
  Striping Unit: One bit. One check disk.
  Each read and write request involves all disks; disk array can process one request at a time.

Level 4: Block-Interleaved Parity
  Striping Unit: One disk block. One check disk.
  Parallel reads possible for small requests, large requests can utilize full bandwidth
  Writes involve modified block and check disk

Level 5: Block-Interleaved Distributed Parity
  Similar to RAID Level 4, but parity blocks are distributed over all disks
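
The parity-based levels rely on the fact that the XOR of all data blocks, stored on a check disk, is enough to rebuild any single lost block. The sketch below is a minimal illustration of that idea with made-up two-byte blocks; the helper name `parity` and the sample data are assumptions, and real arrays work on whole disk blocks.

```python
from functools import reduce

def parity(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks, one per data disk.
data = [b"\x0f\x10", b"\xf0\x01", b"\x33\x44"]
check = parity(data)            # contents of the check disk

# Suppose the disk holding data[1] fails: XOR the survivors with the parity block.
rebuilt = parity([data[0], data[2], check])
print(rebuilt == data[1])
```

The same property explains the write cost noted for Level 4: changing one data block means the parity block on the check disk must be updated as well.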


Disk Space Management

Lowest layer of DBMS software manages space on disk.
Higher levels call upon this layer to:
  allocate/de-allocate a page
  read/write a page

Request for a sequence of pages must be satisfied by allocating the pages sequentially on disk!
Higher levels don't need to know how this is done, or how free space is managed.

Buffer Management in a DBMS

[Figure: page requests from higher levels arrive at the BUFFER POOL in MAIN MEMORY; each frame of the pool either holds a disk page or is a free frame; pages move between DISK and the pool; choice of frame dictated by replacement policy]

Data must be in RAM for DBMS to operate on it!
Table of <frame#, pageid> pairs is maintained.


When a Page is Requested ...

If requested page is not in pool:
  Choose a frame for replacement
  If frame is dirty, write it to disk
  Read requested page into chosen frame

Pin the page and return its address.

If requests can be predicted (e.g., sequential scans), pages can be pre-fetched several at a time!

More on Buffer Management

Requestor of page must unpin it, and indicate whether page has been modified:
  a dirty bit is used for this.

Page in pool may be requested many times:
  a pin count is used. A page is a candidate for replacement iff pin count = 0.

Concurrency control (CC) and recovery may entail additional I/O when a frame is chosen for replacement. (Write-Ahead Log protocol; more later.)
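
The request/pin/unpin protocol described on the last two slides can be sketched roughly as follows. This is a toy model under stated assumptions, not how any particular DBMS implements it: the "disk" is a plain dict from page id to contents, the replacement policy is LRU ordered by time of last pin, and the names `Frame`, `BufferPool`, `pin` and `unpin` are chosen for illustration. Logging and concurrency control are left out.

```python
from collections import OrderedDict

class Frame:
    def __init__(self, data):
        self.data = data
        self.pin_count = 0
        self.dirty = False

class BufferPool:
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk               # assumed stand-in for the disk: page_id -> contents
        self.frames = OrderedDict()    # page_id -> Frame, least recently pinned first

    def pin(self, page_id):
        if page_id not in self.frames:
            if len(self.frames) >= self.capacity:
                self._evict()
            self.frames[page_id] = Frame(self.disk[page_id])   # read page from disk
        self.frames.move_to_end(page_id)
        frame = self.frames[page_id]
        frame.pin_count += 1
        return frame

    def unpin(self, page_id, dirty):
        frame = self.frames[page_id]
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def _evict(self):
        # Only frames with pin count 0 are candidates for replacement.
        victim = next((pid for pid, f in self.frames.items() if f.pin_count == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim)
        if frame.dirty:
            self.disk[victim] = frame.data    # write the dirty page back first

disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(capacity=2, disk=disk)
frame = pool.pin(1)
frame.data = "A"
pool.unpin(1, dirty=True)
pool.pin(2)
pool.unpin(2, dirty=False)
pool.pin(3)    # pool is full: page 1 is unpinned and least recently used, so it is replaced
print(disk[1], list(pool.frames))
```

In this run the modified page 1 reaches the disk only when its frame is chosen for replacement, which is the behaviour the dirty bit exists to support.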

Buffer Replacement Policy

Frame is chosen for replacement by a replacement policy:

Least-recently-used (LRU), Clock, MRU etc.

Policy can have big impact on # of I/Os; depends on the access pattern.
Sequential flooding: Nasty situation caused by LRU + repeated sequential scans.
  # buffer frames < # pages in file means each page request causes an I/O.
  MRU much better in this situation (but not in all situations, of course).
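
Sequential flooding is easy to reproduce in a small simulation. The sketch below counts page reads for repeated scans of a file that is one page larger than the buffer pool, under LRU and under MRU. The function name `count_ios` and the sizes used (10 frames, 11 pages, 5 scans) are assumptions chosen for illustration, and pages are treated as unpinned as soon as they have been read.

```python
def count_ios(policy, num_frames, num_pages, num_scans):
    """Count page reads for repeated sequential scans under 'LRU' or 'MRU'."""
    pool = []    # resident pages, ordered from least to most recently used
    ios = 0
    for _ in range(num_scans):
        for page in range(num_pages):
            if page in pool:
                pool.remove(page)          # hit: will be re-appended as most recent
            else:
                ios += 1                   # miss: page must be read from disk
                if len(pool) == num_frames:
                    pool.pop(0 if policy == "LRU" else -1)
            pool.append(page)
    return ios

lru_ios = count_ios("LRU", num_frames=10, num_pages=11, num_scans=5)
mru_ios = count_ios("MRU", num_frames=10, num_pages=11, num_scans=5)
print(lru_ios, mru_ios)
```

With these sizes LRU misses on all 55 requests, because each scan evicts exactly the page that will be needed soonest, while MRU keeps most of the file resident across scans.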

DBMS vs. OS File System

OS does disk space & buffer mgmt: why not let OS manage these tasks?
Differences in OS support: portability issues.
Some limitations, e.g., files can't span disks.
Buffer management in DBMS requires ability to:
  pin a page in buffer pool, force a page to disk (important for implementing CC & recovery),
  adjust replacement policy, and pre-fetch pages based on access patterns in typical DB operations.

Record Formats: Fixed Length

[Figure: a record with fields F1, F2, F3, F4 of lengths L1, L2, L3, L4, starting at base address B; the address of F3 is B+L1+L2]

Information about field types same for all records in a file; stored in system catalogs.
Finding i-th field does not require scan of record.
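
The address computation for fixed-length records can be sketched as below. The field lengths are made-up values standing in for what would be read from the system catalog, and the function names are hypothetical.

```python
from itertools import accumulate

# Hypothetical field lengths L1..L4 in bytes (in a DBMS these come from the catalog).
FIELD_LENGTHS = [4, 20, 8, 2]

# OFFSETS[i-1] = L1 + ... + L(i-1): distance of field i from the base address.
OFFSETS = [0] + list(accumulate(FIELD_LENGTHS))[:-1]

def field_address(base, i):
    """Address of the i-th field (1-based) of the record starting at `base`."""
    return base + OFFSETS[i - 1]

def read_field(page, base, i):
    """Slice the i-th field out of a page without scanning the record."""
    start = field_address(base, i)
    return page[start:start + FIELD_LENGTHS[i - 1]]

print(field_address(100, 3))   # B + L1 + L2 with B = 100
```

Because the offsets depend only on the schema, they are computed once per file rather than once per record.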


Record Formats: Variable Length

Two alternative formats (# fields is fixed):

[Figure, first format: a field count (4) followed by fields F1 F2 F3 F4, with fields delimited by special symbols]

[Figure, second format: an array of field offsets followed by fields F1 F2 F3 F4]

Second offers direct access to i-th field, efficient storage of nulls (special "don't know" value); small directory overhead.
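
A minimal sketch of the second format (array of field offsets) is given below. The on-disk details are assumptions made for the example: offsets are 16-bit little-endian integers measured from the start of the record, one extra offset marks the end of the record, and a null is stored as a field whose start and end offsets coincide, so this toy version cannot tell a null from an empty value.

```python
import struct

def encode(fields):
    """Pack fields (bytes or None) behind an array of len(fields) + 1 offsets."""
    header_size = 2 * (len(fields) + 1)
    offsets, body, pos = [], b"", header_size
    for field in fields:
        offsets.append(pos)
        if field is not None:      # a null takes no space in the body
            body += field
            pos += len(field)
    offsets.append(pos)            # end of the last field
    return struct.pack(f"<{len(offsets)}H", *offsets) + body

def get_field(record, i):
    """Return the i-th field (1-based) directly from its two offsets."""
    start, end = struct.unpack_from("<2H", record, 2 * (i - 1))
    return None if start == end else record[start:end]

rec = encode([b"53666", b"Jones", None, b"3.4"])
print(get_field(rec, 2), get_field(rec, 3), get_field(rec, 4))
```

Reading the i-th field touches only two offsets and the field itself, which is the direct access the slide refers to; the directory overhead here is two bytes per field plus two for the end marker.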

Page Formats: Fixed Length Records

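[Figure: two page layouts for fixed-length records. PACKED: slots 1 to N are filled contiguously, followed by free space, and the number of records (N) is stored at the end of the page. UNPACKED, BITMAP: slots 1 to M, with a bitmap of M bits marking which slots are occupied and the number of slots (M) stored at the end of the page.]

Record id = <page id, slot #>.
In the first alternative (packed), moving records for free space management changes rid; may not be acceptable.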

Page Formats: Variable Length Records

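[Figure: page i holding variable-length records with Rid = (i,1), Rid = (i,2), ..., Rid = (i,N). A SLOT DIRECTORY at the end of the page has one entry per slot (the figure shows the sample values 24, 16 and 20 for slots 1, 2 and N), followed by the number of slots (N) and a pointer to the start of free space.]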

Can move records on page without changing rid; so, attractive for fixed-length records too.
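
The point about moving records without changing rids can be illustrated with a toy slotted page. The layout here is simplified and partly assumed: the slot directory is kept in a Python list of (offset, length) pairs rather than at the end of the page, a deleted slot is marked with None, and the class and method names are invented for the example.

```python
class SlottedPage:
    def __init__(self, size=64):
        self.data = bytearray(size)
        self.free_start = 0     # pointer to start of free space
        self.slots = []         # slot directory: (offset, length), or None if deleted

    def insert(self, record):
        if self.free_start + len(record) > len(self.data):
            raise ValueError("not enough free space on page")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return len(self.slots)  # slot number, 1-based; rid = (page id, slot number)

    def get(self, slot):
        entry = self.slots[slot - 1]
        if entry is None:
            return None
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, slot):
        self.slots[slot - 1] = None    # slot number is never reused here

    def compact(self):
        """Slide live records together; only directory offsets change."""
        packed, pos = bytearray(len(self.data)), 0
        for n, entry in enumerate(self.slots):
            if entry is None:
                continue
            offset, length = entry
            packed[pos:pos + length] = self.data[offset:offset + length]
            self.slots[n] = (pos, length)
            pos += length
        self.data, self.free_start = packed, pos

page = SlottedPage()
a = page.insert(b"aaa")
b = page.insert(b"bbbb")
c = page.insert(b"cc")
page.delete(b)
page.compact()
print(page.get(c), page.slots[c - 1])
```

After compaction the third record sits at a new offset, yet it is still reached through slot 3, so any rid held elsewhere remains valid.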

Files of Records

Page or block is OK when doing I/O, but higher levels of DBMS operate on records, and files of records.
FILE: A collection of pages, each containing a collection of records. Must support:
  insert/delete/modify record
  read a particular record (specified using record id)
  scan all records (possibly with some conditions on the records to be retrieved)

Unordered (Heap) Files

Simplest file structure contains records in no particular order.
As file grows and shrinks, disk pages are allocated and de-allocated.
To support record level operations, we must:
  keep track of the pages in a file
  keep track of free space on pages
  keep track of the records on a page

There are many alternatives for keeping track of this.


Heap File Implemented as a List

[Figure: a header page heading two linked lists of data pages, one list of full pages and one list of pages with free space]

The header page id and heap file name must be stored someplace.
Each page contains 2 'pointers' plus data.


Heap File Using a Page Directory

[Figure: a directory, starting at the header page, whose entries point to Data Page 1, Data Page 2, ...]

The directory is much smaller than a linked list of all heap file (HF) pages!
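
A page directory can be sketched as one small entry per data page. In the toy version below each entry records the number of free bytes on its page, which is an assumption made for the example (the slide only shows entries pointing to data pages); the class name, the first-fit search and the in-memory lists standing in for pages are likewise illustrative.

```python
class HeapFile:
    def __init__(self, page_size):
        self.page_size = page_size
        self.directory = []    # one entry per data page: free bytes remaining (assumed)
        self.pages = []        # data pages, each modelled as a list of records

    def insert(self, record):
        need = len(record)
        if need > self.page_size:
            raise ValueError("record larger than a page")
        # Consult only the directory to find a page with enough free space.
        for page_no, free in enumerate(self.directory):
            if free >= need:
                break
        else:
            self.pages.append([])                  # allocate a new data page
            self.directory.append(self.page_size)
            page_no = len(self.pages) - 1
        self.pages[page_no].append(record)
        self.directory[page_no] -= need
        return page_no, len(self.pages[page_no])   # rid = (page number, slot number)

hf = HeapFile(page_size=10)
rids = [hf.insert(b"123456"), hf.insert(b"12345678"), hf.insert(b"abc")]
print(rids, hf.directory)
```

Finding room for the third record required looking at two directory entries and no data pages, which is where the saving over walking a linked list of pages comes from.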

System Catalogs

For each index:
  structure (e.g., B+ tree) and search key fields

For each relation:
  name, file name, file structure (e.g., Heap file)
  attribute name and type, for each attribute
  index name, for each index
  integrity constraints

For each view:
  view name and definition

Plus statistics, authorization, buffer pool size, etc.
Catalogs are themselves stored as relations!

Attr_Cat(attr_name, rel_name, type, position)

attr_name   rel_name        type      position
attr_name   Attribute_Cat   string    1
rel_name    Attribute_Cat   string    2
type        Attribute_Cat   string    3
position    Attribute_Cat   integer   4
sid         Students        string    1
name        Students        string    2
login       Students        string    3
age         Students        integer   4
gpa         Students        real      5
fid         Faculty         string    1
fname       Faculty         string    2
sal         Faculty         real      3

Summary

Disks provide cheap, non-volatile storage.
  Random access, but cost depends on location of page on disk; important to arrange data sequentially to minimize seek and rotation delays.

Buffer manager brings pages into RAM.
  Page stays in RAM until released by requestor.
  Written to disk when frame chosen for replacement (which is sometime after requestor releases the page).
  Choice of frame to replace based on replacement policy.
  Tries to pre-fetch several pages at a time.


Summary (Contd.)

DBMS vs. OS File Support
  DBMS needs features not found in many OSs, e.g., forcing a page to disk, controlling the order of page writes to disk, files spanning disks, ability to control pre-fetching and page replacement policy based on predictable access patterns, etc.

Variable length record format with field offset directory offers support for direct access to i-th field and null values.
Slotted page format supports variable length records and allows records to move on page.


Summary (Contd.)

File layer keeps track of pages in a file, and supports abstraction of a collection of records.
  Pages with free space identified using linked list or directory structure (similar to how pages in file are kept track of).

Indexes support efficient retrieval of records based on the values in some fields.
Catalog relations store information about relations, indexes and views. (Information that is common to all records in a given collection.)

