A-Level 14 Presentation - Compression, Encryption and Hashing

The document discusses various methods of compressing different file types including images, video, and audio. It describes lossless compression techniques like run-length encoding and Huffman coding that reduce file sizes without losing information. Lossy compression techniques for images, video and audio that remove unnecessary data are also covered, along with common file formats like JPEG, MP3, MP4 that use these compression methods.

Uploaded by

Kenma Kozume

Teach Computer Science

A-level

Compression,
encryption and hashing

teachcomputerscience.com

Lesson Objectives
Students will learn about:
▪ Why compressing files is important.
▪ How text, image, audio, and video files are compressed.
▪ Effects of compressing a file.
▪ What the various file formats are.
▪ Compression algorithms: Run-length encoding and Huffman
coding.
▪ Encryption: Symmetric and Asymmetric.
▪ Hashing, digital certificates, and digital signatures.
1.
Content


Introduction
▪ File handling is one of the primary functions of a computer system.
▪ Based on the type of data that needs to be stored, several types of file
formats are available.
▪ Each file format occupies a certain amount of storage space.
▪ An image file with good quality occupies around 1 MB, and a video file
needs to store 25 frames per second, occupying a large amount of
storage space. Thus, compression methods are used to reduce the size
of the files.
▪ Compression is also helpful in reducing the download time of image,
audio, and video files from the Internet.

Compressing Image files


▪ Image compression is the reduction in file size to reduce
download times and storage requirements.
▪ Compressing an image can also change its attributes, such as file
type, resolution, dimensions and bit depth.
▪ There are two types of compression:
✔ Lossless compression
✔ Lossy compression


Lossless compression
▪ When the file is compressed, the quality of the image remains the
same.
▪ The image can be reconstructed into its original form.
▪ It is used where all of the information is important and none of it can be lost.

Lossless compression
▪ Let us consider a text file with the following sentence:
▪ “See a pin and pick it up, all the day you'll have good luck; see a pin and let it lie, bad luck you'll have all day”.
▪ This text file can be compressed by making a table for this information.
▪ A character occupies a byte of memory, whereas a number can occupy comparatively less memory. For example, the number 16 is represented in 5 bits (10000).

Index | Word | Index | Word
1 | see | 10 | day
2 | a | 11 | you’ll
3 | pin | 12 | have
4 | and | 13 | good
5 | pick | 14 | luck
6 | it | 15 | let
7 | up | 16 | lie
8 | all | 17 | bad
9 | the | |

“See a pin and pick it up, all the day you'll have good luck; see a pin and let it lie, bad luck you'll have all day”
▪ The sentence can be coded in the form of numbers in the table and stored in the computer:
▪ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 1 2 3 4 15 6 16 17 14 11 12 8 10
▪ This saves memory by using codes for words that are repeated.
▪ With the code and the index table, the complete sentence can be recreated.
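
The index-table idea can be sketched in a few lines of Python. This is only an illustration: the function names `build_index`, `encode` and `decode` are made up for this example, and the sketch ignores capital letters and punctuation, so it recreates the words of the sentence rather than every character.

```python
import re

WORD_PATTERN = r"[a-z']+"  # assumption: a word is a run of letters and apostrophes


def build_index(text):
    """Give each distinct word a number, in order of first appearance."""
    index = {}
    for word in re.findall(WORD_PATTERN, text.lower()):
        if word not in index:
            index[word] = len(index) + 1
    return index


def encode(text, index):
    """Replace every word with its number from the index table."""
    return [index[word] for word in re.findall(WORD_PATTERN, text.lower())]


def decode(codes, index):
    """Recreate the words from the codes and the index table."""
    words = {number: word for word, number in index.items()}
    return " ".join(words[number] for number in codes)


sentence = ("See a pin and pick it up, all the day you'll have good luck; "
            "see a pin and let it lie, bad luck you'll have all day")
index = build_index(sentence)
codes = encode(sentence, index)
restored = decode(codes, index)
print(codes)
print(restored)
```

The repeated words (see, a, pin, and, it, luck, you'll, have, all, day) reuse their earlier numbers, which is where the saving comes from.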


Lossy compression
▪ When a file is compressed, the unnecessary bits of information are
removed permanently.
▪ The removed information is the part that humans are least likely to notice.
▪ This type of compression is used for photographs where the
information to be compressed cannot be predicted.


Uncompressed file formats


▪ TIFF (.tif) and BMP (.bmp) files typically store raw bitmaps, that is,
uncompressed images.
▪ When these file formats are uncompressed, they represent images with
the highest image quality.


Types of compressed files


Format | Type of compression | Application
PNG | Lossless | Used for transferring images over the Internet.
JPEG | Lossy | Higher compression rate than a PNG. Used in digital cameras and web pages.
GIF | Lossless | Compresses images to a maximum of 8-bit depth. Not used for high quality images. A sequence of GIF images is used to store animated graphics. Used for small images such as logos, icons, etc.
PDF | Lossless | Encodes text and graphics.

Videos
▪ Digital videos are created by playing a series of images at high speed.
▪ A typical HD video has a frame rate of 60 fps.
▪ Advanced video standards support up to 300 fps. The sampling rate
of a video is given in frames per second (fps), which can also be
expressed in hertz (Hz).
▪ Video files also have a bit rate, which determines the quality of the
audio and the images.


Compressing video files


▪ Compressing a video file can reduce
the resolution, dimensions and bit
rate.
▪ Compressing a video file may also
lead to poor quality and random
coloured blocks on the screen,
which are called artefacts.
▪ MP4 and MOV are two examples of
lossy video file formats.


Streaming audio files


▪ Compression is very helpful in streaming and downloading audio
and video files.
▪ The MP3 file format is used for audio compression.
▪ MP3 can reduce the size of an audio file by up to 90%.
▪ MP3 files are used for storing audio on computers, MP3 players,
mobile phones, etc. CD-quality audio is converted to the MP3 file
format using file compression software.
▪ Even though the quality of an MP3 file cannot match the original CD
file, it is still satisfactory for various purposes.


Streaming audio files: MP3


▪ MP3 is a lossy file compression technique.
▪ A user hears little or no difference while listening to the MP3 file
because the compression algorithm uses perceptual music shaping,
which removes only details that humans cannot hear.
▪ Therefore, only a few parts of the sound file are removed and sound
quality is not compromised too much.
▪ The bit rate affects the quality of the MP3 file. It ranges from 80 kbps
to 320 kbps. Original CD-quality audio has a bit rate of 1411 kbps.
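
The 1411 kbps figure and the "up to 90%" saving can be checked with a little arithmetic. The sketch below assumes the usual CD audio parameters (44,100 samples per second, 16 bits per sample, 2 channels), which are not stated on the slides; the function names are illustrative only.

```python
SAMPLE_RATE_HZ = 44_100  # assumption: standard CD sampling rate
BIT_DEPTH = 16           # assumption: bits per sample
CHANNELS = 2             # assumption: stereo


def pcm_bit_rate_kbps(sample_rate, bit_depth, channels):
    """Bit rate of uncompressed audio in kilobits per second."""
    return sample_rate * bit_depth * channels / 1000


def size_reduction_percent(original_kbps, compressed_kbps):
    """How much smaller the compressed stream is, as a percentage."""
    return (1 - compressed_kbps / original_kbps) * 100


cd_kbps = pcm_bit_rate_kbps(SAMPLE_RATE_HZ, BIT_DEPTH, CHANNELS)
print(f"CD audio: {cd_kbps:.1f} kbps")
for mp3_kbps in (80, 128, 320):
    saving = size_reduction_percent(cd_kbps, mp3_kbps)
    print(f"MP3 at {mp3_kbps} kbps is about {saving:.0f}% smaller")
```

Under these assumptions a 128 kbps MP3 is roughly 91% smaller than the CD original, while a 320 kbps MP3 saves noticeably less.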


Streaming video files: MP4


▪ MP4 files are similar to MP3 files, but also allow storage of other
multimedia such as video, animation, photos, etc.
▪ Video files are also compressed into the MP4 format to stream online.
This format is used for transmission over digital channels, cables, and
satellites.
▪ DVD movies are available in this format.
▪ A high-definition 720p (HD 720) video has a bit rate of 1200-4000
kbps. A 4K video has a bit rate of 8000-14000 kbps.
▪ The stream rate is 1 Mbps for standard-definition video, 3.5 Mbps
for HD video, and 15 Mbps for 4K ultra-HD video.

Streaming audio files


▪ FLAC and ALAC are examples of open-source lossless compression
formats. File sizes can be reduced by up to 50% without losing quality.
▪ A 1,411 kbps WAV file can be compressed to a 64 kbps MP3 (Moving
Picture Experts Group Audio Layer 3) file by losing some data
permanently.
▪ The bit depth is also reduced to remove data. Hence, in lossy
compression, the bit depth is variable.
▪ MP3 and AAC are patented codecs, whereas Ogg Vorbis is an example
of an open-source lossy codec.


Codecs and Compression Algorithms
▪ Codecs are programs that encode or decode an audio, image,
or video file.
▪ Compression codecs aim to reduce the size of a file
without noticeably affecting its quality.
▪ Algorithms decide the amount of data that can be removed to
reduce the file size.


Run-length encoding (RLE)


▪ Run-length encoding (RLE) is a compression algorithm that converts
runs of consecutive identical values into a code.
▪ This code consists of the repeated value and the number of times
it is repeated.
▪ This is a lossless type of compression.


Run-length encoding (RLE)


▪ The computer stores binary value
1 for white and binary value 0 for
black for each row of the image.
▪ The first row in the image can be
represented as 2 0 5 1 1 0. This
code represents 2 black pixels, 5
white pixels and 1 black pixel.
▪ Similarly, the second row in the
image is represented as 1 1 6 0 1 1,
that is, 1 white pixel, 6 black pixels
and 1 white pixel.
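
The two rows described above can be encoded with a short Python sketch. The rows are written here as strings of 0s and 1s (an assumption about how the image is stored), and the function names `rle_encode` and `rle_decode` are illustrative.

```python
from itertools import groupby


def rle_encode(pixels):
    """Return (run length, value) pairs for a string of pixel values."""
    return [(len(list(run)), value) for value, run in groupby(pixels)]


def rle_decode(pairs):
    """Rebuild the original string of pixel values from the pairs."""
    return "".join(value * length for length, value in pairs)


row1 = "00111110"  # 2 black, 5 white, 1 black
row2 = "10000001"  # 1 white, 6 black, 1 white

encoded1 = rle_encode(row1)
encoded2 = rle_encode(row2)
print(encoded1)  # [(2, '0'), (5, '1'), (1, '0')]
print(encoded2)  # [(1, '1'), (6, '0'), (1, '1')]
```

Decoding the pairs gives back exactly the original rows, which is why RLE counts as lossless.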


Run-length encoding (RLE)


▪ This type of coding is not efficient if
the file does not have many runs.
▪ In such cases, the file size may
increase instead of getting
compressed.
▪ RLE is therefore used only in
simple images with large areas of
the same colour.
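
A quick way to see why runs matter is to count how many values RLE has to store. The sketch below assumes one count and one value are stored per run; `rle_size` is a made-up helper for this comparison.

```python
from itertools import groupby


def rle_size(pixels):
    """Number of values stored after RLE: one count plus one value per run."""
    runs = sum(1 for _ in groupby(pixels))
    return 2 * runs


solid_row = "00000000"         # one long run
checkerboard_row = "01010101"  # no runs longer than one pixel

print(len(solid_row), "pixels ->", rle_size(solid_row), "values")
print(len(checkerboard_row), "pixels ->", rle_size(checkerboard_row), "values")
```

The solid row shrinks from 8 values to 2, while the alternating row grows from 8 values to 16.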


Run-length encoding (RLE)

▪ RLE is also used to compress video files.


Huffman coding
▪ A compression technique used to reduce the number of bits that
represents each letter.
▪ A binary tree is used to encode letters.
▪ A binary tree is a data structure made of nodes and is constructed
based on hierarchy. A parent node in a binary tree has up to two
child nodes.


Huffman coding
▪ In ASCII coding, each letter is represented using 7 bits.
▪ In Huffman coding, different letters can be represented with different
numbers of bits.
▪ The most frequently appearing letters are represented with fewer
bits.
▪ The number of bits required to store information is reduced.


Huffman coding
▪ Consider the sentence: Betty ate butter.
▪ The frequency of characters in this sentence is shown in the table.
▪ There are 17 characters in total (including the two spaces and the full stop).
▪ Therefore, the total number of bits used to represent their ASCII
codes is: 17 × 7 = 119 bits.

Letter A B E R T U Y Space

Frequency 1 2 3 1 5 1 1 2


Huffman coding
▪ Consider the sentence: Betty ate butter.
▪ Each letter is now assigned a binary value:

[Figure: Huffman tree for the sentence, with branches labelled 0 and 1 and nodes T, E, B, Sp., A, R, U and Y.]

Letter A B E R T U Y Space

Frequency 1 2 3 1 5 1 1 2

Binary value 010 00 1 011 0 110 111 10

Huffman coding
▪ Substituting these values in the sentence and calculating the total
number of bits (frequency × code length for each character):
3 + 4 + 3 + 3 + 5 + 3 + 3 + 4 = 28 bits.
▪ Using Huffman coding, we have saved 119 – 28 = 91 bits.

Letter A B E R T U Y Space

Frequency 1 2 3 1 5 1 1 2

Binary value 010 00 1 011 0 110 111 10

Huffman coding: Building a binary tree
▪ We need the lowest number of bits for letters with a higher
frequency.
▪ Hence, letters with a lower frequency must be assigned a greater
number of bits.
▪ We shall start with the letters with the lowest frequency. Letters A, R, U,
and Y have the lowest frequency of 1.

Step 1:

A R U Y

Huffman coding: Building a binary tree
▪ Next, let us consider characters B and space (Sp.), with a frequency of 2
each.

Step 2:

B Sp.
A R U Y


Huffman coding: Building a binary tree
▪ Next, let us consider the letter E with a frequency of 3.

Step 3:

E
B Sp.
A R U Y


Huffman coding: Building a binary tree
▪ The letter T has the highest frequency of 5. Let us include T and
connect the nodes to the root.

Step 5:

root
T E
B Sp.
A R U Y


Huffman coding: Building a binary tree
▪ The final step of forming a Huffman tree is giving binary values to each
connection.
▪ Left branches are assigned value 0 and right branches are assigned value 1.
▪ Each path terminates at a leaf.

[Figure: the completed tree from the previous step, with every left branch labelled 0 and every right branch labelled 1.]

Huffman coding: Building a binary tree
▪ Using the tree, each letter is assigned
a binary value by following the path from the root
to that letter.
▪ This is one example of using Huffman
coding. A different tree would give a
letter a different binary value.

Letter A B E R T U Y Space

Binary value 010 00 1 011 0 110 111 10
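
Building the tree can also be sketched in code. What follows is a minimal sketch of the standard Huffman procedure (repeatedly merging the two least frequent nodes), not necessarily the exact tree drawn on the slides; the function name `huffman_codes` and the use of the upper-case text without the full stop are assumptions made for illustration. In a standard Huffman tree every character sits on a leaf, so no code is the start of another code. The codes it produces are therefore longer than those in the table above, where T and E use a single bit: for these 16 letters and spaces the total comes to 44 bits rather than 28.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_codes(text):
    """Build a Huffman tree for text and return {character: bit string}."""
    tiebreak = count()  # keeps heap entries comparable when weights are equal
    heap = [(weight, next(tiebreak), symbol)
            for symbol, weight in Counter(text).items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)   # least frequent node
        w2, _, right = heapq.heappop(heap)  # next least frequent node
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):      # internal node: 0 = left, 1 = right
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                            # leaf: a single character
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes


text = "BETTY ATE BUTTER"
codes = huffman_codes(text)
encoded = "".join(codes[ch] for ch in text)
total_bits = len(encoded)
print(codes)
print(total_bits, "bits, compared with", len(text) * 7, "bits in 7-bit ASCII")
```

The exact bit patterns depend on how ties between equal frequencies are broken, but the total length is the same for any valid Huffman tree built from these frequencies.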

Encryption
▪ Encryption is the process of changing data into another form or
code so that only people with access to a secret key can read it. For
others, the message will not be in a readable form.
▪ This technique is used in wireless networks to ensure security.
▪ This technique is also used in HTTPS, the secure form of HTTP used
by web pages. The inputs from the user are encrypted to offer a
secure online experience during banking, shopping, etc.


Caesar cypher
▪ A basic encryption algorithm is the Caesar cypher, in which the letters of the
alphabet are shifted by a known amount.
▪ Example: a Caesar code in which the letters are shifted by 5 places, so that A becomes F, B becomes G, and so on, wrapping around from Z to A.

▪ Using the above code, “INITIATE PLAN A” will be coded as “NSNYNFYJ UQFS F”.
▪ The secret information used to encrypt or decrypt the message is called a key.
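
The shift can be expressed in a few lines of Python. This is a minimal sketch that handles capital letters only and leaves every other character (such as spaces) unchanged; the function name `caesar` is illustrative.

```python
import string

ALPHABET = string.ascii_uppercase


def caesar(text, shift):
    """Shift each capital letter by `shift` places, wrapping from Z to A."""
    result = []
    for ch in text:
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)  # spaces and punctuation are left as they are
    return "".join(result)


cipher_text = caesar("INITIATE PLAN A", 5)
plain_text = caesar(cipher_text, -5)  # decrypting is a shift in the other direction
print(cipher_text)
print(plain_text)
```

With a key of 5 the eight letters of INITIATE become the eight letters NSNYNFYJ, and shifting back by 5 recovers the original message.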


Disadvantages of Caesar cypher


▪ A message encrypted with this cypher can be easily decrypted, so it can
be cracked by unintended users (for example, hackers): there are only
25 possible shifts to try.
▪ Longer messages are easier to decode.
▪ Thus, this technique cannot be used for highly confidential information.
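
To illustrate how little work an attacker needs, the sketch below simply tries every possible shift. The helper names `shift_text` and `brute_force` are made up for this example, and a real attacker would still need to pick out the readable candidate (by eye or with a word list).

```python
import string

ALPHABET = string.ascii_uppercase


def shift_text(text, shift):
    """Shift capital letters by `shift` places using a translation table."""
    s = shift % 26
    table = str.maketrans(ALPHABET, ALPHABET[s:] + ALPHABET[:s])
    return text.translate(table)


def brute_force(cipher_text):
    """Return every candidate plaintext, keyed by the shift that was undone."""
    return {shift: shift_text(cipher_text, -shift) for shift in range(1, 26)}


candidates = brute_force("UQFS")
for shift, candidate in candidates.items():
    print(shift, candidate)
```

Only 25 candidates are printed, and one of them (shift 5) is the readable word PLAN.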


The Vernam cypher

▪ The Vernam cypher uses a one-time pad as the encryption key.
▪ Encryption key: its length is equal to or greater than the length of the
message in characters. Each key is used only once.
▪ Sender and receiver meet in person to agree on the keys and destroy
them after the exchange of messages.


The Vernam cypher

▪ The Vernam cypher works with the ASCII codes of characters. Each ASCII code is
taken in binary form.
▪ The one-time key is also taken in binary form.
▪ An XOR operation is performed between the ASCII codes and the one-time key.
▪ The key is completely random, and its length is equal to or greater than the
length of the original message.
▪ To decrypt the cypher text, an XOR operation is performed between the
cypher text and the one-time pad.
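
The XOR step can be sketched as follows. The message "HELLO" and the function name `vernam` are illustrative, and Python's `secrets` module stands in here for whatever truly random source a real one-time pad would need.

```python
import secrets


def vernam(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the matching byte of the one-time key."""
    if len(key) < len(data):
        raise ValueError("the one-time key must be at least as long as the message")
    return bytes(d ^ k for d, k in zip(data, key))


message = "HELLO".encode("ascii")
key = secrets.token_bytes(len(message))  # random key, to be used once only
cipher_text = vernam(message, key)
recovered = vernam(cipher_text, key)     # the same XOR undoes the encryption

print(cipher_text.hex())
print(recovered.decode("ascii"))
```

Because XOR is its own inverse, the same function is used for both encryption and decryption.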

Encryption: Symmetric

▪ Sender encrypts the message with a key.
▪ Receiver requires a key to decrypt the message.
▪ Both the sender’s key and receiver’s key are the same.
▪ It is important that the sender sends the key to the receiver in a
protected manner. Otherwise, it could be intercepted by a hacker, and
all the messages can be read by a third party.

Method used to generate and distribute key

Step | Sender | Receiver
1 | Sender and receiver both choose the same encryption algorithm. Ex: 13^X MOD 5
2 | 13^P MOD 5. Sender chooses a value for encryption algorithm, which is kept secret. Example: P = 5 | 13^Q MOD 5. Receiver chooses a value for encryption algorithm, which is kept secret. Example: Q = 3
3 | Value is substituted in algorithm. 13^P MOD 5 = 13^5 MOD 5 = 3 | Value is substituted in algorithm. 13^Q MOD 5 = 13^3 MOD 5 = 2
4 | The values are transmitted to each other.
5 | The received value is substituted in algorithm. 2^P MOD 5 = 2^5 MOD 5 = 2 | The received value is substituted in algorithm. 3^Q MOD 5 = 3^3 MOD 5 = 2

Both sender and receiver now end up with the same value (2), which acts as the key for further communication.
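
The five steps can be reproduced with Python's built-in `pow(base, exponent, modulus)`. The numbers come from the example above (13^X MOD 5, P = 5, Q = 3); the function names are illustrative. The method resembles the Diffie-Hellman key exchange, and real systems would use far larger numbers than this classroom example.

```python
BASE = 13    # agreed openly by sender and receiver
MODULUS = 5  # agreed openly by sender and receiver


def value_to_send(secret):
    """Steps 2 and 3: substitute the secret value into BASE^X MOD MODULUS."""
    return pow(BASE, secret, MODULUS)


def shared_key(received_value, secret):
    """Step 5: substitute the received value and the secret into the algorithm."""
    return pow(received_value, secret, MODULUS)


p = 5  # sender's secret
q = 3  # receiver's secret

sender_sends = value_to_send(p)    # 13^5 MOD 5 = 3
receiver_sends = value_to_send(q)  # 13^3 MOD 5 = 2

sender_key = shared_key(receiver_sends, p)  # 2^5 MOD 5 = 2
receiver_key = shared_key(sender_sends, q)  # 3^3 MOD 5 = 2
print(sender_key, receiver_key)
```

Only the values 3 and 2 travel between the two parties; the secrets P and Q are never transmitted.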

Encryption: Asymmetric

▪ Symmetric key encryption can be compromised if the shared key is
intercepted by unintended users.
▪ This makes it risky for highly confidential information.
▪ Therefore, algorithms with stronger keys are used.
▪ The bigger the key, the better the security.
▪ More than one key can also be used to improve security.


Keys
Public keys: Public keys are available to all users and are used to encrypt a message.
Private keys: Private keys, which are different from public keys, are only available to the intended recipient. These keys are used to decrypt the message.

Public and private keys of a particular algorithm complement each other.


Encryption using keys

▪ A key-making algorithm is responsible for generating the public and private
keys of the receiver.
▪ The public key can be found in a directory.
▪ Sender encrypts the message using the receiver’s public key from this directory.
▪ Receiver receives this data and decrypts it using the private key.
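
A toy version of these steps can be written with RSA, one well-known public-key algorithm (the slides do not name a specific algorithm, so this choice is an assumption). The primes 61 and 53 are a common textbook example and far too small for real security; the message is a single number rather than text.

```python
# Key-making step: two primes give a public key (E, N) and a private key (D, N).
P, Q = 61, 53
N = P * Q                  # 3233, part of both keys
PHI = (P - 1) * (Q - 1)    # 3120
E = 17                     # public exponent
D = pow(E, -1, PHI)        # private exponent (modular inverse, Python 3.8+)


def encrypt(message, public_key):
    """Anyone can encrypt with the receiver's public key."""
    e, n = public_key
    return pow(message, e, n)


def decrypt(cipher, private_key):
    """Only the holder of the private key can decrypt."""
    d, n = private_key
    return pow(cipher, d, n)


message = 65
cipher = encrypt(message, (E, N))
plain = decrypt(cipher, (D, N))
print(cipher, plain)
```

The sender only ever needs (E, N); the value D stays with the receiver.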

Encryption using keys

[Figure: Alice (sender) encrypts the message “Welcome Bob!” with Bob’s public key; the encrypted message (“Ergfh34y5u1”) is transmitted; Bob (receiver) decrypts it with his private key to recover “Welcome Bob!”. A key-making algorithm generates Bob’s public and private keys from a large number.]

Hashing
▪ A hashing function maps input of arbitrary length to a fixed-length or a
smaller output.
▪ A hashing algorithm converts a text message into a string of hexadecimal
characters.
▪ Hashing functions are one-way functions. The hashed output cannot
be converted back to the original message.
▪ This is widely used to protect stored passwords and PINs from hackers. To
verify a password, the password entered by a user is passed through the hash
function. The result is compared with the stored hash of the password to grant access.
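
The password check described above might look like the sketch below. It goes slightly beyond the slides by adding a random salt and using PBKDF2 from Python's `hashlib`, which is one common way of hashing passwords; the iteration count, the example passwords and the function names are all illustrative assumptions.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def hash_password(password, salt):
    """One-way hash of the password; only this value is stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def verify_password(entered, salt, stored_hash):
    """Hash the entered password and compare it with the stored hash."""
    return hmac.compare_digest(hash_password(entered, salt), stored_hash)


salt = secrets.token_bytes(16)
stored_hash = hash_password("correct horse", salt)

print(verify_password("correct horse", salt, stored_hash))  # True
print(verify_password("wrong guess", salt, stored_hash))    # False
```

The original password is never stored, so someone who steals the stored values cannot simply read it back.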

Hashing
▪ Example: MD4 is a cryptographic hash function that generates a 128-bit
value for any message that is hashed with it.
▪ The text “Hello World!!! Welcome.” is converted to
“EBA941A5FD543A15919B803743868151”.
▪ The same message without a full-stop produces a different value in
the MD4 algorithm, “92fc661dd222843f25f6c02517299a79”.
▪ A small change to the message gives a completely different hash, so it is
difficult to work out the original message from its hash.
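
The same effect can be observed with Python's `hashlib`. MD4 is obsolete and often unavailable in current Python builds, so this sketch swaps in SHA-256, which produces a 256-bit value (64 hexadecimal characters) instead of 128 bits; the helper name `hex_digest` is illustrative.

```python
import hashlib


def hex_digest(text):
    """Return the SHA-256 hash of the text as hexadecimal characters."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


with_stop = hex_digest("Hello World!!! Welcome.")
without_stop = hex_digest("Hello World!!! Welcome")

print(with_stop)
print(without_stop)
print("Same length:", len(with_stop) == len(without_stop))
print("Same value:", with_stop == without_stop)
```

Both outputs have the same fixed length, yet removing a single full stop changes the value completely.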


Digital signature
▪ Online documents are authenticated using digital signatures.
▪ To create a digital signature, we start with the hash total.
▪ The hash total is a value calculated by applying the hash function to the message.
▪ The sender encrypts the hash total using their private key to create a digital
signature. This signature is combined with the original message.
▪ Now, this combined message is encrypted using the receiver’s public
key.


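Digital signature: Sender’s side

[Figure: The hash total of the original message is encrypted with the sender’s private key to form the digital signature. The original message and the digital signature are then encrypted with the receiver’s public key to give the encrypted message with signature.]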

Digital signature
▪ The receiver decrypts the message using their private key to obtain the
original message along with the digital signature.
▪ To decrypt the digital signature, the sender’s public key is used.
▪ The sender’s hash total is obtained by decrypting the digital signature.
▪ The hash total is also calculated by the receiver using the original message.
▪ The two hash total values are compared. If they are the same, the
message is genuine and has not been modified.
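
The signing and checking steps can be sketched with a toy RSA key pair. The numbers (N = 3233 from the primes 61 and 53, E = 17, D = 2753) are a textbook example and far too small for real use, SHA-256 stands in for the unnamed hash function, and the outer encryption with the receiver's public key is left out to keep the sketch short. All names are illustrative.

```python
import hashlib

# Toy RSA key pair for the sender: (E, N) is public, (D, N) is private.
N, E, D = 3233, 17, 2753


def hash_total(message):
    """Hash the message and reduce it to a number small enough for the toy key."""
    digest = hashlib.sha256(message.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % N


def sign(message):
    """Sender: encrypt the hash total with the private key."""
    return pow(hash_total(message), D, N)


def is_genuine(message, signature):
    """Receiver: decrypt the signature with the public key and compare hash totals."""
    return pow(signature, E, N) == hash_total(message)


message = "Pay Bob 10 pounds"
signature = sign(message)

print(is_genuine(message, signature))            # True
print(is_genuine(message, (signature + 1) % N))  # False: the signature was altered
```

With such a tiny key, different messages can share a hash total; real signatures use much larger keys and the full hash value.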


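Digital signature: Receiver’s side

[Figure: The encrypted message with signature is decrypted with the receiver’s private key to give the original message and the signature. A hash total is calculated from the original message, and a second hash total is obtained by decrypting the signature with the sender’s public key. If the two are equal, the message is genuine.]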

Digital signature
▪ If the original message has been modified, the calculated hash
total will be different.
▪ To improve security, the original date and time can also be included in
the original message.
▪ Without further checks, someone could create a fake signature with a bogus
private key, publish the matching public key, and claim to be the sender.
▪ A digital certificate ensures that the sender’s public key belongs to a
trusted source.


Digital Certificates
▪ Certificate authorities such as Verisign issue certificates to websites and
senders, vouching for the correctness of their public keys.
▪ The certificate contains the name of the sender, their public key, and an
expiry date, along with a digital signature of the certificate authority.


Let’s review some concepts

▪ Compression: Reduces the amount of memory required to store a file. Types: Lossy and lossless.
▪ Lossless compression: When the file is compressed, the quality of the image remains the same and the image can be reconstructed to its original form. Ex: PNG, GIF, PDF.
▪ Lossy compression: When a file is compressed, the unnecessary bits of information are removed permanently. Ex: JPEG.
▪ Huffman coding: A compression technique used to reduce the number of bits that represents each letter by using a binary tree.
▪ Videos: Compressing a video file reduces the resolution, dimensions and bit rate. Lossy formats: MP4 and MOV.
▪ Streaming audio and video files: Audio: MP3. Video: MP4. Open source lossless compression formats: FLAC and ALAC.

Let’s review some concepts

▪ Encryption: Encryption is the process of changing the data into another form or code so that only people with access to a secret key can read it.
▪ Public key: Public keys are available to all users and are used to encrypt a message.
▪ Private key: Private key, which is different from public key, is only available to the intended recipient.
▪ Hash function: A function that maps the input of arbitrary length to a fixed length or a smaller output.
▪ Digital signatures: An electronic signature that authenticates documents and websites.
▪ Digital certificates: An electronic document that confirms the ownership of public keys.

2.
Activity


Activity-1
Duration: 15 minutes

1. Use Huffman coding to create a Huffman tree for the sentence: GOOD
MORNING GORDON. Also, state the character coding for each character.

Letter

Frequency

Binary
value
2. Using Huffman coding, how many bits have you saved?


Activity-2
Duration: 15 minutes

1. Use the Caesar cypher to encrypt the message ‘OPERATION EXECUTED’ by
shifting the characters +5 places.
2. Encrypt the same message as in question (1), but now with a shift of -4
places.
3. Using the Vernam cypher, encrypt the character Y with the key a.
The ASCII code for Y is 01011001 and for a is 01100001. Use the ASCII table to
answer this question.

3.
End of topic questions


End of topic questions


1. Why is file compression important in computer systems?
2. What is the difference between lossy and lossless compression?
3. What are the advantages and disadvantages of using an MP3 file format
to compress audio and video files?
4. How does compressing a video file affect its quality?
5. What is run-length encoding?
6. What type of compression algorithm (lossy/lossless) will you use for
sending a computer program as an email attachment? Why?

7. Encode the image given, using run-length encoding.

8. Under what circumstances is run-length encoding not efficient?


9. What do you mean by the terms public key and private key?
10. Use the encryption algorithm 5^X MOD 11 to distribute keys between a
sender and receiver.
11. Using a flowchart, explain how a receiver finds out whether the
received encrypted message with a signature is genuine or not.
