CS31 : Saving Space

This lesson looks at different compression techniques.

We are learning ...
  • About lossy and lossless compression
So that we can ...
  • Understand the principles of lossy and lossless compression techniques, specifically
    - Run length encoding
    - Huffman trees
  • State and explain which compression technique could be applied to music / video, photos and text files

CGP The Revision Guide Page 75
CGP Exam Practice Workbook Page 84

# Get Ready.png

Activity 1
What is compression and why do it?

Task 1.1 Definitions
Where you learn the basic terminology of compression


Read the following information carefully, making notes if you wish.

Data compression

Data compression, or bit-rate reduction, involves encoding information using fewer bits than the original representation. Compression can either be lossy or lossless. Lossless compression reduces bits by identifying and eliminating statistical redundancy. No information is lost in lossless compression (hence the name). Lossy compression reduces bits by identifying unnecessary information and removing it completely.

Compression is useful because it helps reduce resource usage, such as data storage space or transmission capacity. However, because compressed data must be decompressed before use, this processing imposes extra computational costs through decompression. For instance, streaming video decompression may require expensive hardware to decompress the video fast enough for it to be viewed as it is being decompressed and the option to decompress the video in full before watching may require additional time and / or storage.

In your notebook / on paper

Answer the following questions, in full sentences.
  1. Why is compression sometimes known as bitrate reduction?
  2. What is the difference between lossy and lossless compression?
  3. Why is compression used?
  4. Describe one problem which could arise through the use of compression.

Task 1.2 Reasons to compress?
Where we learn about why compression is used

There are a number of reasons why we might want to compress data files ...
  • Backup and archiving of files
  • Transferring files across a network
  • Web streaming of media files
  • Transfer of files by email
  • File encryption and protection

In your notebooks / on paper

Perform a "What, Why, Who, When, How, Where" analysis on these reasons for data compression. You might want to use the following website to help you. You could present your ideas as a spider diagram, a mind map or a table. Consider the amount of time you have to complete this and chose one of the "W,W,W,W,H,W" options for each reason.

Activity 2
Lossless compression

Consider a 1-bit encoded bitmap image containing black pixels on a solid white background. There will be many long runs of white pixels in the blank space, and many short runs of black pixels within the image. A hypothetical scan line, with B representing a black pixel and W representing white, might read as follows:


The data in this scan line contains redundancy - the repeated data. Could there be a more efficient way of writing it?

But what about textual data? How would you go about compressing the following passage of text?

The Lion and the Mouse

Once when a Lion was asleep a little Mouse began running up and down upon him; this soon wakened the Lion, who placed his huge paw upon him, and opened his big jaws to swallow him. "Pardon, O King," cried the little Mouse: "forgive me this time, I shall never forget it: who knows but what I may be able to do you a turn some of these days?" The Lion was so tickled at the idea of the Mouse being able to help him, that he lifted up his paw and let him go. Some time after the Lion was caught in a trap, and the hunters who desired to carry him alive to the King, tied him to a tree while they went in search of a wagon to carry him on. Just then the little Mouse happened to pass by, and seeing the sad plight in which the Lion was, went up to him and soon gnawed away the ropes that bound the King of the Beasts. "Was I not right?" said the little Mouse.
Hint : Can you spot any redundant data? Anything repeated, even inside words?

Task 2.1 Run Length Encoding
Where we learn how to perform RLE on a scan line

Can you see the connection between the scan line (above) and the following string? You can discuss this with a peer.


This is called Run Length Encoding (RLE) and is used for compression without data loss. Now it's your turn! 

With a partner

Perform RLE compression on the following raw bitmap image and rewrite the image as a compressed string. Use 'W' to represent white pixels and 'B' to represent black pixels.

Click to engage

With a partner

If the raw bitmap was encoded using 'W' for white and 'B' for black, it would occupy 256 characters of storage. How many characters does your RLE compressed version occupy? Assume that each number in the compressed version occupies one character like 'W' and 'B' do (look at the example above in green to help you).

In your notebook / on paper

Calculate the compression ratio for this image using this formula ...

Task 2.2 Huffman encoding
Where we learn how to perform a type of dictionary encoding


Watch the following video which explains how to create a Huffman code for the text mississippi river. After you have watched the video, repeat the exercise in your notebook. Your Huffman code should consists of ...
  • A binary string (the Huffman code)
  • A dictionary
Text compression with Huffman coding (6:09)

In your notebook / on paper

Now choose one of the following simple phrases and perform a Huffman coding exercise to compress it. Don't forget the spaces - use the underscore ( _ ) character to represent a space in the dictionary.
You should produce a binary Huffman code and a dictionary. Next, calculate the compression ratio of the string using the formula from Task 1.

Task 2.3 Huffman decoding
Where we learn how to decode a Huffman code

In your notebook / on paper

Use the following dictionaries to decode the corresponding Huffman codes. The underscore symbol represents a space character.

Dictionary Huffman Code 
E ; 1 / B ; 010 / R ; 011 / T ; 000 / _ ; 001 0000111100101011 
N ; 01 / E ; 11 / A ; 000 / B ; 001 / R ; 100 / G ; 1010 / _ ; 1011  101010011110110110011100001 
G ; 0 / E ; 100 / L ; 101 / U ; 110 / T ; 1110 / _ ; 11110 / R ; 11111  0110001011001111011101111111000101100 

In your notebook / on paper

Now calculate the compression ratio of the Huffman code using the equation in Task 1.

Task 2.4 Practical compression
Where we perform practical compression on a computer

Make sure you know how to compress and decompress files using Windows explorer

Before you start this activity, you must know how to compress files using windows explorer. If you don't ask your teacher to demonstrate it to you.

Download the resources

Download the '
ebook.txt' file and the 'repeating.txt' file from the links and save in your documents. Inspect the file size of each document by right clicking and choosing properties. They are exactly the same size in this uncompressed form.

Inspect the file contents

Open each file and look carefully at it's contents. Can you see what is different?

Compress each file

Now compress each file by right clicking on the file and choosing 'send to > compressed folder'.

Inspect the properties of the compressed files

Inspect the properties of the compressed folders you have created. What is different? Which one compressed more? Why?

In your notebook / on paper

Write up what you have found out in your notebooks. You might wish to use some screenshots of the files to help with your explanations.

Activity 3
Lossy compression

The human eye and the human ear are not sensitive enough instruments to capture all the detail we see or hear. Lossy compression techniques take advantage of this by removing unnecessary detail from an image or a sound before it is compressed using more traditional methods.

Task 3.1 Lossy compression explained (by you)
Where you explain how lossy compression works

The difference between lossless and lossy compression

Read and make some notes on the information in the box which explains the difference between lossless and lossy compression techniques.

Lossless compression

Lossless compression preserves the original data on decompression.

Lossy compression

Simple differences in data are removed before the data is compressed. Lossy compression destroys this extra detail in the data which can never be restored when the data is decompressed.

The effect of lossy compression

Read and make some notes for your folders on the information in the notebox.

The human senses are too weak to notice the difference

Even though we think that our senses are extremely finely tuned to all the nuances of the world, they still miss detail. Lossy compression techniques take advantage of this by removing detail from the original image which can never be recovered.

However, the saving in file space / network transfer speed is worth it!

On YouTube

Video streaming services like YouTube use lossy compression dynamically to reduce the quality of a video whilst it is streaming for poor connections. 

  • Whilst you watch the following YouTube video, experiment with the streaming quality settings. Start at the highest quality (1080p in this case) and work down to the lowest quality settings (144p in this case).
  • At what point do you start to notice the visual difference between the settings?
  • If you are careful, you may be able to assess the speed that the video 'buffers'. Look for the grey bar after you first load the video and see how long it takes to move from the left to the right of the video.

In your notebook / on paper

Write about what you have found out in your notebooks. Your notes should include ...
  • The difference between lossy and lossless compression
  • Reasons why lossy compression is suitable for video and image
  • Reasons why lossy compression is unsuitable for text
  • Effect on compression level on streaming video quality

Assessment Task (Homework)

Create three multiple choice questions about this topic. Each question should have 4 possible answers - one which is correct, two which are nearly correct and one which is obviously wrong.

Grading rubric

MASTER : You have created three challenging multiple choice questions and have presented them in a suitable way. You have provided an answer key so I know what the correct answers are.
APPRENTICE You have created three simple multiple choice questions. You have not provided an answer key, but it's quite easy to work out which is the correct answer.
NOVICE : You have created three simple multiple choice questions. You have not provided an answer key and I have no idea which is the correct answer.

# Flash cards.png
Click to load key word list to help you make your own flash cards 

Hungry for more?
  • Spend a couple of hours watching the Compressor Head series on YouTube.
  • Read about how to compress Vicky Pollard at CS4FN.