CS11 : Saving more space

A file sizes increase, streaming movie and audio services become the norm, compression techniques are used to make sure we don't run out of space (or patience).

We are learning ...
  • About compression techniques for sound, images and text
So that we can ...
  • Describe compression techniques used for compressing sound, images and text
  • Describe the difference between lossless and lossy compression
  • Apply the rules of specific lossless compression techniques
    - Run length encoding
    - Dictionary encoding
  • Compare characteristics of compression techniques
    - Compression ratio
    - Compression / decompression time

In this media rich age, audio and video files can take up massive amounts of disk space. An uncompressed audio file can be 30 - 40 MB and an uncompressed video can be 3 - 4 GB in size. With the advent of streaming music and video services, effective compression has become a real necessity to prevent ...

... interesting that you waited! Compression takes a raw file and uses algorithms to remove repetition or detail from the file to reduce it's file size. There are two types of compression - lossless and lossy which we will look at in turn.

Activity 1 Lossless Compression (80)

Lossless compression is a technique which reduces the size of a file without removing detail from it during the process. The file can be decompressed without loss of detail or quality. There are two common lossless compression techniques - Run Length Encoding (RLE) or dictionary based compression techniques. You can read about them in the textbook if you wish.

Task 1.1
 Compression exercises
Confused (BMP24).bmp
Cow (BMP24).bmp
eBook (TXT).txt
Man about the house (WAV).wav
Repeating (TXT).txt

Each of the files listed above are uncompressed. Click to view the image to see the uncompressed file size. You should also view the contents of the file to familiarise yourself with it's contents (this is important) ...

Click to enlarge

Using the 'Send to > Compressed (zipped) folder' right click option in Windows Explorer, compress each of the files in turn ...

Compare the uncompressed file size to the compressed file size and calculate the compression ratio ...

compression ratio = size of uncompressed file / size of compressed file

Now answer the following questions in your notebook.
  1. Write up (using screenshots) what you have found out from the exercise.
  2. Which files had the highest compression ratio?
  3. Can you explain the results you got from this compression exercise? HINT : Look at the appearance / structure of the file - what do you notice about the files with the highest compression ratio?
OUTCOME : Answers to questions on compression exercises in your notebook.

Run length encoding

Run Length Encoding is the most simple compression technique to understand. Watch the video (can you spot the spelling mistake?) ... 

Run Length Encoding Visualization

Task 1.2
 Practical Run Length Encoding

Download and execute the RLE.py script and provide it with the following inputs. The first shows how to encode a simple string where each character can be assumed to be represented by it's ASCII code. The second one provides the script with a list of tuples (read only lists). In raw file size terms, the list only occupies as much space as it's contents - ignore the brackets, quotes and commas and assume that each character and each digit is represented by one byte each.
  • encode('Look at the suuuuuuuuper booook!') ... raw file size = 31 bytes
  • decode([('a',3),('b',4),('c',1),('d',10)]) ... raw file size = 8 bytes
Calculate the compression ratio in each case ...

compression ratio = uncompressed file size / compressed file size

Answer the following questions in your notebooks
  1. Write up (using screenshots) what you have found out from this exercise
  2. What happens when you encode the first string? Would you choose RLE compression for this type of file?
  3. Under what conditions would the string undergo greatest compression with this technique?
OUTCOME : Write up of Run Length Encoding exercises in your notebooks

Other compression techniques (Huffman, Lempel-Ziv-Welch compression, ZIP)

There are many lossless compression techniques currently in use ...

Task 1.3
Web browser
Pencil and paper / Notebook

Watch the videos on the following lossless compression algorithms ...
  • Huffman compression

    Text Compression with Huffman Coding

  • Lempel-Ziv-Welch compression (used in UNIX)

    Elegant Compression in Text (The LZ 77 Method) - Computerphile

  • ZIP file format

    Used in windows - just read through the Wikipedia Design section and understand that ZIP is a composite compression format ...
Be aware that there are lots of different compression algorithms and that each is more or less suitable than other in certain situations.

Create a simple diagram / info-graphics for each compression technique to explain how it works. You could hand draw this if you like (and I would suggest that you did!) 

OUTCOME : Diagrams / explanation of specific text compression methods.

Images can also be compressed in a lossless fashion - PNG, GIF and TIFF file formats are lossless and result in no loss of detail upon decompression.

Activity 2 Lossy compression (80)

As the name suggests, lossy compression makes a file smaller by irreversible removal of detail from the original file. Lossy compression techniques rely on limits of human perception. The main issue with lossy compression is that when the detail is removed during compression, it cannot be reconstructed during decompression - it is lost.

Click to enlarge

One of the most common compressed image file formats is JPG which uses frequency analysis to remove detail from the image which the human eye will not notice. The image is split into 8x8 pixel squares. Areas of high contrast changes are left relatively uncompressed whereas areas of low contrast change (similar adjacent colours) are 'flattened'. If you wish, you can read this article which explains how JPEG compression works (but it is a bit complicated, especially the maths. You might want to skip over that bit!). Eventually, after the mathematical magic, Run Length Encoding is used to compress what is left of the image.

Task 2.1
 Investigating image compression
Cow (BMP24).bmp
Confused (BMP24).bmp
Adobe Fireworks

Using Adobe fireworks (or equivalent), experiment exporting (File > Image Preview) the two uncompressed image files as JPG images. Alter the 'quality' of the file first to 55 and then to 1 - inspect the file size, zoom in on the file and inspect the quality. What do you see?

Calculate the compression ratio for each image at the same quality ...

compression ratio = size of uncompressed file / size of compressed file

Answer the following questions in your notebooks.
  1. Write up (using screenshots) what you have learnt from the activity
  2. What do you notice happening? Can you explain what you have found out? Does it seem unusual? This article might help you.
Extension : Perform PNG (lossless) compression on the images as well. What results to you get from this? Does it seem more 'expected'?

OUTCOME : Explanation of image compression techniques (JPG / PNG)

Sound files are compressed in a similar way to images, mainly relying on the lack of human perception of certain frequencies of sound and certain changes in frequency.

Task 2.2 
Investigating MP3 compression
Web browser
Man about the house (WAV).wav

First, read and make some notes on this article. Now, using Audacity, experiment with the MP3 export options - consider the effect on file size (i.e. compression) and quality.  As always with these things, experiment with the extremes first to see the effect that the settings have.

OUTCOME : Explanation of audio compression.

Extension Activities 

How about these ...
  • Investigating a bitmap

    The following 24 bit bitmap image is stored on a computer. 

    - Calculate the raw bitmap filesize of the image (you will need to work out how many colours and therefore how many bits you need per pixel first).
    - Devise a suitable colour palette for the image.
    - Apply a lossless Run Length Encoding (RLE) compression technique on the image to reduce it's filesize whilst maintaining quality. Also calculate the reduction in file size due to the compression.
    - This compression technique is useful for simple icon style images. Why is it not suitable / effective for photographs? What compression technique might be more appropriate?

  • Another YouTube Video

    Watch this video to sum up the process of compression and decompression ...
Compression - Computerphile
  • MP3 Compression - the full story

    Finally, there is another (much more complex) article on MP3 compression you might want to read ...

What's next?

Before you hand your book in for checking, make sure you have completed all the work required and that your book is tidy and organised. Your book will be checked to make sure it is complete and you will be given a spicy grade for effort.