CS25 : How do I stop things going wrong?


Defensive programming techniques expect users to push buttons they are not meant to push and type words that shouldn't be typed. Prevention is often better than cure but sometimes, you just can't think of everything!

We are learning ...
  • How to use validation and verification techniques to prevent errors in our code
So that we can ...
  • Describe the types of errors that can occur in computer programs
  • Understand what verification involves
  • Describe validation techniques using practical examples
  • Use exception handling
  • Describe how parity detects errors
  • Apply majority voting techniques
  • Derive check digits and understand their applications
  • Describe the applications of checksums


Activity 1 Errors in computer programs 

There are three types of errors which can occur in computer programs ...

https://drive.google.com/file/d/0B83yXMOilskaLXNTY3E2azFpX2M/view?usp=drive_web

In general, syntax errors are avoidable if you type your code correctly. There really is no excuse. Prevention of runtime errors usually requires the use of validation or verification techniques, which we'll look at shortly. Semantic (or logic) errors are the hardest to spot and can often only be picked up by tracing the operation of the code manually. Let's look at each one in turn.

Syntax errors

There is really only one way to prevent syntax errors in your code ...


Runtime errors

Generally, we validate or verify the user input to prevent runtime errors using various techniques ...
  • Presence check
  • Type check
  • Float check
  • Length check
  • Range check
  • Lookup check
  • Format check
  • Double entry check (verification, which is different)
... or use exception handling to pick up the rest of the things we haven't thought of.
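
To give you a feel for what these checks look like in Python, here is a minimal sketch which chains a presence check, a type check and a range check together. The function name, the messages and the extra read parameter are all made up for illustration (read just lets you swap input() for something else when testing), so don't expect it to match the scripts in the lesson resources exactly.

```python
def get_value_range(prompt, minimum, maximum, read=input):
    """Keep asking until the user types a whole number from minimum to maximum."""
    while True:
        text = read(prompt).strip()
        # Presence check: reject empty input.
        if text == "":
            print("Please enter a value.")
            continue
        # Type check: decimal digits only (so no minus sign or decimal point).
        if not text.isdecimal():
            print("Please enter a whole number.")
            continue
        value = int(text)
        # Range check: the number itself must lie between the two limits.
        if minimum <= value <= maximum:
            break
        print(f"Please enter a number between {minimum} and {maximum}.")
    return value
```

You would call it with something like age = get_value_range("Age: ", 11, 18) and it will not return until the input passes all three checks.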



Task 1.1
 Validate

There is a compressed folder called validation.zip in the lesson resources. Download and extract this to a suitable place in your user area. There are 10 scripts in the folder. Each one is written as a function which takes one or more parameters depending on which type of check it is. They all use a while True: construct, issuing a break if the value is valid. This is standard practice - get used to it!
  • Presence check : getValuePresence.py
    This function takes a prompt as a parameter and returns the value once it is not empty.


  • Type check : getValueType.py
    This takes a prompt and a datatype (either "alpha" or "digit") and returns a validated value.


  • Float check : getValueFloatRegex.py / getValueFloatTryExcept.py
    There are two ways of doing this, neither of them particularly easy. They both take a single prompt parameter. The first one uses a pattern matching regular expression to check the format of the number, whereas the second one tries to float() the input and then checks whether it is a whole number before returning the validated value. For either of these two methods to work, you have to import particular libraries at the start of the script - don't forget!




  • Length check : getValueMaxLength.py / getValueMinLength.py / getValueRangeLength.py
    There are three separate validation scripts for this, checking for maximum length, minimum length and a range of lengths respectively. You choose. They all take a prompt and either a minimum length, a maximum length or both.



  • Range check : getValueRange.py
    This range check checks the actual range of the numbers entered rather than the length of the input. The function takes three parameters - the prompt, minimum value and maximum value.


  • Lookup check : getValueList.py
    This performs a so-called enumerated lookup. An enumerated value is one which comes from a list of possible values. This function takes two parameters - the prompt and a list. If you were clever, you could include a list comprehension as the list parameter which would be even cooler!


  • Format check : getValueFormat.py
    Probably the most difficult but most powerful validation routine uses regular expressions to check the actual structure of the input value to ensure that it is correctly formed. Takes three parameters - a prompt, the actual regular expression and a visual format.
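
Here is a rough sketch of how a format check and a lookup check could be written. These are not the scripts from validation.zip - the function names, messages and the read parameter (which just stands in for input() so the functions can be tested) are my own assumptions. The only library call used is re.fullmatch, which succeeds only when the whole input matches the pattern.

```python
import re

def get_value_format(prompt, pattern, visual, read=input):
    """Format check: repeat until the whole input matches the regular expression."""
    while True:
        value = read(prompt)
        if re.fullmatch(pattern, value):
            break
        print(f"That doesn't look right. Use the format {visual}")
    return value

def get_value_list(prompt, allowed, read=input):
    """Lookup check: repeat until the input is one of the allowed values."""
    while True:
        value = read(prompt)
        if value in allowed:
            break
        print("Please choose from: " + ", ".join(allowed))
    return value
```

For example, get_value_format("Date of birth: ", r"\d{2}/\d{2}/\d{4}", "DD/MM/YYYY") only accepts two digits, a slash, two digits, a slash and four digits. Notice that a format check only looks at the shape of the input, so 99/99/9999 would still get through.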


In a script(s) : As you might have guessed, just looking at these routines isn't enough to make them useful. Firstly, you should Rubber Duck the scripts and then try to include them in suitably structured programs. You have to come up with the contexts yourself - yes, yourself.

Print out your scripts when you have completed them using Notepad++ so that they look pretty!

OUTCOME : 10 scripts containing one validation routine per script.


The reason why I have not included the verification check in Task 1.1 is because it's a verification check and that's different to a validation check. That's why.


Verification merely checks that you didn't make any mistakes when you typed the value at the prompt. It does not ensure that the value is of the correct format, type, range, value etc. Verification is often used when entering passwords because you can't see the password when you are typing it and you might get it wrong if you only had to type it once.

Don't you miss Windows XP?

Task 1.2
 Verify yourself

Download the zip file called verification.zip from the resources for the lesson and extract. It contains only one script ...
  • Verification check : getValueDouble.py
    Takes just one parameter (prompt) and only returns a value if it is entered the same twice.
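
A double entry check can be sketched in just a few lines. This is an illustration rather than the contents of getValueDouble.py - the function name, the second prompt and the read parameter (a stand-in for input() to make testing possible) are assumptions.

```python
def get_value_double(prompt, read=input):
    """Verification: only return a value once it has been typed the same twice."""
    while True:
        first = read(prompt)
        second = read("Please enter it again: ")
        if first == second:
            break
        print("The two entries did not match. Try again.")
    return first
```

Notice that the function never looks at what the value actually is - it only checks that both entries agree. For passwords, you could pass read=getpass.getpass (from the standard getpass module) so that the typing is hidden, although this may not hide anything when run inside IDLE.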

In a script / in your notebook : You guessed it - implement this function in a suitable script as well. Print your script out using Notepad++ and stick it in your notebooks so you have always got a copy of it.

OUTCOME : A script containing a verification check.


Exception handling

Exception handling in Python is used to stop runtime errors from crashing a program by anticipating the errors that the user may cause and building in error handling routines. The errors which are thrown up by the program during runtime are called exceptions, so this method is called exception handling.


In Python, these helpful red error messages are called traceback errors. Common ones include ...
  • FileNotFoundError
  • KeyboardInterrupt
  • NameError
  • SyntaxError
  • TypeError
  • ValueError
  • ZeroDivisionError

Task 1.3
 Throwing traceback errors


At the prompt / In your notebooks : Find a way of 'throwing' these errors using the Python prompt. Present screenshots and explanations of each traceback error in your notebooks.

You might find the Python concrete exceptions documentation helpful.

OUTCOME : Practical examples and explanations of how these errors can occur.


Handling exception errors

Now we know what these exceptions are, let's develop a way of handling them. It's worth noting at this stage that ...

Ah, yes - that makes sense!

In practice, we use a programming structure called try ... except where we literally try to carry out a processing task and handle the exception if it occurs. In general, for Python ...

In general, the 'finally' clause is only really necessary in certain circumstances.
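
The general shape of the structure can be sketched like this. The function name and the messages are invented for illustration - what matters is the order of the try, except, else and finally clauses.

```python
def safe_divide(top_text, bottom_text):
    """Divide two values typed in as text, handling the likely exceptions."""
    try:
        # The risky part: float() can raise ValueError, / can raise ZeroDivisionError.
        result = float(top_text) / float(bottom_text)
    except ValueError:
        print("Both values must be numbers.")
        return None
    except ZeroDivisionError:
        print("You cannot divide by zero.")
        return None
    else:
        # Only runs if no exception was raised in the try block.
        return result
    finally:
        # Always runs, whether or not there was an exception.
        print("Calculation attempted.")
```

Calling safe_divide("10", "4") returns 2.5, whereas safe_divide("ten", "4") and safe_divide("1", "0") both print a message and return None instead of stopping the program with a traceback.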

  • Download the zip file called exception.zip from the lesson resources and extract it to a suitable place in your user area.

  • Right click on the script called singleExceptionError.py and choose 'Edit with IDLE'. Inspect the script and then run it. Try to break the script by typing anything other than a number. Can you break it?

  • Print out the code using Notepad++ to syntax highlight it and stick it in your notebooks. Annotate the script to describe what it does.

  • Right click on the script called multipleExceptionError.py and choose 'Edit with IDLE'. Again, inspect the script and then run it. This script traps both value errors and zero division errors. Again, try to break it - can you?

  • Print out the code using Notepad++ to syntax highlight it and stick it in your notebooks. Annotate the script to describe what it does.

Task 1.4
 What about your own example?

Create a program to calculate the pace per mile for runners at a running club. The program should ask the user to input a distance, then to input the time taken, and will calculate the pace. The program should use exception handling to trap errors in the input, for example, entering a real number or string instead of an integer. The program should loop until acceptable input is provided.

OUTCOME : Script using exception handling


Semantic Errors

With semantic errors, the code seems to run OK but the answer is wrong. Sometimes, the only way to correct a semantically incorrect script is to use a trace table or dry run. We learnt about trace tables and dry runs in a previous topic.
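
Here is a small made-up example of a semantic error (it is not one of the lesson scripts). Both functions run without any error messages, but only one of them calculates the mean of two numbers.

```python
def average_wrong(a, b):
    # Semantic error: division happens before addition, so this is a + (b / 2).
    return a + b / 2

def average_right(a, b):
    # The brackets force the addition to happen first.
    return (a + b) / 2

print(average_wrong(4, 6))   # 7.0 - runs fine, but the answer is wrong
print(average_right(4, 6))   # 5.0
```

Python cannot warn you about this because the code is perfectly legal. You only spot it by working through the calculation by hand with known values and comparing your answer with the output.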

Task 1.5
 Temperature conversion

There is a script in the resources for the lesson called fahrenheitToCelcius.py which is supposed to convert a Fahrenheit temperature to a Celsius temperature. Unfortunately, the person that wrote this script is a muppet.


In your notebooks : Print out a copy of the original script and your corrected script using Notepad++ and stick them in your notebooks. Highlight the changes you have made with a highlighter pen.

OUTCOME : Corrected script to convert Fahrenheit to Celsius, correctly!



Activity 2 Error detection schemes 

OK, so we've looked at the prevention of errors during data entry. What about errors in data transmission and retrieval? As we shall see in a topic soon, data transmission down wires is subject to interference. It doesn't matter how well we validate the entry of the data, if it's subject to interference during transmission from one place to another, it may have errors in it when it arrives.


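As the name suggests, error detection schemes only detect errors, they don't correct them. There are some error correction schemes, such as Hamming codes, which we don't study at this level (unfortunately). Gray codes are sometimes mentioned alongside them, but a Gray code is a way of ordering binary values so that neighbouring values differ by only one bit, not an error correction scheme.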

Majority Voting

Sometimes, this error detection scheme is called a repetition code. It is the simplest form of error detection scheme and the most inefficient. This error detection mechanism isn't really that difficult to understand ...
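
A minimal sketch of the idea, assuming that every bit is sent three times and that the bits are held in a string of '0' and '1' characters (both assumptions, as are the function names):

```python
def encode_repetition(bits, copies=3):
    """Repeat every bit so that '101' becomes '111000111'."""
    return "".join(bit * copies for bit in bits)

def decode_majority(received, copies=3):
    """Split the stream into groups and keep the most common bit in each group."""
    decoded = []
    for start in range(0, len(received), copies):
        group = received[start:start + copies]
        decoded.append("1" if group.count("1") > copies // 2 else "0")
    return "".join(decoded)

sent = encode_repetition("101")          # '111000111'
print(decode_majority("110000111"))      # one bit flipped, still decodes to '101'
```

If two of the three copies in a group are flipped, the vote goes the wrong way and the error is missed. Sending three bits for every bit of data is also why this scheme is so inefficient.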



Task 2.1
 Can you help?

Can you explain how the majority voting error checking mechanism works using the example shown above? You might also want to explain what the word 'interference' means as well!

OUTCOME : Explanation of the majority voting system.


Parity

Parity, derived from the Latin paritas, literally means 'equal'. In mathematics, parity describes the property of a whole number being either ODD or EVEN.

Do you want to see a card trick?

The concept of Parity can be applied to error detection. The Most Significant Bit (MSB) in a bitstream becomes a parity bit and is toggled to enforce either EVEN or ODD parity in that bitstream. 7-bit ASCII codes are often transmitted using 8 bits, with a single parity bit added to help with error detection.
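
As a worked example, here is a sketch which handles EVEN parity only for a 7-bit ASCII code, with the parity bit placed at the front as the MSB. The function names are invented and the bits are held as a string of '0' and '1' characters for simplicity.

```python
def add_even_parity(seven_bits):
    """Put a parity bit on the front so that the total number of 1s is even."""
    parity = "1" if seven_bits.count("1") % 2 == 1 else "0"
    return parity + seven_bits

def passes_even_parity(eight_bits):
    """The receiver's check: an even number of 1s means no error was detected."""
    return eight_bits.count("1") % 2 == 0

letter_a = format(ord("A"), "07b")       # '1000001' (two 1s)
sent = add_even_parity(letter_a)         # '01000001'
print(passes_even_parity(sent))          # True
print(passes_even_parity("01000011"))    # False - one bit flipped, error detected
print(passes_even_parity("01000111"))    # True - two bits flipped, error missed
```

The last line shows the weakness of parity: an even number of flipped bits leaves the parity unchanged, so the receiver cannot tell that anything has gone wrong.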

Oh, yes he is!

Task 2.2
 Your own explanation

Try these two exercises to consolidate your knowledge of parity.
  1. In your notebooks : Firstly, using suitable examples, explain how the parity error detection system works to detect single bit errors in data transmission.

  2. In a script : Next, using the Parity Bit Wikipedia article to help you, write a Python script which takes a bit stream of arbitrary length, asks whether you want to use ODD or EVEN parity and then calculates the parity bit. Print out your script for your notebooks using Notepad++ to make it look pretty. 


OUTCOME : Demonstration of a deeper understanding of the parity system.


Check Digits

A check digit is a single digit or character added to the end of a data stream which is calculated from the other values in the data stream using one of a number of algorithms. The device receiving the data recalculates the check digit and compares it with the check digit it has received. If it's the same, great, otherwise the data stream is rejected.
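
As a sketch of one such algorithm, the EAN-13 / ISBN-13 check digit is found by multiplying the first twelve digits alternately by 1 and 3, adding up the results and working out what must be added to reach the next multiple of 10. The function names below are made up for illustration, and 978-0-306-40615-7 is a commonly quoted example ISBN.

```python
def ean13_check_digit(first_twelve):
    """Calculate the check digit from the first 12 digits (given as a string)."""
    total = 0
    for position, char in enumerate(first_twelve):
        weight = 1 if position % 2 == 0 else 3   # weights go 1, 3, 1, 3, ...
        total += int(char) * weight
    return (10 - total % 10) % 10

def is_valid_ean13(code):
    """The receiver recalculates the check digit and compares it with the last digit."""
    return ean13_check_digit(code[:12]) == int(code[12])

print(ean13_check_digit("978030640615"))   # 7
print(is_valid_ean13("9780306406157"))     # True
print(is_valid_ean13("9780306406167"))     # False - a single digit error is detected
```

For this ISBN the weighted total is 93, which leaves a remainder of 3 when divided by 10, so the check digit is 10 - 3 = 7.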

Check digits are designed to detect simple errors like ...
  • single digit errors, such as 1 → 2
  • transposition errors, such as 12 → 21
  • twin errors, such as 11 → 22
  • jump transposition errors, such as 132 → 231
  • jump twin errors, such as 131 → 232
  • phonetic errors, such as 60 → 16 ("sixty" to "sixteen")
Check digits are used in machine readable artefacts like barcode representations of European Article Numbers (EAN-13), International Standard Book Numbers (ISBN-13) and Universal Product Codes (UPC-A). Take a close look at the following barcodes ...

https://drive.google.com/file/d/0B83yXMOilskadHRPVEZ2UnRLSzA/view?usp=drive_web

Actually, the barcodes are for genuine products. Visit the UPC Item Database to find out what they are and tell your teacher when you find out!

Task 2.3
 Check Digits

STAGE ONE
  • Choose a book from the shelf in your classroom. Look on the back for the ISBN number. If it's a fairly modern book, it should have an ISBN-13 code. If it doesn't, choose another book.

  • Now download and print a copy of Magic ISBN 13 Checker.pdf from the lesson resources and use it to calculate the check digit for the book.

STAGE TWO
  • Visit the examples section of the Wikipedia page for Check Digits. You will see algorithms for the calculation of a check digit for UPC-A, ISBN-10, ISBN-13 and EAN-13.

  • Print out a copy of the barcodes from the image above and use the algorithms to calculate the value for the check digits. Remember - when you are calculating the check digits, don't include the check digit in the calculation (obviously).

  • In your notebook : Document what you have done and the calculations you have performed.


OUTCOME : Examples of check digit calculations



Activity 3 Checksums A Level Only

The parity bits and check digits we have met are special cases of checksums but are only suitable for application to small amounts of data. If we want to determine the integrity of larger quantities of data such as downloaded software files etc, we have to use more complex algorithms.

The algorithms used to generate checksums are called hashing algorithms. The most common hashing algorithms in use are MD5, SHA1 and SHA2 (SHA256 and SHA512). A hashing algorithm takes data of variable length and produces a fixed length hash value from it. When hash values are used to test the integrity of data, they are called checksums.

Versions of the latest release of Python together with their MD5 Checksums

The checksum is used to determine the integrity of a file downloaded from a server. The algorithms are so clever that even one bit of difference between two files will result in a completely different checksum. The chances of two different files having the same checksum by accident (a collision) are virtually zero and therefore, if you compare the checksum listed on the website / distributed with the file with one that you calculate yourself, you can be pretty sure that nothing has changed!
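
You can see this behaviour for yourself using Python's built in hashlib library. This is only a sketch - the function name checksums is made up, but hashlib.md5, hashlib.sha1 and hashlib.sha256 are all real functions which take bytes.

```python
import hashlib

def checksums(data):
    """Return the MD5, SHA1 and SHA256 hex digests for some bytes."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }

empty = checksums(b"")
print(len(empty["md5"]), len(empty["sha1"]), len(empty["sha256"]))   # 32 40 64

# Two inputs which differ by one character give completely different digests.
print(checksums(b"hello")["md5"])
print(checksums(b"hellp")["md5"])
```

To check a downloaded file, you would read it in binary mode (open the file with "rb"), pass the bytes to the same hash function and compare the hex digest with the one published on the website.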

https://drive.google.com/file/d/0B83yXMOilskadEhDaEdoZWNlanM/view?usp=drive_web

Hashing Algorithms and Security - Computerphile (8:11)

Task 3.1
 Checksum Calculator

Most, if not all, programming languages have built in checksum / hash algorithms. The server-side scripting language PHP is no exception and the following resource uses a combination of HTML, CSS, JavaScript, AJAX and PHP to demonstrate some of its functions.

http://www.molecularmagic.net/checksum/
Click to visit the site

STAGE ONE

Use the 'String checksum' section to investigate the behaviour of these hash algorithms when fed a simple text string. Notice that no matter how long the string is (even zero characters long), the checksums are always the same length.

In your notebooks : Record the output of the script using two strings which only differ by one character.

STAGE TWO

Create a simple text file in your user area containing only your first name. Upload your text file using the 'File checksum' section of the page, but be aware that the script will only allow the upload of text files smaller than 1kB in size (to protect my server!). Make a note of the checksum values generated. Now change the text file by altering just one letter, re-save and upload it again. Make a note of the checksum values now.

In your notebooks : Using this example, explain how checksums are used to verify file integrity.

OUTCOME : Understanding of the behaviour of hashing algorithms and the use for checksums.

Passwords

One really common place where checksums are used is in the validation of passwords. As a general rule, if you run a website, you should never store passwords in plain text ...

She's not happy!

  • Download the zip file from the lesson resources called login.zip, extract the folder to a suitable place in your user area. Open the Python file by right clicking on it and choose 'Edit with IDLE' and open the CSV file using a text editor like Notepad++ (try not to use Excel or another spreadsheet application).

  • Print out the Python script using Notepad++ and stick this in your notebooks. The script uses the hashlib library which provides an MD5 hashing algorithm.

  • Look carefully at the CSV file. You'll notice that it has two values on each row; a username and an MD5 hash. There are no plain text passwords stored in this file.

  • Try running the script and logging in as mmills with password 'password' - easy! Now try logging in as any of the other users - I bet you can't guess the passwords, even though you can see the hash values in the CSV file!

  • At the prompt : You can add your own password to this file using the following method ...

    >>> import hashlib
    >>> password = hashlib.md5()
    >>> password.update(b'secret')
    >>> password.hexdigest()

    ... obviously replacing secret with your preferred password. Add the hexdigest to the CSV file along with your username and try logging in.

  • Of course, there is nothing stopping you changing the hash value in the CSV file for any user to one that you know hashes a known password and breaking in that way ...
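
The idea behind a login check like this can be sketched as follows. This is not the script from login.zip - the function names are invented and the dictionary stands in for the CSV file. MD5 is used only because the lesson script uses it.

```python
import hashlib

def hash_password(password):
    """Return the MD5 hex digest of a password string."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()

def check_login(username, password, stored_hashes):
    """Hash what was typed and compare it with the stored hash for that user."""
    expected = stored_hashes.get(username)
    return expected is not None and hash_password(password) == expected

# Hypothetical stand-in for the CSV file: usernames mapped to hashes, no plain text.
users = {"mmills": "5f4dcc3b5aa765d61d8327deb882cf99"}

print(check_login("mmills", "password", users))   # True
print(check_login("mmills", "passw0rd", users))   # False
```

The program never needs to know the original password. It only needs to know that the hash of what you typed matches the hash it has stored.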


Task 3.2
 Plain text passwords

In your notebooks : Using the knowledge you have gained from your exploration of the simple login script, and any experience you have had of ...


... describe the reasons for storing passwords in hashed form and why you should be wary of any website that sends you your username and password in plain text.

OUTCOME : A paragraph in your notebooks about password security.


Extension Activities 

How about these?
  • What about authentication?

  • There are more complex error detection and correction methods such as Hamming codes. Research how these work, and find out what Gray codes are actually used for.

  • Download and complete the worksheet Validation Check Matchup.docx from the lesson resources.

  • I've included the PHP scripts for my Checksum / hashing demonstration in the lesson resources if you fancy implementing them yourself. Download checksum.zip, extract to your webserver root and away you go!

What's next?

Before you hand your book in for checking, make sure you have completed all the work required and that your book is tidy and organised. Your book will be checked to make sure it is complete and you will be given a spicy grade for effort.


END OF TOPIC ASSESSMENT

By the way, the passwords you couldn't guess are 'sausage', 'trees', 'greatness' and 'toytrains'