CS26 : Don't reinvent the wheel!


With so many problems and so many solutions, there's bound to be an algorithm around which will solve yours. The most common problems to solve involve searching and sorting, but also record storage and route finding.

We are learning ...
  • About the breadth of standard algorithms
  • About standard searching algorithms
  • About standard sorting algorithms
So that we can ...
  • Show appreciation of the fact that there are lots of standard algorithms already written and that each particular situation in terms of data can affect the most suitable choice to solve a problem
  • Describe, trace and implement standard searching algorithms :
    - Linear search
    - Binary search (iterative and recursive)
  • Describe, trace and implement standard sorting algorithms :
    - Bubble sort
    - Insertion sort
    - Merge sort
    - Quicksort
  • Describe uses of hashing
    - Design hashing tables
    - Apply simple hashing algorithms
    - Handle collisions

Activity 1 Standard algorithms 

Life is full of accepted ways of working, practices which are proven to work in most, if not all, situations.

We've got wheel sorted!

Programming is no different. There are a raft of 'standard algorithms' available if you choose to look in the right places and, from a programmers perspective, this is great because it means that we never have to 'reinvent the wheel'. We might, however, still need to assess the suitability of the 'wheel' before we use it and make some modifications, but certainly not 'reinvent' it.


Task 1.1
 Standard Algorithms

As this section is about searching, sorting and hashing algorithms let's compile a list of standard ones from RosettaCode and the good 'ole 'List of algorithms' Wikipedia article.

http://rosettacode.org/wiki/main_page  https://drive.google.com/file/d/0B83yXMOilskaanlRN2YxRkl3N1k/view?usp=drive_web
In your notebooks : From the two websites, write a list of 5 of each of the following standard algorithms. Only choose ones which you have already heard of (unless you haven't heard of any, then you'll have to choose ones you haven't heard of, obviously).
  • Searching
  • Sorting
  • Hashing
Did anyone see a fish?

Now choose one from each category and print out the pseudocode algorithm that you have found on RosettaCode or on Wikipedia. Stick these in your notebooks.

OUTCOME
: List of a small number of standard algorithms plus pseudocode.



Activity 2 Searching algorithms



Linear search

A linear search is the only option available for searching through an unordered list, since the computer can make no judgements about the position of any item.

https://scratch.mit.edu/projects/152352470/#fullscreen
Have a go at Lions linear search!

The linear search algorithm is trivially easy. Simply start at the top of the list and work your way through one item at a time until you either find the item or you don't. The worst case of a linear search is not finding the item at all.

There are two different ways you could implement a linear search - using repeat ... until or using while. The following algorithms are written in pseudocode, not Python!
  • Using repeat ... until :

    currentElement ← 0
    found ← false
    repeat
      if list[currentElement] = elementSought then
        found ← true
      else
        currentElement ← currentElement + 1
      end if
    until found = true or currentElement > sizeOf(list)

  • Using while :

    currentElement ← 0
    found ← false
    while found = false and currentElement < sizeOf(list)
      if list(currentElement) = elementSought then
        found ← true
      else
        currentElement ← currentElement + 1
      end if

That would be nice ...

Task 2.1
 Linear list programming challenge

I have created a structured Python template called 'linearSearchTemplate.py'. Download this and save it to a suitable place in your documents.

The linearSearch() function is not implemented which prevents the program being very useful to be honest. Your task (if you choose to accept it) is to ...
  • Program the linearSearch() function based on the second algorithm shown above;
  • Give evidence that your program works correctly;
  • Draw a structure chart of the solution in your notebooks;
  • Draw a flowchart to represent the algorithm using yEd or equivalent;
  • Make sure you have copied down the algorithms shown above including the repeat ... until version.
OUTCOME : Completed linear list implementation and structure chart.


Binary search


A binary search, or a 'chop search' as it's sometimes called, is a fast an efficient way of searching for an item in an ordered list of values. Yes, you heard it - an ordered list of values. You can't use a binary search on an unordered list. Strategically, the search involves successive halving of the list, discarding the remaining half if the search term cannot reside in it. Eventually, you will either find the item you are looking for or you will run out of items to look at.

https://scratch.mit.edu/projects/152358421/#fullscreen
Have a go at Ducky's binary search!

Scratch is all very well and good, but there are two ways of constructing a binary search in Python ... 
  • Iterative method

    As the name suggests, this method using iteration to search through the ordered list. Read through the program carefully and compare it to Ducky's binary search script before you move on to the task.

    def binarySearchI(searchTerm,items):
      bottom = 0
      top = len(items)-1
      while True:
        currentItem = (top+bottom)//2 # find middle item
        if bottom > top:
          return False
        elif items[currentItem] == searchTerm:
          return currentItem
        elif items[currentItem] < searchTerm:
          bottom = currentItem+1
        else:
          top = currentItem-1


  • Recursive method.

    The recursive method uses the feature of a binary search that the portion of the list which remains after the 'chop' is, itself, a list. It is a little shorter than the iterative solution but more resource heavy and probably slightly slower in execution but doesn't use a loop.

    def binarySearchR(searchTerm,items,top=None,bottom=0):
      if top
    == None: top = len(items)-1 # first call
     
    currentItem = (top+bottom)//2 # find middle item
      if bottom
    > top:
        return False
      elif items[currentItem] == searchTerm:
        return currentItem
      elif items[currentItem] < searchTerm:
        return binarySearchR(searchTerm,items,top,currentItem+1)
      else:
        return binarySearchR(searchTerm,items,currentItem-1,bottom)

But remember, the list has to be sorted first ...

Task 2.2
 Binary search programming challenge

Complete the following activities to demonstrate your understanding of binary search
  • In a script : Your challenge (it's not really that difficult) is to implement the two functions for the iterative and the recursive (A Level Only) binary search in Python. Print out your scripts for your notebooks.

  • At the prompt : Test your functions using the following code at the prompt. Screenshot the output and stick it in your notebooks.

    For the iterative binary search ...

    >>> for i in range(1,12):
          print(i,binarySearchI(i,[1,2,5,6,8,9,10]))

    ... and for the recursive binary search (A Level Only) ...

    >>> for i in range(1,12):
          print(i,binarySearchR(i,[1,2,5,6,8,9,10]))

  • In a flowchart : Create a flowchart to represent the iterative solution only using yEd or equivalent.
OUTCOME : Implemented binary search algorithms, tests and a flowchart


https://scratch.mit.edu/projects/152313149/#fullscreen
Binary search guessing game - play it!

Activity 3 Sorting algorithms

Sorting out sorting (30:56) - do you have the time?

Fruit!

A 'good' sorting algorithm will have the following properties ...
  • Doesn't swap items if they have the same value.
  • Doesn't use any extra storage whilst it is operating - sorts the list 'in place'.
  • Makes less comparisons than there are items in the list.
  • Makes no more swaps than there items in the list.
  • Speeds up when the data is nearly sorted or when there are few unique values.

Unfortunately, there is no single algorithm with all of these properties.
The choice remains dependant on the nature of the data to be sorted.

As you can image, there are a heap of different sorting algorithms available. We only have the time / space to study a few of them but that doesn't stop you from investigating more if you want to (and I would encourage you to).

Get your dancing
shoes on!

Bubble sort

The most common (but not the simplest) sorting algorithm is the Bubble sort, so called because the heavier items 'bubble' gradually to the top of the list during sorting. Before we get into programming this, let's watch two videos ...

Bubble-sort with Hungarian ("Csángó") folk dance (5:15)

Bubble Sort (5:44)

Ask your teacher for some plastic cups and have a go!

Task 3.1
 Bubble sort programming task and flowchart


Carry out the following tasks to demonstrate your understanding of the bubble sort algorithm.
  • Download and print out the 'Bubble sort algorithm.docx' information sheet. Read it carefully.

  • Download the Python template 'bubbleSortTemplate.py'.

    Use this template and the information sheet to complete a programmed solution for a bubble sort. Print out the script (using Notepad++) and evidence of it working for your notebooks.

  • Draw a flowchart using yEd to demonstrate your understanding of the Bubble sort algorithm.

  • Ask your teacher for another Python script called 'bubbleSortComparison.py' (not available to download) which implements both the inefficient and the efficient version of the bubble sort. Compare your solution to this script to make sure you have done it right. Run the script and consider the information provided to you.

    In your notebooks / folders : Write about what you have discovered in your notebook.
OUTCOME : Implemented bubble sort programming and flowchart.


Insertion sort

Depending on the nature of the list, the bubble sort is actually a very poor algorithm. It is, however, one of the most simple to implement which is why it is usually one of the first ones to be studied. The insertion sort is another type of sorting algorithm that works differently, but still very logically and, in some situations, performs better than a bubble sort algorithm would. Again, watch the videos before you attempt the task which follows ...

Insert-sort with Romanian folk dance (4:03)

Insertion sort (9:44)

Ask your teacher for some plastic cups and have a go!

Task 3.2 Insertion sort



Download the Python file 'insertionSort.py'. I have implemented an insertion sort as a function which, due to way that Python works, sorts the list 'in place'.
  • Use the function insertionSort(list) to sort the list [18,5,32,22,8,10,2,19] into numerical order and convince yourself that it works correctly.

  • Print out the script for your notes using Notepad++. Make sure that you print it out nice and big!

  • Trace the operation of the script and produce a trace table with the following headings ...


    ... ask your teacher for help if you do not know how to do this!
OUTCOME : Insertion sort program and trace table.


Merge sort

Both the merge sort and the quick sort which we will learn about in a moment are examples of the use of a divide and conquer strategy to improve the performance of a sorting algorithm.

Both sorting algorithms are implemented recursively (hence why they are only studied at A Level) due to the fact that they both divide the list and treat the portions of the list as separate lists in a future sort operation.

Focusing on the merge sort first, watch the two videos ...

Merge-sort with Transylvanian-saxon (German) folk dance (4:16)

Merge sort (7:44)

Ask your teacher for some plastic cups and have a go!

Task 3.3 Implementation - are you ready?

The 'theory' of the merge sort is one thing but implementing it is definitely another! I'm afraid, there's another video to watch for this one but, whilst you are watching it, I would like you to view my implementation alongside it.


  • Download 'mergesortVerbose.py' and open it up in Python IDLE.

  • Print out the script for your notebooks using Notepad++. Try to print it out in colour if you can so the comments are really clear for you to read.


  • Try using the mergeSort(items) function to sort the list [18,5,32,22,8,10,2,19] into numerical order to convince yourself that it works correctly. You will notice that the script has a very 'verbose' output ... look carefully at what it is doing. Can you compare this to the 'theoretical' algorithm that you viewed in the CS50 video before? Is it the same?

  • Now watch the following video. View it alongside the script that I have given you. I would like to hear some 'confirmatory' noises as you work ...

    Merge sort algorithm (18:09)

  • Rework the practical demonstration you (should) have done before the tasks using the plastic cups. Demonstrate this to another student in the class.

  • Now you can download 'mergesort.py' (the version without verbosity and comments) and print this out for your notebooks as well. Boom!
https://drive.google.com/file/d/0B83yXMOilskaMTkwZGR0OXpQUG8/view?usp=drive_web
Click to enlarge

OUTCOME : Printout of a merge sort implementation in your notebooks.


Quicksort

Now it's getting really interesting. The Quicksort is probably one of the most challenging of the common sorting algorithms to 'get your head around'. As I said earlier, Quicksort uses a strategy called 'divide and conquer' like the Mergesort algorithm and therefore is implemented using recursion.

Once again, to complete the set, here are a number of videos for you to watch.

Quick-sort with Hungarian (Küküllőmenti legényes) folk dance (6:54)

Quicksort (6:08)

Ask your teacher for some plastic cups and have a go!

Task 3.4 Quicksort

Once again, implementation is often different than it looks in theory. The concept of choosing a pivot and recursively sorting the left and right sublists is crucial in understanding how the algorithm works.


  • Download 'quicksortVerbose.py' and open it up in Python IDLE.

  • Print out the script for your notebooks using Notepad++ and make sure you print it in colour and make the font big enough so that you can read the comments in the script.

  • Use the quicksort(list) function to sort the list [18,5,32,22,8,10,2,19] into order. There will be a lot of output from this script again because I have made it quite verbose. You will need to take your time reading through the output but you still might not understand it. Don't worry ...

  • Watch the following video which is one of the clearest ones I have found for explaining how the quicksort algorithm operates. Keep pausing the video and checking through the code so you can follow how it is constructed ...

    Quicksort algorithm (20:38)

  • Rework the practical demonstration you (should) have done before the tasks using the plastic cups. Demonstrate this to another student in the class.

  • Now you can download 'quicksort.py' (the version without verbosity and comments) and print this out for your notebooks as well. Bang!
https://drive.google.com/file/d/0B83yXMOilskaY0RhdWRSWEpkOFk/view?usp=drive_web
Click to enlarge

OUTCOME : Quicksort algorithm implementation in your notebooks with and without verbosity.



Activity 4 Team Programming Challenge 

As a team, implement a structured program which contains the following options ...

A. Create list from scratch
B. Import list from csv file
C. Create random list
D. Linear search
E. Binary search
F. Bubble sort
G. Merge sort
H. Insertion sort
I. Quick sort
J. Save list
Z. Quit

NEED TO FINISH THIS


Activity 5 Hashing   A Level Only

Checksums

We met hashing algorithms in a previous section (How do I stop things going wrong) in the context of checksums. Basically, feeding a variable length data stream into a hashing algorithm yields a fixed length 'hash' which is unique to the data stream. Comparison of this calculated hash to a published hash can be used to confirm the integrity of the data stream.

Associative Arrays

Hashing functions can also be used to determine an index / position in a fixed length data structure in order to both store and identify the position of a record. Data structures that use this mapping are called associative arrays or, sometimes, dictionaries (you've met these before!) 


The principle is simple. By definition, all records have a unique key which is used to uniquely identify them. If we feed the key through a special hashing function, we can use it to generate an index into which we can use to store the record. The 'space' we store the record is normally called a bucket or a slot.

https://drive.google.com/file/d/0B83yXMOilskaZzA2NDVHU0RIM1E/view?usp=drive_web
Click to enlarge

Searching for the recorded is simply a matter of rehashing the key to find the index and looking directly there! Easy!

In the best case, searches in a hash table have time complexity O(1)

The choice of hashing function often depends on the nature of the key value present in the record to be stored. Purely numerical keys can simply be 'MOD'ded with the size of the data structure to give an index. 

index = H(numerical key)  = key MOD records

More complex keys / alphabetical keys can be passed through a function which generates a numerical index value.

index = H(alphabetic key) = (sum of ASCII codes in key) MOD records

Let's watch another CS50 video ...

Hash tables (7:41)


Handling collisions

As you saw from the video, it's quite possible (especially if the data structure is small) that collisions can occur where two keys hash to the same value and therefore the algorithm will aim to store them at the same index. There are two common ways of dealing with collisions - so called closed hashing and open hashing, although the video calls them something else ...
  • Linear probing > Closed hashing (Open addressing)
    If a collision occurs, store the record in the 'next available' slot. The next slot address is calculated using a rehash where 1 is added to the original slot address repeatedly until a suitable slot is found in the table. Generally, the rehash will be 'MOD'ded with the table size so that there is the facility for 'wrap around'. Closed hashing increases the chance of subsequent collisions ('clustering').

    https://drive.google.com/file/d/0B83yXMOilskaY0JfdEIzZnhMSFk/view?usp=drive_web
    Closed hashing - the next available slot is used

  • Separate chaining > Open hashing (Closed addressing)
    If a collision occurs, store the record in a linked list starting at the slot in the table.

    https://drive.google.com/file/d/0B83yXMOilskaUWNEdFROWWE1UnM/view?usp=drive_web
    Open hashing - records are stored in a linked list

Task 5.1 Hashing

Inserting values

There are two 'practical' exercises on hashing to have a go at, each uses a different hashing method and introduces you to both common ways of handling collisions.
  • Download the 'Hashing exercise - numerical keys' word document and save it in a suitable place in your documents folder. Print the sheet out to complete it rather than completing it on screen. There are full instructions on the sheet to help you.

  • Now, download the 'Hashing exercise - alphanumeric keys' word document and save it in a suitable place in your documents. Print the sheet out to complete it rather than completing it on screen. Again, there are full instructions on the sheet to help you. If you need help, you can download a copy of the 'ASCII Code Table' document to help you as well.

Searching for values

When you search for a record in the hash table, you first rehash the key and then look in the slot indicated. If the record is there you're good! If you don't immediately find the record you are looking for, you proceed in a different way depending on whether you used closed or open hashing ...
  • If you used closed hashing, keep 'rehashing' by adding 1 to the slot index until you either ...

    a) find the record you are looking for (hurrah!),
    b) find a blank space (boo!) or
    c) wrap around and get to one less than the original hash value (boo!).

  • If you used open hashing, you keep following the pointers in the linked list until you either ...
     
    a) find the record you are looking for (hurrah!) or 
    b) reach a NULL pointer (boo!).
However, rather than searching for specific values, I would like you to perform searches for every value on the original list for just one type of hashing, either numerical or alphanumeric keys - I don't mind which.

Search for every record and record the total number of comparisons which you have to perform in order to locate each one first in a closed hashing situation and then in an open hashing situation.


Deleting values

One problem that can occur with deletion is that during a subsequent search, if the value that has been deleted is present in the 'collision chain', the search will terminate early as it will find a blank slot. We can overcome this problem by marking a deleted record <DUMMY> to indicate that it should still be traversed during a search but can be reused on subsequent insertion.

In order to learn how the deletion operation works, I've created an exercise on the worksheet called 'Hash table operations.docx' which you should download, print and complete.

Native Python Implementation


In a script : Now, the challenge is to write a Python class which implements a hash table! You should be able to initialise a hash table object with a given length (cause it's fixed) and indicate whether to use open or closed hashing. The default method should be closed hashing. The object should implement three methods, insert, search and delete. I will allow you to cheat for the open hashing implementation and simply use a standard Python list instead of a linked list (unless you fancy a challenge).

Print out your script and evidence of it working. You could use the example on the previous worksheet to help you if you wish to be consistent.

OUTCOME : Three completed worksheets and, possibly, a native Python hash table.



Task 5.2 Python dictionaries are hash tables!

Optional activity - this is a great application of hash tables but might be too much for the faint hearted!

Python dictionaries are implemented using hash tables. Grab yourself a cup of tea, settle yourself on the sofa and watch the following obscure video from PyCon 2010 which is the only useful piece of information I can find online which nicely, and relevantly, explains how Python dictionaries work (they actually use linear probing / closed hashing which surprised me!)

PyCon 2010: The Mighty Dictionary (30:49)



In your notebooks, you could write some notes on how Python uses hash tables to implement an associative array or dictionary? If you are interested, here is a link to the resources that Brandon used in the talk.

OUTCOME : Notes about the implementation of Python dictionaries.


Extension Activities 

How about these?
  • Investigate some other sorting algorithms - use the videos in the resources folder and also the weblinks to do some research.  Under what conditions would a particular sorting algorithm be more useful?

  • Try developing some playing card based sorting games for use in class which help students to develop an understanding of the ways in which these algorithms run.

  • Please watch the rest of the AlgoRhythmics videos on their channel on YouTube!

  • Generate a trace table for a bubble sort - work out your own headings.

  • Create animations of all the searching and sorting algorithms that you have studied.

  • I've also given you another quicksort script called 'betterQuicksort.py'. Download this baby and look at how cool it is! The reason I've not used this in the lesson is because it uses a slightly different algorithm than the 'common' one of choosing the pivot at the extreme right or left. It chooses a central pivot a bit like a binary sort would. Still recursive however ...

What's next?

Before you hand your book in for checking, make sure you have completed all the work required and that your book is tidy and organised. Your book will be checked to make sure it is complete and you will be given a spicy grade for effort.

END OF TOPIC ASSESSMENT