With so many problems and so many solutions, there's bound to be an algorithm around which will solve yours. The most common problems to solve involve searching and sorting, but also record storage and route finding. We are learning ...- About the breadth of standard algorithms
- About standard searching algorithms
- About standard sorting algorithms
So that we can ...- Show appreciation of the fact that there are lots of standard algorithms already written and that each particular situation in terms of data can affect the most suitable choice to solve a problem
- Describe, trace and implement standard searching algorithms :
- Linear search - Binary search (iterative and recursive)
- Describe, trace and implement standard sorting algorithms :
- Bubble sort - Insertion sort - Merge sort - Quicksort - Describe uses of hashing
- Design hashing tables - Apply simple hashing algorithms - Handle collisions
Life is full of accepted ways of working, practices which are proven to work in most, if not all, situations.
We've got wheel sorted!
Programming is no different. There are a raft of 'standard algorithms' available if you choose to look in the right places and, from a programmers perspective, this is great because it means that we never have to 'reinvent the wheel'. We might, however, still need to assess the suitability of the 'wheel' before we use it and make some modifications, but certainly not 'reinvent' it.
Linear searchA linear search is the only option available for searching through an
unordered list, since the computer can make no judgements about the position of any item.The linear search algorithm is trivially easy. Simply start at the top of the list and work your way through one item at a time until you either find the item or you don't. The worst case of a linear search is not finding the item at all.
There are two different ways you could implement a linear search - using
`repeat ... until` or using `while` . The following algorithms are written in pseudocode, not Python!- Using
`repeat ... until` :
`currentElement ← 0` `found ← false` `repeat` `if list[currentElement] = elementSought` `then` `found ← true` `else` `currentElement ← currentElement + 1` `end if` `until found = true or currentElement > sizeOf(list)` - Using
`while` :`currentElement ← 0` `found ← false` `while found = false and currentElement < sizeOf(list)` `if list(currentElement) = elementSought` `then` `found ← true` `else` `currentElement ← currentElement + 1` `end if`
That would be nice ...
Binary searchA binary search, or a 'chop search' as it's sometimes called, is a fast an efficient way of searching for an item in an ordered list of values. Yes, you heard it - an
ordered list of values. You can't use a binary search on an unordered list. Strategically, the search involves successive halving of the list, discarding the remaining half if the search term cannot reside in it. Eventually, you will either find the item you are looking for or you will run out of items to look at.Scratch is all very well and good, but there are two ways of constructing a binary search in Python ...
__Iterative method__
As the name suggests, this method using*iteration*to search through the ordered list. Read through the program carefully and compare it to Ducky's binary search script before you move on to the task.
`def` `binarySearchI` `(` `searchTerm` `,` `items` `):`
`bottom` `=` `0`
`top` `=` `len` `(` `items` `)-` `1`
`while` `True` `:`
`currentItem` `= (` `top` `+` `bottom` `)//` `2` `# find middle item`
`if` `bottom` `>` `top` `:`
`return` `False`
`elif` `items` `[` `currentItem` `] ==` `searchTerm` `:`
`return` `currentItem`
`elif` `items` `[` `currentItem` `] <` `searchTerm` `:`
`bottom` `=` `currentItem` `+` `1`
`else` `:`
`top` `=` `currentItem` `-` `1`
__Recursive____method__.
The recursive method uses the feature of a binary search that the portion of the list which remains after the 'chop' is, itself, a list. It is a little shorter than the iterative solution but more resource heavy and probably slightly slower in execution but*doesn't*use a loop.
`def` `binarySearchR` `(` `searchTerm` `,` `items` `,` `top` `=` `None` `,` `bottom` `=` `0` `):`
`if` `top` `==` `None` `:` `top` `=` `len` `(` `items` `)-` `1` `# first call`
`currentItem` `= (` `top` `+` `bottom` `)//` `2` `# find middle item`
`if` `bottom` `>` `top` `:`
`return` `False`
`elif` `items` `[` `currentItem` `] ==` `searchTerm` `:`
`return` `currentItem`
`elif` `items` `[` `currentItem` `] <` `searchTerm` `:`
`return` `binarySearchR` `(` `searchTerm` `,` `items` `,` `top` `,` `currentItem` `+` `1` `)`
`else` `:`
`return` `binarySearchR` `(` `searchTerm` `,` `items` `,` `currentItem` `-` `1` `,` `bottom` `)`
But remember, the list has to be sorted first ...
Sorting out sorting (30:56) - do you have the time? Fruit!
A 'good' sorting algorithm will have the following properties ... - Doesn't swap items if they have the same value.
- Doesn't use any extra storage whilst it is operating - sorts the list 'in place'.
- Makes less comparisons than there are items in the list.
- Makes no more swaps than there items in the list.
- Speeds up when the data is nearly sorted or when there are few unique values.
Unfortunately, there is no single algorithm with all of these properties.The choice remains dependant on the nature of the data to be sorted.As you can image, there are a heap of different sorting algorithms available. We only have the time / space to study a few of them but that doesn't stop you from investigating more if you want to (and I would encourage you to). Get your dancing shoes on! Bubble sortThe most common (but not the simplest) sorting algorithm is the Bubble sort, so called because the heavier items 'bubble' gradually to the top of the list during sorting. Before we get into programming this, let's watch two videos ...Bubble-sort with Hungarian ("Csángó") folk dance (5:15) Bubble Sort (5:44) Ask your teacher for some plastic cups and have a go!
Insertion sortDepending on the nature of the list, the bubble sort is actually a very poor algorithm. It is, however, one of the most simple to implement which is why it is usually one of the first ones to be studied. The insertion sort is another type of sorting algorithm that works differently, but still very logically and, in some situations, performs better than a bubble sort algorithm would. Again, watch the videos before you attempt the task which follows ... Insert-sort with Romanian folk dance (4:03) Insertion sort (9:44) Ask your teacher for some plastic cups and have a go!
Merge sortBoth the merge sort and the quick sort which we will learn about in a moment are examples of the use of a divide and conquer strategy to improve the performance of a sorting algorithm.Both sorting algorithms are implemented recursively (hence why they are only studied at A Level) due to the fact that they both divide the list and treat the portions of the list as separate lists in a future sort operation.Focusing on the merge sort first, watch the two videos ... Merge-sort with Transylvanian-saxon (German) folk dance (4:16) Merge sort (7:44) Ask your teacher for some plastic cups and have a go!
QuicksortNow it's getting really interesting. The Quicksort is probably one of the most challenging of the common sorting algorithms to 'get your head around'. As I said earlier, Quicksort uses a strategy called 'divide and conquer' like the Mergesort algorithm and therefore is implemented using recursion. Once again, to complete the set, here are a number of videos for you to watch. Quick-sort with Hungarian (Küküllőmenti legényes) folk dance (6:54) Quicksort (6:08) Ask your teacher for some plastic cups and have a go!
As a team, implement a structured program which contains the following options ... A. Create list from scratch B. Import list from csv file C. Create random list D. Linear search E. Binary search F. Bubble sort G. Merge sort H. Insertion sort I. Quick sort J. Save list Z. Quit NEED TO FINISH THIS
ChecksumsWe met hashing algorithms in a previous section (How do I stop things going wrong) in the context of checksums. Basically, feeding a variable length data stream into a hashing algorithm yields a fixed length 'hash' which is unique to the data stream. Comparison of this calculated hash to a published hash can be used to confirm the integrity of the data stream.Associative ArraysHashing functions can also be used to determine an index / position in a fixed length data structure in order to both store and identify the position of a record. Data structures that use this mapping are called associative arrays or, sometimes, dictionaries (you've met these before!) The principle is simple. By definition, all records have a unique key which is used to uniquely identify them. If we feed the key through a special hashing function, we can use it to generate an index into which we can use to store the record. The 'space' we store the record is normally called a bucket or a slot.Searching for the recorded is simply a matter of rehashing the key to find the index and looking directly there! Easy!In the best case, searches in a hash table have time complexity O(1)The choice of hashing function often depends on the nature of the key value present in the record to be stored. Purely numerical keys can simply be 'MOD'ded with the size of the data structure to give an index. index = H(numerical key) = key MOD recordsMore complex keys / alphabetical keys can be passed through a function which generates a numerical index value. index = H(alphabetic key) = (sum of ASCII codes in key) MOD recordsLet's watch another CS50 video ... Hash tables (7:41) Handling collisionsAs you saw from the video, it's quite possible (especially if the data structure is small) that collisions can occur where two keys hash to the same value and therefore the algorithm will aim to store them at the same index. There are two common ways of dealing with collisions - so called closed hashing and open hashing, although the video calls them something else ...**Linear probing**>**Closed hashing**(**Open addressing**) If a collision occurs, store the record in the 'next available' slot. The next slot address is calculated using a*rehash*where 1 is added to the original slot address repeatedly until a suitable slot is found in the table. Generally, the rehash will be 'MOD'ded with the table size so that there is the facility for 'wrap around'. Closed hashing increases the chance of subsequent collisions ('clustering').Closed hashing - the next available slot is used**Separate chaining**>**Open hashing**(**Closed addressing**) If a collision occurs, store the record in a*linked list*starting at the slot in the table.Open hashing - records are stored in a linked list
How about these?
- Investigate some other sorting algorithms - use the videos in the resources folder and also the weblinks to do some research. Under what conditions would a particular sorting algorithm be more useful?
- Try developing some playing card based sorting games for use in class which help students to develop an understanding of the ways in which these algorithms run.
- Please watch the rest of the AlgoRhythmics videos on their channel on YouTube!
- Generate a trace table for a bubble sort - work out your own headings.
- Create animations of all the searching and sorting algorithms that you have studied.
- I've also given you another quicksort script called '
**betterQuicksort.py**'. Download this baby and look at how cool it is! The reason I've not used this in the lesson is because it uses a slightly different algorithm than the 'common' one of choosing the pivot at the extreme right or left. It chooses a central pivot a bit like a binary sort would. Still recursive however ...
END OF TOPIC ASSESSMENT |