CS08 : I am not a number


With some bizarre reference to the 1960s TV series 'The Prisoner', we enter the world of character encoding.

We are learning ...
  • About the ways in which computers can represent characters
So that we can ...
  • Differentiate between the character code representation of a decimal digit and its pure binary representation
  • Describe ASCII and Unicode systems for coding character data
  • Explain why Unicode was introduced
  • Use relational operators with characters
  • Have practical experience in the use of string manipulation functions in programming

Computers are great at binary. That's it - they are great at binary. They can do binary in their sleep, binary is their 'thing'. Ask a computer to do some binary and it'll say "Yep - I can do that". Ask a computer to do 'characters' or 'letters' and it'll say "Huh?" (but in binary, of course).

It's up to us, the humans, to design encoding systems which allow computers to represent characters and letters in binary. It doesn't mean that the computer actually understands it; it still needs a human to interpret the codes.

Activity 1 Baudot code (55)

Character encoding is not a new thing. Baudot code is a coding system originally developed by Emile Baudot in 1874 for use on teleprinters to send messages along the reasonably recent international telegraph systems. The voice telephone was only invented in the late 1870s and was not widely adopted for communication until the 20th century.  


Task 1.1 Baudot Code
Resources

Research Baudot code on the internet and find out ...
  • when it was developed;
  • what it was used for;
  • who used it;
Do all Baudot representations work the same way? Find two different code representations from the web and compare them. Write your findings in your assessment book and give the website references for what you have found and the date you accessed them.

https://drive.google.com/file/d/0B83yXMOilskabmVwMURodVM3V0k/view?usp=drive_web

OUTCOME : Summary of Baudot Code.



Task 1.2 Coldcode
Decoding Coldplay's X&Y Album Cover.pdf
Coldcode.zip

Read the article Decoding Coldplay's X&Y Album Cover.pdf and then visit http://ditonus.com/ or download and unzip the ‘coldcode.zip’ file from the resources folder for this lesson and have a play with creating your name in Baudot code. Print out what you have done and put it in your assessment book and explain how it works.

OUTCOME : Your name as a Coldplay Album cover plus an explanation of how it works.


Activity 2 Night writing (30)

Night writing, or sonography, was a system of code that used symbols of twelve dots arranged as two columns of six dots embossed on a square of paperboard. It was designed in 1808 by Charles Barbier in response to Napoleon's demand for a code that soldiers could use to communicate silently and without light at night. Each grid of dots stands for a character or phoneme. Unfortunately, the system proved too complicated to use in the field. 

Undeterred, Barbier intended to adapt the system for civilians and presented his system at the Royal Institute for Blind Youths in Paris. The lecture was attended by a 12-year-old boy called Louis Braille who later adapted the system for use by blind people. You might want to read this article.


Task 2.1 A pattern in the dots
Post it note
Blutack
Pen

Look at the following Braille alphabet.

  1. Can you see any pattern in the dots? Write down your ideas.
  2. This alphabet is used to represent 26 characters. How many characters could this system represent in total?
  3. Using a piece of paper, blutack and a pen, write a braille message to your friend. How easy is it to read? Stick your effort into your assessment books.
OUTCOME : Answers to questions 1 and 2 and a message in Braille (stuck in your book)


Activity 3 Morse code (40)

Morse code has been in use for naval and civilian communication since the mid 1800s. Samuel Morse helped to develop the system of Morse Code for long distance communication ...

“In 1825 New York City had commissioned Morse to paint a portrait of Lafayette, then visiting Washington, DC. While Morse was painting, a horse messenger delivered a letter from his father that read, "Your dear wife is convalescent". The next day he received a letter from his father detailing his wife's sudden death. Morse immediately left Washington for his home at New Haven, leaving the portrait of Lafayette unfinished. By the time he arrived, his wife had already been buried. Heartbroken that for days he was unaware of his wife's failing health and her death, he decided to explore a means of rapid long distance communication.”


Task 3.1 The Morse Code Activities
Web browser

Use the following activities to familiarise yourself with Morse Code and write some notes in your notebooks.
  • Use the following website to practise your Morse Code skills. Write up what you have done in your notebooks.
  • As an extra bit of fun, visit this website and write your own 'secret' morse code message to your friends.
  • Print out the following Morse code table and stick it in your notebooks.
  • Play the following sound. What does this have to do with Morse Code? Write down your ideas in your notebook.

OUTCOME : Description of Morse Code.


Activity 4 No standard for us! (25)

Early computers did not communicate with each other. Therefore, there was no need for any standard system to be used.
Individual manufacturers used their own encoding systems ...
  • BCD (Binary Coded Decimal) was used in calculators and early computers from around 1959 onwards
  • EBCDIC (Extended Binary Coded Decimal Interchange Code) was used for IBM Mainframe computers (1963)

Task 4.1 Early encoding systems
Web browser

Research these two early character encoding systems used by computers. Produce a simple summary of each using mind-mapping software like Freeplane or an online mind-mapping service like Bubbl.us (click 'Start Brainstorming >' to create a mindmap without signing in). Consider ...
  • when it was created,
  • why it was created,
  • what it was used for,
  • the main principles of encoding,
  • some examples of data encoded using this method.
Print your mindmap out for your notebooks. 

OUTCOME : Mindmap showing details of BCD and EBCDIC


Activity 5 ASCII (55)

A keyboard is an input device which turns keypresses into binary electrical signals. Clearly, each letter / character / number on the keyboard must produce a different set of binary signals and each signal must be able to be interpreted as a different one by the computer.

Task 5.1 My very own code table!
Computer Keyboard
My very own code table.docx

If we represented all the characters on a computer keyboard using binary, how many different patterns would we need? Consider lower and upper case characters, digits and other symbols. How many bits would you need to use to do this? Which codes would you use for each symbol?


Create your own code table using the handout My very own code table.docx available in the lesson resources. To do this activity properly, you should abandon any previous knowledge you have about ASCII!


OUTCOME : Your own character encoding table for the UK keyboard.



The currently accepted standard for character encoding is ASCII. ASCII stands for American Standard Code for Information Interchange. It was developed as an international standard for character coding. ASCII is a 7 bit code and can therefore be used to represent up to 128 different characters.
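
The arithmetic behind "7 bits gives 128 characters" can be checked with a minimal Python sketch. The variable names here are illustrative only and do not come from any of the handouts.

```python
# Each extra bit doubles the number of distinct patterns available.
bits = 7
patterns = 2 ** bits
print(patterns)                      # 128

# The smallest and largest 7-bit codes, shown as padded binary strings.
lowest = "{0:07b}".format(0)
highest = "{0:07b}".format(patterns - 1)
print(lowest, highest)               # 0000000 1111111
```

Changing bits to 8 shows why an 8-bit code has room for 256 patterns.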

Refer to the handout ASCII Coding Table.docx and complete the following task.

Task 5.2 Analysing ASCII
ASCII Coding Table.docx
ASCII Explained (in considerable detail).pdf

Colouring in

On a copy of ASCII Coding Table.docx, using 4 different coloured pencils, shade the following sections of the character set ...
  • Uppercase and lowercase letters in ORANGE
  • Numbers in BLUE
  • Other characters in GREEN
  • Commands in RED
Patterns in the code

If you look carefully, you should be able to see patterns in the binary codes. For instance ...
  • 0111000 : The ASCII code for a picture of a number eight (the character '8')
  • 0001000 : The pure binary value of the number 8
Can you see any more patterns in the binary codes? Discuss your ideas with your peers and write down what you have discovered in your notebooks.

History of ASCII

There is a document available called ASCII Explained (in considerable detail).pdf. Try to find out (and write about) the significance of the 'DEL' symbol and why it was used. This document also contains the original specification for ASCII which does make some interesting reading regarding the origins of the system.

OUTCOME : Coloured ASCII code tables, identification of patterns in the binary codes and a written explanation of the significance of the DEL symbol.


Activity 6 Character (and string) functions (45)

String handling in programming languages involves two aspects - the handling of ASCII codes and characters (or Unicode to be precise) and string manipulation. In this section, we will look at handling of ASCII values and characters.

For the following task, create a word processed document with suitable headers and footers. Record what you have done and what you have learnt using screenshots and written explanations. Do not copy and paste code.

Task 6.1 Demonstrating that you get it
ASCII Coding Table.docx

  • Open up the Python programming environment, IDLE.

  • Converting to and from ASCII

    Investigate the chr() and ord() functions by typing in the following commands at the prompt, pressing the ENTER key after each one.

    chr(78)
    ord("N")

    Can you explain what is happening? Write your ideas in your word processed document.

  • Converting ASCII code to pure integer (the hard way)
     
    We can specify pure binary numbers using binary literal format where you prefix the binary number with 0b or 0B. We can also convert numbers to binary using the bin() function or use the string format functions to force a binary string in the output. Type the following commands at the prompt, pressing the ENTER key after each one.

    0b11001100
    0B00110011
    bin(56)
    bin(145)
    "{0:07b}".format(49)
    "{0:07b}".format(0b1001)

    Notice that the bin() function only generates as many binary digits as it needs to represent the denary number whereas the string.format() method generates a suitably padded binary string. So, what has this got to do with ASCII?

     Look carefully at the following table ...

    Digit  : 0       1       2       3       4       5       6       7       8       9
    ASCII  : 0110000 0110001 0110010 0110011 0110100 0110101 0110110 0110111 0111000 0111001
    Binary : 0000000 0000001 0000010 0000011 0000100 0000101 0000110 0000111 0001000 0001001

    Can you see that the four least significant digits in the ASCII codes for the numerals 0 - 9 represent the denary values? We can use masks to convert binary ASCII codes into their corresponding integers. You can apply a mask using either an AND (&) operation (to unset bits) or an OR operation (|) (to set bits) depending on what you want to achieve. In this case, we only want to allow the four least significant bits to appear in our masked value by unsetting the most significant three, so our AND binary mask will be ...

    - ASCII code to denary : code & 0b0001111

    Type the following commands at the prompt, pressing the ENTER key after each one.

    "{0:07b}".format(ord("5"))
    "{0:07b}".format(ord("5") & 0b0001111)
    int(ord("5"))
    int(ord("5") & 0b0001111)
    int(ord("5") & 15)

    Can you explain what is happening? Write your ideas in your word processed document.
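
    Once the single commands make sense, the same AND mask can be wrapped in a function. This is only a sketch: the names digit_value and digits_to_int are made up for illustration, and the mask is only meaningful for the characters '0' to '9'.

```python
def digit_value(character):
    """Return the integer value of a single ASCII digit character."""
    code = ord(character)
    # Keep only the four least significant bits; the top three are unset.
    return code & 0b0001111


def digits_to_int(text):
    """Build a denary integer from a string of ASCII digits, one digit at a time."""
    total = 0
    for character in text:
        total = total * 10 + digit_value(character)
    return total


print("{0:07b}".format(ord("7")))   # 0110111
print(digit_value("7"))             # 7
print(digits_to_int("472"))         # 472
```

    In everyday code you would simply call int("472"); the point of the sketch is to show the mask doing the work.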


  • Converting characters into upper and lowercase (the easy and the hard way)

    First, we'll look again at the straightforward method using string modifiers. Type the following commands at the prompt, pressing the ENTER key after each one.

    message = "HeLlO aNd WeLcOmE"
    message.upper()
    message.lower()

    Explain what you have discovered in your word processed document. Easy. I realise that we have used strings in this example rather than single characters - I just thought that the example would be a bit clearer that way :)

    So, how about making it a little harder? After all, this is A Level! If you look closely, the ASCII codes for the capital letters run from 1000001 for 'A' through to 1011010 for 'Z' and the lowercase letters run from 1100001 for 'a' through to 1111010 for 'z'. The only difference between the two codes for the same letter is the 6th bit (counting from the right, worth 32), which is set in the lowercase letters. If we flip this bit, we can convert between upper and lowercase; we can use an 'OR' mask to convert uppercase into lowercase (by setting the 6th bit) and an 'AND' mask to convert lowercase to uppercase (by unsetting the 6th bit) :

    - Uppercase to Lowercase : code | 0b0100000
    - Lowercase to Uppercase : code & 0b1011111

    Type the following commands at the prompt, pressing the ENTER key after each one.

    "{0:07b}".format(ord("h"))
    "{0:07b}".format(ord("h") & 0b1011111)
    "{0:07b}".format(ord("H"))
    "{0:07b}".format(ord("H") | 0b0100000)
    chr(ord("h") & 0b1011111)
    chr(ord("h") & 95)
    chr(ord("H") | 0b0100000)
    chr(ord("H") | 32)

    Can you explain what is happening? Write down your ideas in your word processed document.
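
    The two masks can also be applied to a whole message, one character at a time. This is a rough sketch with made-up function names (to_lower and to_upper). It includes a range check because the masks only make sense for letters: a space has code 0100000, so the AND mask would turn it into code 0.

```python
def to_lower(character):
    """Set the 6th bit (value 32) of an uppercase letter's code."""
    if "A" <= character <= "Z":
        return chr(ord(character) | 0b0100000)
    return character  # leave anything that is not an uppercase letter alone


def to_upper(character):
    """Unset the 6th bit (value 32) of a lowercase letter's code."""
    if "a" <= character <= "z":
        return chr(ord(character) & 0b1011111)
    return character  # leave anything that is not a lowercase letter alone


message = "HeLlO aNd WeLcOmE"
print("".join(to_upper(c) for c in message))   # HELLO AND WELCOME
print("".join(to_lower(c) for c in message))   # hello and welcome
```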


  • Checking the identity of a character

    OK - a bit of light relief. These Python functions are easy and will work on either single characters or strings. 

    Use string.isdigit() to check for digits 0-9
    Use string.isalpha() to check for letters
    Use string.isalnum() to check for letters and / or digits

    Your job is to come up with some examples to show how these work in practice. Document what you have done and what you have learnt in your word processed document.


Now print out your word processed document and stick it in your notebook.

OUTCOME : Word processed document which explains how character (string) functions operate.


Activity 7 Character codes are ordinal (50)

Which is 'bigger'? A or Z? Clearly, the letters themselves have no greater or lesser significance so how can they be 'orderable'? Since characters in any character set are represented by binary (integer) codes, you can order the characters based on their character code. For example ...


... only works because the ASCII code of A is 65 and the ASCII code of Z is 90. See?
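
A comparison of this kind can be tried at the Python prompt. The sketch below is an illustration, not necessarily the example originally shown here; it compares the characters directly and then compares their codes, which gives the same answers.

```python
code_a = ord("A")
code_z = ord("Z")
print(code_a, code_z)                # 65 90

direct = "A" < "Z"                   # comparing the characters themselves
by_code = code_a < code_z            # comparing their character codes
print(direct, by_code)               # True True

# Lowercase letters have larger codes than uppercase ones.
lower_vs_upper = "a" < "Z"
print(lower_vs_upper)                # False
```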

Task 7.1 Orders please!
ASCII Coding Table.docx
Notebook

Decide whether the following comparisons are True or False. Hint : Use the ASCII Code Table!
  • "G" > "F"
  • "6" > "9"
  • "[" > "]"
  • ":" < "{"
  • "<" <= ">"
Can you come up with three more to test your peers?


OUTCOME : Statements of ordinality.


It is possible to use the ordinal nature of characters to perform calculations. For instance, I have to calculate the progress of my GCSE students based on the number of grades progress they have made. Their target grade represents 3 levels (or grades) progress from their intake score at KS2.

For instance, if a student is targeted a grade B but achieves an A, this represents 4 levels of progress (one more than the 3 for meeting the target). We can calculate their progress using ...

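def progress(target, grade):
    # Calculate levels of progress (LoP). Assume meeting the target is 3 LoP.
    # Does not handle A* or U!
    # Better grades have smaller character codes ('A' is 65, 'B' is 66),
    # so beating the target makes the difference positive.
    levels = ord(target) - ord(grade) + 3
    return levels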
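
To check the arithmetic, here are a few worked calls. The function is repeated in a shortened form so that the snippet runs on its own, and the grades are made-up test values.

```python
def progress(target, grade):
    # Same calculation as above: meeting the target counts as 3 levels of progress.
    return ord(target) - ord(grade) + 3


print(progress("B", "A"))   # 4 : one grade above target
print(progress("B", "B"))   # 3 : exactly on target
print(progress("B", "D"))   # 1 : two grades below target
```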

Task 7.2 Python programming, innit.
IDLE (Python)

Implement this function into a suitable program which asks for the grade and the target and outputs the levels of progress. Then, if you fancy a challenge ...
  1. Alter the function so that you have to specify the target LoP (assumed in this example to be 3)
  2. Implement 'A*' and 'U' grade handling
Provide evidence of what you have done as a suitably formatted code listing and screenshots of input and output. There is no need to explain how the code operates as long as you can provide suitable evidence of testing.

OUTCOME : Evidence that you have implemented the progress calculator function, plus extensions if you can.


Activity 8 Fonts (15)

A character code does not tell us anything about the appearance of the characters. Character codes are used to enable the representation of pictures of letters, numbers, characters and symbols. The pictures can change but the code remains the same. We use fonts to represent the pictures of the characters in different ways.


Have you ever been foolish enough to try to send a 'secret' email using Wingdings? Here's the thing – it's not secret! Changing the font doesn't change the character code! Character code 078 is still an 'N' even if it looks like a skull and crossbones!

Task 8.1 Not so secret!
Resources

Decode the following secret message and write the answer in your notebooks. (You might need to use Character Map or a Word document > Insert Symbol to help you!)

OUTCOME : Decoded message

No Checkpoint - it's not really worth it is it?

Activity 9 Unicode (25)

Since most people don't want to be restricted to the limited character set that ASCII provides, first Extended ASCII and then Unicode (Universal character coding system) were developed. Unicode is now an international standard for consistent encoding, representation and handling of text expressed in most of the world's writing systems.
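
Python's ord() works on any Unicode character, so it can show which characters fall outside ASCII. The sketch below uses a handful of sample characters chosen purely for illustration, written as \u escapes followed by their 4 hex digit codes.

```python
# Sample characters (chosen for illustration) with a plain description of each.
samples = {
    "N": "Latin capital N",
    "\u00e9": "e with acute accent",
    "\u20ac": "euro sign",
    "\u03a9": "Greek capital omega",
}

for character, description in samples.items():
    code = ord(character)
    # ASCII only covers codes 0 to 127.
    in_ascii = "ASCII" if code < 128 else "beyond ASCII"
    print(description, code, hex(code), in_ascii)
```

Only the first character has a code small enough to fit in 7 bits; the others need the larger range that Unicode provides.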

Task 9.1 Unicode charts galore!
Web browser

Explore the character coding tables at http://unicode.org/charts/ which are all PDF documents containing the full character sets then answer the following questions in your notebooks.
  • Each Unicode character is represented by a 4 hex digit code. How many bits does this represent?
  • How many characters can be represented in each coding table?
  • Investigate BabelMap (also available to download from the lesson resources) - software for inspection of Unicode character sets.
  • Why is Unicode so much better than ASCII?
OUTCOME : Answers to questions about Unicode.




Extension Activities 

How about these ...
  • ASCII Art

    One of the earliest examples of ASCII art was from the 1920s - long before the advent of computers! It was done using a typewriter. You might want to read more about this from this website.

    https://drive.google.com/file/d/0B83yXMOilskaSUNGd2RiUmljZzg/view?usp=drive_web

  • Programming Challenge : ASCII to String

    Write a program to take a sequence of ASCII codes and use them to form a character string, which is then displayed. Below is a sample program run.


    To help you, the structured English for this problem might look like this ...

    get list of ASCII codes and construct a list
    loop through each item in the list, convert to a character and append to string
    display string

    As a reminder, the Python command for converting an ASCII code to a character is chr(code)

  • Programming challenge : String to ASCII

    Create a new program that will input a short message as a character string then output a sequence of ASCII codes as in the following example.


    Again, to help you, here is some structured English ...

    get string from user
    construct a list of ASCII codes from each character in the string
    loop through the list and display each code

    The Python command for converting a character into an integer ASCII code is ord(character) in case you didn't already know :)

What's next?

Before you hand your book in for checking, make sure you have completed all the work required and that your book is tidy and organised. Your book will be checked to make sure it is complete and you will be given a spicy grade for effort.

END OF TOPIC ASSESSMENT