CS28 : Who is this little man?


The lowest level language used for communicating an algorithm to a computer is Binary or Machine Code. Humans aren't very good at binary (because they are not machines), so Computer Scientists developed languages which are close to machine code but a little more 'humanlike'.

We are learning ...
  • About machine code programming
So that we can ...
  • State that processors have an instruction set
  • Describe the format of instructions as an opcode and an operand
  • Describe the different addressing modes in use in modern processors
    - Immediate, direct, indirect and indexed addressing
  • Understand simple assembly language instructions and use them to write programs
    - load, add, subtract, store, branching, comparison, bitwise operations, logical shifts, halt
  • Write simple programs in assembly language


The language understood by digital computers is binary, commonly expressed as voltage levels. Traditionally, 5 volts is used to represent a binary '1' whereas 0 volts is used to represent a binary '0' (modern processors use lower voltages, but the idea is the same). Collections of such voltages are used to represent the instructions which give the processor tasks to carry out - after all, we are its master. The voltage collections, representing binary strings, are known as machine codes. This is all the processor understands.

Activity 1 Instruction sets

The CPU of any computer system only understands its own specific set of instructions called an Instruction Set. The instruction set for a processor is the set of binary codes or bit patterns (often represented in hexadecimal) for which machine (CPU) instructions have been defined. Instructions in the instruction set cover basic functions like data transfer, arithmetic operations, logical operations, testing and branching, and binary shift operations.


Instructions written in binary code like this are called Machine Code.


An example of a machine code instruction could be 0001000100001111 which could represent an instruction which adds the literal value 15 to a processor register. Since humans are generally a bit rubbish at binary (we are easily confused) ...
  • The machine code was written in hexadecimal instead, so 01001010 would be written as 4A, which does make it a little easier for us to write the instructions, and ...

  • Processor manufacturers invented a mnemonic based, human readable / writeable version of machine code called Assembly Code. However, because the processor can only understand binary, we would need to throw this through a special piece of software called an Assembler to translate it into machine code before we give it to the processor to execute ...

The assembler also checks for syntax errors, like any other translator.
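If you want to check a binary to hexadecimal conversion for yourself, here is a minimal Python sketch. It is only an illustration: the variable names are made up for this example, and the two bit patterns are the ones used in the notes above.

```python
# Convert a bit pattern between its binary and hexadecimal spellings.
bits = "01001010"

value = int(bits, 2)                      # read the text as a base 2 number
hex_text = format(value, "02X")           # two hexadecimal digits, upper case
back = format(int(hex_text, 16), "08b")   # and back to eight binary digits

print(bits, "=", hex_text)                # 01001010 = 4A

# The 16-bit example instruction needs four hexadecimal digits.
wide_hex = format(int("0001000100001111", 2), "04X")
print(wide_hex)                           # 110F
```

Each hexadecimal digit stands for exactly four bits, which is why hexadecimal is so much easier on the eye than a long row of 1s and 0s.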

Modern processors have hugely complicated instruction sets but early processors like the 8-bit MOS Technology 6500 family (which, incidentally, gave rise to the home computer revolution) had only (!) 56 instructions.

https://drive.google.com/file/d/0B83yXMOilskaT01GSGxCSlJSR00/view?usp=drive_web

Both machine and assembly code are low level languages due to their proximity to the processor. Instruction sets for one processor are incompatible with instruction sets for a different processor. For instance, a machine code program for an x86 processor will not run on a RISC processor such as an ARM and vice versa. One of the earliest minicomputers, the DEC PDP-8, had only 8 basic instructions in its instruction set!

Task 1.1
 Watch this video

As I said, "Watch this video" ...

What is an assembler (0:42)

As you can see, that wasn't very helpful, was it? Your job is to create a better video explaining the operation of an assembler. In your video, you should discuss ...
  • Processor instruction sets;
  • The types of instructions generally available in simple processor architectures;
  • Why humans are a bit rubbish and need Assembly code;
  • The role of the assembler in all this.
If you haven't got the skills or creativity or time to create a video, just give me a single slide presentation - it'll do.

OUTCOME : Better explanation of instruction sets, machine code, assembly code and assemblers.


Structure of a machine code instruction

This will vary between different machine architectures (obviously) but, in general, machine code instructions take the standard form of an opcode followed by an operand.


The opcode or operation code is represented by a mnemonic like LDA (Load Accumulator). For any particular instruction, the processor may require data to operate on which is represented by the operand.
  • 'Zero' address instructions only consist of an opcode and require no operand.
  • 'One' address instructions have an opcode and a 1-byte operand.
  • 'Two' address instructions have an opcode and a 2-byte operand either because the opcode needs two operands or because the operand is too large to fit in one byte.
It's OK to be confused!
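To make the opcode / operand split a bit more concrete, here is a minimal Python sketch which pulls a 16-bit instruction apart using shifts and masks. The field widths are an assumption made purely for illustration (a 6-bit opcode, a 2-bit addressing mode and an 8-bit operand); real processors each choose their own layout, and the function name decode is made up for this example.

```python
# ASSUMED layout, for illustration only:
#   bits 15-10 : opcode (6 bits)
#   bits  9-8  : addressing mode (2 bits)
#   bits  7-0  : operand (8 bits)

def decode(word):
    """Split a 16-bit instruction word into (opcode, mode, operand)."""
    opcode = (word >> 10) & 0b111111   # top six bits
    mode = (word >> 8) & 0b11          # next two bits
    operand = word & 0xFF              # bottom eight bits
    return opcode, mode, operand

word = int("0001000100001111", 2)      # the example instruction from earlier
print(decode(word))                    # (4, 1, 15)
```

Under this assumed layout, the operand field of the example instruction comes out as 15, which matches the 'add the literal value 15' description.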

Addressing modes

The addressing mode tells the processor how to treat the operand ...
  • Immediate Addressing
    Indicated in Assembly Code using a # sign before the operand. The value of the operand is taken literally, so the operand is the data.

  • Direct Addressing
    Indicated in Assembly Code by writing just the operand on its own. The operand gives the memory location containing the data to be read.
     
  • Indirect Addressing
    Usually indicated in assembly code by putting brackets around the operand. The operand gives the address of a memory location which, in turn, holds the memory address of the data. There are two related modes which work out the address of the data with the help of a register ...

    Indexed Addressing where the operand gives the base address to which is added the value of a register called the Index Register to get the memory location.

    Base Addressing is the reverse of indexed addressing: the operand gives the offset, which is added to the base address held in a register called the Base Register.

  • Relative Addressing
    Used in branch instructions to move the PC relative to its current value. The operand gives the number of bytes to jump / branch.



For instance, suppose memory locations 99, 110 and 145 contain the values 145, 19 and 256 respectively and the Base Register contains the value 99. Consider the following instructions, which use different addressing modes and are executed one after the other.

  • LDA #99

    Immediate addressing : Load the accumulator with the value 99. The accumulator will contain 99.

  • ADD 99

    Direct addressing : Add the value stored in memory location 99 to the accumulator. The accumulator will now contain 145 + 99 = 244

  • ADD (99)

    Indirect addressing : Add the value stored in the memory location whose address is held in location 99 to the accumulator. Location 99 holds 145 and location 145 holds 256, so the accumulator will now contain 244 + 256 = 500.

  • STC &11

    Base addressing : Subtract with carry, from the accumulator value, the value stored in the memory location offset by 11 from the Base Register address (99 + 11 = 110). The accumulator will contain 500 - 19 = 481.
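The four steps above can be checked with a few lines of Python. This is only a sketch of the idea, not how any real processor is built: the dictionary called memory, the base_register variable and the fetch_operand function are all made up for this example, and carry is ignored.

```python
# Memory contents and Base Register value from the worked example.
memory = {99: 145, 110: 19, 145: 256}
base_register = 99

def fetch_operand(mode, operand):
    """Return the data an instruction works on, given its addressing mode."""
    if mode == "immediate":
        return operand                          # the operand IS the data
    if mode == "direct":
        return memory[operand]                  # operand is the data's address
    if mode == "indirect":
        return memory[memory[operand]]          # operand points at the address
    if mode == "base":
        return memory[base_register + operand]  # operand is an offset
    raise ValueError("unknown addressing mode: " + mode)

steps = []
accumulator = fetch_operand("immediate", 99)    # LDA #99
steps.append(accumulator)
accumulator += fetch_operand("direct", 99)      # ADD 99
steps.append(accumulator)
accumulator += fetch_operand("indirect", 99)    # ADD (99)
steps.append(accumulator)
accumulator -= fetch_operand("base", 11)        # subtract, base addressing
steps.append(accumulator)

print(steps)                                    # [99, 244, 500, 481]
```

Notice that the same operand (99) produces three different pieces of data depending on the addressing mode.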

Task 1.2 Copy this!

Print out the following diagram and stick it into your notebooks. It describes the structure of a typical 16-bit machine code instruction which includes 'opcode', 'addressing mode' (AM) and 'operand'.

https://drive.google.com/file/d/0B83yXMOilskabXhhbXNIVkdSckU/view?usp=drive_web

In your notebooks : Try to explain how the different Addressing modes work. Use the examples from the notes above to help with your explanation. 

OUTCOME : Diagram of the structure of a machine code instruction plus an explanation of addressing modes.


Activity 2 The Little Man Computer (LMC)

The 'Little Man Computer' was developed by Dr Stuart Madnick in 1965 as a simple way of teaching students about machine code through a computational model of a von Neumann architecture computer.


The concept is based around a little man shut in a mailroom with 100 mailboxes (numbered 0 to 99) at one end, each of which can contain a 3 digit instruction or 3 digit data ranging from 0 to 999. At the other end of the mailroom are an In-basket and an Out-basket through which the little man can communicate with the 'outside world'. The only help the little man has with his calculations is a Calculator which allows him to perform addition and subtraction, storing the results in the calculator display. Finally, there is a resettable counter called the Program Counter which holds the address of the next instruction that the little man will carry out. The Program Counter is automatically incremented by one after each instruction is carried out. 

https://drive.google.com/file/d/0B83yXMOilskaYjhYSkZlUFlvVEk/view?usp=drive_web
The Little Man.

The 'Little Man' only understands 10 instructions. This is his limited instruction set. You give the Little Man instructions using simple assembly code. The assembler (which the simulator calls a 'compiler') converts this into machine code (represented in decimal) and stores the instructions in his mailboxes. The Little Man then reads each instruction, looks it up in his instruction set and executes it.


The Little Man Computer, although I cannot for the life of me remember where this version came from :(

Download the document Little Man Computer Instruction Set for details of the commands you can give the Little Man Computer. Remember that the LMC runs a reduced instruction set to make him easier to use. This doesn't make him any less powerful, it just means that his programs might be a little longer than the equivalent programs for a real CPU.

The machine code instructions for the LMC take a slightly simpler form than standard machine code : a single digit opcode (with no addressing mode because the LMC only uses direct addressing) followed by a two digit operand corresponding to a mailbox address (0-99).
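If you are curious how the Little Man goes about his work, here is a minimal Python sketch of his fetch and execute routine. It assumes the commonly used numeric codes (1xx add, 2xx subtract, 3xx store, 5xx load, 6xx branch always, 7xx branch if zero, 8xx branch if zero or positive, 901 input, 902 output, 000 halt); check these against your own instruction set sheet, as versions differ. The function name run_lmc is made up for this example and, unlike the real thing, the sketch does not limit values to three digits.

```python
def run_lmc(program, inputs):
    """Run LMC machine code (a list of decimal numbers) and return the outputs."""
    mailboxes = list(program) + [0] * (100 - len(program))  # 100 mailboxes
    in_basket = list(inputs)
    out_basket = []
    calculator = 0          # the accumulator
    program_counter = 0

    while True:
        instruction = mailboxes[program_counter]    # fetch
        program_counter += 1                        # point at the next one
        opcode, address = divmod(instruction, 100)  # split 3 digits into 1 + 2

        if instruction == 0:                        # halt
            break
        elif opcode == 1:                           # add
            calculator += mailboxes[address]
        elif opcode == 2:                           # subtract
            calculator -= mailboxes[address]
        elif opcode == 3:                           # store
            mailboxes[address] = calculator
        elif opcode == 5:                           # load
            calculator = mailboxes[address]
        elif opcode == 6:                           # branch always
            program_counter = address
        elif opcode == 7:                           # branch if zero
            if calculator == 0:
                program_counter = address
        elif opcode == 8:                           # branch if zero or positive
            if calculator >= 0:
                program_counter = address
        elif instruction == 901:                    # input
            calculator = in_basket.pop(0)
        elif instruction == 902:                    # output
            out_basket.append(calculator)
        else:
            raise ValueError("unknown instruction: " + str(instruction))

    return out_basket

# Input a number, store it in mailbox 99, input another, add them, output.
program = [901, 399, 901, 199, 902, 0]
print(run_lmc(program, [3, 4]))                     # [7]
```

The divmod line is where the single digit opcode and the two digit mailbox address are separated, which is all the decoding the Little Man ever has to do.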


Task 2.1
 Programming the Little Man

Try the following examples in the LMC. Type the assembly code in the 'Instructions' window and click 'Compile and Load Instructions'. This will convert the assembly code into machine code and store it in the memory (blue boxes). Use the instruction set to compare the assembly code to the machine code. The workflow is ...

Enter Assembly Code > Click 'Compile and Load Instructions' > Click 'Run ...'
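To get a feel for what happens when you click 'Compile and Load Instructions', here is a minimal Python sketch of a two pass assembler for LMC style code. The mnemonics and numeric codes (ADD, SUB, STA, LDA, BRA, BRZ, BRP, INP, OUT, HLT and DAT) are the commonly used ones and are an assumption here; the simulator you are using may spell some of them differently. The function name assemble is made up for this example and there is no error checking.

```python
# ASSUMED mnemonics and codes; check them against your instruction set sheet.
WITH_ADDRESS = {"ADD": 100, "SUB": 200, "STA": 300, "LDA": 500,
                "BRA": 600, "BRZ": 700, "BRP": 800}
NO_ADDRESS = {"INP": 901, "OUT": 902, "HLT": 0}
MNEMONICS = set(WITH_ADDRESS) | set(NO_ADDRESS) | {"DAT"}

def assemble(source):
    """Translate LMC assembly code (text) into a list of machine codes."""
    lines = [line.split() for line in source.splitlines() if line.strip()]

    # Pass 1: a first word which is not a mnemonic is a label for that mailbox.
    labels = {}
    for address, parts in enumerate(lines):
        if parts[0] not in MNEMONICS:
            labels[parts[0]] = address

    # Pass 2: translate each line, swapping labels for mailbox numbers.
    machine_code = []
    for parts in lines:
        if parts[0] not in MNEMONICS:
            parts = parts[1:]                       # skip over the label
        mnemonic = parts[0]
        operand = parts[1] if len(parts) > 1 else "0"
        value = labels[operand] if operand in labels else int(operand)
        if mnemonic in NO_ADDRESS:
            machine_code.append(NO_ADDRESS[mnemonic])
        elif mnemonic == "DAT":
            machine_code.append(value)              # just data, no opcode
        else:
            machine_code.append(WITH_ADDRESS[mnemonic] + value)
    return machine_code

source = """
        INP
        STA first
        INP
        ADD first
        OUT
        HLT
first   DAT 0
"""
machine_code = assemble(source)
print(machine_code)        # [901, 306, 901, 106, 902, 0, 0]
```

The label first ends up as mailbox 6, so STA first becomes 306 and ADD first becomes 106. Compare this with what you see in the blue boxes when you load a program of your own.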

Exercise 1
Write a program in LMC assembly code which asks the user for a number and displays the same number until the user enters the number '0'.

Exercise 2
Write a program in LMC assembly code that asks the user for a number and then counts down from that number to zero, outputting each value in turn.

Exercise 3
Write a program in LMC assembly code that takes two numbers from memory, adds them together and stores the answer then displays the answer to the user.

OUTCOME : Completed challenges on the LMC


Activity 3 Assembly code programming challenges 

The Little Man is one thing but the instruction set provided for use in examinations is a little bit more complicated. Don't worry, it's not as bad as an Intel x86 processor instruction set but it's not far off!

The commands given are grouped as follows ...
  • Load, store and move instructions
  • Add and subtract instructions
  • Compare and branch instructions
  • Logical operations
  • Shift operations
  • Halt
... and it also supports labels and two different operand addressing modes.
  • Immediate addressing where the operand is treated literally;
  • A type of Direct addressing where the value is stored in a register.

Task 3.1
 Really hard questions


Download the help sheet Exam Board Assembly Language Instruction Set from the lesson resources, print this out and use it to help you to complete the challenges on Assembly Code Practise. When you have finished, print out the sheet and hand it to your teacher for assessment.

OUTCOME : Completed worksheet


Extension Activities 

How about these?
  • More information about the LMC from Stephen Y. Chen from York University.

  • Read more about assembly language instructions in Assembly code operations in more detail which can be downloaded from the lesson resources.

  • There is a chapter from a book called “The Architecture of Computer Hardware, Systems Software & Networking” available for you to read in the lesson resources called ‘The Little Man Computer’.

  • 8086 Emulator

    It's worth spending the £4.95 to buy this if you really like Assembly Code. This emulator package will let you play around programming a virtual 8086 (Intel) processor. Even if you don't buy it, the language reference and tutorials are fantastic! Visit the 8086 Emulator site for more details. 

  • The CARDIAC

    If you really want to push the boat out, you can investigate the CARDIAC (CARDboard Illustrative Aid to Computation), produced in 1968 by Bell Laboratories to teach students about the operation of computers.

What's next?

Before you hand your book in for checking, make sure you have completed all the work required and that your book is tidy and organised. Your book will be checked to make sure it is complete and you will be given a spicy grade for effort.

END OF TOPIC ASSESSMENT