CS03 : What does data look like?


Data floods into computer systems. It takes many forms; well, not that many actually. In this section, we take a look at the 'appearance' of data, how it is structured and how it is stored. We also take a brief look at databases and file handling.

We are learning ...
  • What data looks like
  • About the fundamentals of databases
  • About the need for data security
So that we can ...
  • Describe where data comes from and what it looks like ...
    - Data types (integer, real, boolean, character, string, datetime, record / dictionary, array)
    - Typecasting
    - Introduction to variable scope
    - Pointer data type
    - Big Data
  • Demonstrate an understanding of variables and constants
    - The concept of a variable
    - The concept of a constant
    - Naming conventions for variables and constants
  • Describe abstract / user defined data structures ...
    - General concept of a data structure
    - Introduction to queues, stacks, lists, tuples, graphs, trees, hash tables, dictionaries, vectors
  • Know about the fundamentals of databases
    - Types of databases
    - Key features of databases
  • Have practical experience of handling files (saving data for later)
    - Describe different access methods for files
    - Handle simple text files
    - Define a file in terms of records and fields (delimited) (NEW)
    - Distinguish between master and transaction files (NEW)
    - Describe serial, sequential / indexed sequential and direct (random) file access (NEW)
    - Experience handling binary files
  • Justify the need for data security.

Activity 1 The hierarchy of wisdom (20)

With the help of computers, humans organise raw data to give it meaning. Consider the following diagram. You may wish to sketch this in your notes. Where do we operate and where do computers operate? Give examples to illustrate your answer based on what you have discussed in class.


Data is raw facts and figures without structure, context or meaning ...
Information is data with structure, context and meaning ...



Knowledge involves awareness of the implications of the information developed from experience ...
Wisdom is the power to make decisions based on the knowledge.


Task 1.1 Where does data come from?

Computers handle data all the time, but ...
  • What are its sources?
  • How is it selected?
  • What are its destinations?
  • How is it exchanged?
  • What does it look like?
Spend 10 minutes discussing these questions with your partners. Record your ideas on a copy of Summary Mind Map.docx which you can download and print from the lesson resources. You may be asked to engage in a discussion afterwards.

OUTCOME : Completed mind map document


Activity 2 Big Data (50)  A Level Only

Big data is a broad term for data sets so large or complex and often unstructured that traditional data processing applications are inadequate. New methods of analysing large datasets have had to be developed in order to identify trends and patterns which help business and research to gain insight into the behaviour of populations who are generating data at an unprecedented rate.

https://drive.google.com/file/d/0B83yXMOilskaUFJQbnFYQ0JsREk/view?usp=drive_web

Task 2.1 TED!

I love TED - not the bear. Watch the following TED video and take some notes in the back of your exercise books to help you with the rest of the task.


Using the following web resources and the diagram of the 4 Vs of Big Data above (from the IBM website), produce a presentation about "Big Data - what it is and why it is important". Make sure your presentation does not include lots of text - it should be predominantly image based.
OUTCOME : Print out your presentation 3 slides per page (as a handout) and annotate the handout with some notes to explain what each slide shows.


Big data is created all the time in many different areas of life, for example ...
  • Scientific research
  • Retail
  • Banking
  • Government
  • Mobile phone networks
  • Security
  • Real-time data collection
  • The Internet

Task 2.2 Examples of big data sources

For each of the sources of big data stated in the list above, describe ...

a) where the data might come from or what type of system might generate it and
b) what it might look like.

In your notebooks : Present your ideas in a colourful way.

OUTCOME : Colourful presentation of examples of big data sources


Activity 3 Data Types (40)

Computers need to categorise data on a fundamental level. There are 8 (or so) different fundamental datatypes which computers generally need to be able to deal with ...
  • Integer / whole numbers
  • Real / decimal / fixed point / float / floating point
  • Boolean
  • Currency
  • Time / Date
  • Character
  • String
  • Pointer [A Level Only]
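
As a rough illustration, most of the types in this list can be seen in Python. Treat this as a sketch only: Python has no separate character, currency or pointer type, so the stand-ins below (a one-letter string for a character and Decimal for currency) are assumptions made for the example, as are all the variable names and values.

```python
from datetime import datetime
from decimal import Decimal

age = 16                              # integer / whole number
height = 1.72                         # real / floating point
is_student = True                     # Boolean
price = Decimal("4.99")               # currency (Decimal stores 4.99 exactly)
lesson = datetime(2024, 9, 2, 9, 0)   # time / date
initial = "M"                         # character (a string of length 1 in Python)
name = "Mark"                         # string

# Ask Python what type it has given to each value
for value in (age, height, is_student, price, lesson, initial, name):
    print(type(value).__name__, value)
```

Notice that we never told Python the types; it worked them out from the values.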

Task 3.1 Eight Fold Learning

Take a piece of paper and fold it into 8. Write the name of each datatype at the top of each square. Use that section to collect a definition and examples of each type from another person in the room. Get them to sign the square to prove you have done it. This activity is intended to engage you in discussion with other students! Talk to them!

Now answer the following question ...

"There are 8 main data types. State the name of each one, write a definition and give 1 example of real world data which could be represented in each one."

OUTCOME : A completed 8-section sheet and a table in your notes.


Variables and constants (an introduction to terminology)

Data in these categories is usually held in the computer using a variable. A variable is a named area of memory which is used to store the data. The computer is either told or works out what type the data is and this tells it what sorts of operations can be performed with it. The value of a variable can be changed by the computer, hence the name.

A constant is also a named area of memory where data is stored but, unlike a variable, its value cannot be changed. Hence the name.
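
Here is a minimal sketch of the difference in Python. One assumption to be aware of: Python has no true constants, so programmers write the name in capitals as a signal that the value should not be changed. The names score and MAX_LIVES are made up for this example.

```python
score = 0              # a variable: its value can change while the program runs
score = score + 10     # ... and here it does

MAX_LIVES = 3          # a constant by convention only; Python would not stop a change

print(score, MAX_LIVES)
```

Other languages, such as Java or C, can enforce constants properly and will refuse to run code which tries to change one.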

Naming variables and constants

Variables and constants are named using identifiers. An identifier is a unique name which is used to, well, identify the area of memory where the data is stored. Identifiers must generally conform to specific naming rules such as having no spaces or symbols and not starting with a number. They are often written in camelCase and should describe the meaning of the data they identify.
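
You can ask Python whether a name follows its own rules using the string method isidentifier(). This is only a sketch: the candidate names are invented, other languages have slightly different rules, and the method only checks the rules, not whether the name is meaningful.

```python
# Some names a programmer might try to use
candidates = ["firstName", "first name", "2ndName", "total_cost", "cost£"]

# Check each one against Python's naming rules
results = {name: name.isidentifier() for name in candidates}

for name, is_valid in results.items():
    print(name, is_valid)
```

Only firstName and total_cost pass; the others contain a space, start with a number or contain a symbol.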

Variable scope

In programming languages, the scope of a variable is the part of the program where it can be seen and used. Local variables are only available inside the subroutine in which they are created, whereas global variables are available to every part of the program.
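
A short sketch of scope in Python, using made-up names (high_score, play_round and bonus). The variable created inside the subroutine cannot be seen from outside it, whereas the global variable can be read anywhere.

```python
high_score = 100                 # global: visible to every part of this program

def play_round():
    bonus = 5                    # only exists inside play_round
    return high_score + bonus    # the subroutine can read the global variable

print(play_round())              # 105

try:
    print(bonus)                 # bonus cannot be seen out here ...
except NameError as error:
    print("Out of scope:", error)   # ... so Python raises a NameError
```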

Typecasting

Generally, variables are considered to be of one specific type. Sometimes, depending on the value they contain, they can be typecast into other types. For instance, the string "12" can be typecast into the integer 12.
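
The example above can be tried in Python using the built-in functions int(), float() and str(). This is a small sketch with made-up variable names; it also shows what happens when the value cannot sensibly be cast.

```python
text = "12"
number = int(text)            # string -> integer
print(number + 1)             # 13
print(float(text))            # 12.0
print(str(number) + "!")      # 12!

try:
    int("twelve")             # this string does not look like a whole number
except ValueError:
    print("Cannot cast 'twelve' to an integer")
```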


Task 3.2 Definitions - an exercise in comprehension

Write definitions in your own words for the following terms ...
  • Variable
  • Constant
  • Name / Identifier
  • Variable Scope
  • Typecasting
OUTCOME : List of definitions in your notes


Activity 4 Data Structures (80)

At this stage, you need to be aware of the existence of the following different data structures. You will have lots of practice creating and managing these during the rest of the course - don't worry!

Before you start this section, create a blank word processed document with a suitable header and footer. Use screenshots and written explanation to document what you have done and what you have found out.

 

  • Open up the Python programming environment, IDLE.

  • At the prompt : Practise creating a simple list of data and accessing items in that list using their index. Type the following commands, pressing the  Enter  key after each one, twice at the end.

    >>> myList = [12, 3.1415927, "Hello", True]
    >>> print(myList[0])
    >>> print(myList[3])
    >>> for item in myList:
            print(item)

    Record what you have learned in your word processed document.

  • At the prompt : A record is a set of related data, identified by a key value. Ideally, we should use a dictionary or even a database for this but in this case, we'll store a set of data using a simple list. Try creating the following record in Python by typing the following line of code and pressing the  Enter  key.

    >>> customer = [154,"Mills","Mark","3","Acacia Avenue","Smallville","SM42BB","0121-2324222"]

    As you see, this is a normal list but all the data is related to one 'customer', me (although the address is made up). The first field is the Key Value and it is unique to this record. Using list indices, try retrieving ...

    - Customer forename
    - Postcode
    - House number

    You would find this very hard if you hadn't first typed in the record or at least knew its structure - this is why dictionaries or databases are much better to work with for this type of data storage.

    Record what you have learned in your word processed document.

  • Finally, download the Search a CSV file.zip archive, decompress it to your documents and Rubber Duck the script before running it. Explain how the script works in your word processed document and include a copy of the script in your notebooks.
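
For comparison, here is a sketch of the same customer record stored as a Python dictionary rather than a list. The field names (id, surname, forename and so on) are my own guesses at sensible labels; they do not come from the list version.

```python
customer = {
    "id": 154,                    # the key value, unique to this record
    "surname": "Mills",
    "forename": "Mark",
    "house_number": "3",
    "street": "Acacia Avenue",
    "town": "Smallville",
    "postcode": "SM42BB",
    "phone": "0121-2324222",
}

# Each field is retrieved by name, so there is no need to remember positions
print(customer["forename"])
print(customer["postcode"])
print(customer["house_number"])
```

Because every value is labelled, someone who has never seen the record before can still work out what each item means.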

Now print out your word processed document for your notes


Task 4.1 Lists

Arrays (lists) are very powerful data structures. Come up with 10 lists that you could find in everyday life and describe them.


Now create your lists in the Python programming language. Provide evidence that you have done this using screenshots and written explanation.

OUTCOME : A list of lists!

Remember that lists can be one, two, three or more dimensional. If you can't remember much about lists / arrays, you might want to take a look back at the Stage 4 Lesson called 'Building data structures'.
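
As a quick reminder, a two dimensional list in Python is simply a list of lists. The seating plan below is a made-up example; the first index picks the row and the second picks the seat in that row.

```python
# Two rows of seats, each row is itself a list
seating = [
    ["Ada", "Alan", "Grace"],
    ["Tim", "Linus", "Margaret"],
]

print(seating[0])        # the whole front row
print(seating[1][2])     # second row, third seat: Margaret

# Visit every seat, row by row
for row in seating:
    for student in row:
        print(student)
```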


Advanced data structures [A Level Only]

       

As you will learn about these in more depth in later sections of the course, we will only introduce them here.
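
To give a flavour of two of them, here is a minimal sketch of a stack and a queue in Python. It assumes a plain list for the stack and deque (from the standard collections module) for the queue; these are common choices but not the only ones, and the colour values are made up.

```python
from collections import deque

# Stack: last in, first out
stack = []
stack.append("red")
stack.append("green")
stack.append("blue")
top = stack.pop()            # removes "blue", the most recent item
print(top, stack)

# Queue: first in, first out
queue = deque()
queue.append("red")
queue.append("green")
queue.append("blue")
front = queue.popleft()      # removes "red", the oldest item
print(front, list(queue))
```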

Task 4.2 Advanced data structures

Download and complete the worksheet, Advanced data structures.docx.

OUTCOME : Complete worksheet, Advanced data structures.docx


Activity 5 Databases - an introduction (50)

At some point, even more complex data structures like stacks and queues just aren't enough. When the quantity of data exceeds that which we can easily store in one data structure and relationships between data begin to become clear, the only way to go is "The Database". A database is a ...

Persistent, organised store of related data

Task 5.1 Database research

You will have to do some research into databases using the Quackit website and write some of your own notes. I will be checking! Make sure you get definitions for …
  • Flat file database, 
  • Relational database, 
  • Primary key, 
  • Foreign key, 
  • Entity relationship model 
… and create a WordItOut of 30 words from the notes you have made including the words in the list above.

OUTCOME : Notes about databases and a Wordle / Word Cloud of 30 relevant words.


The data stores in a database are called tables or relations. The concepts that they model are called entities. The relationship between these entities is represented by a data model or Entity Relationship Diagram.
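
These ideas can be sketched using the sqlite3 module which comes with Python. Everything named here (the customer and customer_order tables, their fields and the data) is invented for illustration. The database is also created in memory, so unlike a real database it is not persistent and vanishes when the script ends.

```python
import sqlite3

connection = sqlite3.connect(":memory:")         # temporary database in memory
connection.execute("PRAGMA foreign_keys = ON")   # ask SQLite to enforce foreign keys

# Each entity is modelled by its own table (relation)
connection.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        surname     TEXT
    )""")
connection.execute("""
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),
        item        TEXT
    )""")

connection.execute("INSERT INTO customer VALUES (154, 'Mills')")
connection.execute("INSERT INTO customer_order VALUES (1, 154, 'Keyboard')")

# The relationship between the entities lets us join the two tables
rows = connection.execute("""
    SELECT customer.surname, customer_order.item
    FROM customer_order
    JOIN customer ON customer.customer_id = customer_order.customer_id
""").fetchall()
print(rows)
connection.close()
```

The customer_id field is the primary key in the customer table and a foreign key in the customer_order table; that shared field is what links one customer to many orders.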

Task 5.2 Barry Williams rules, OK!

A database is a persistent, organised store of related data.
  1. Visit the Database Answers website;
  2. Go to the 'Data models' section;
  3. Choose one interesting data model;
  4. Find the 'Entity Relationship Diagram' for the data model;
  5. Print it out;
  6. Annotate the data model to explain what it is for and how it relates to the real world.
OUTCOME : Data-model, printed and annotated.


Activity 6 Files (50)


Data can be stored permanently in files. Files can be ...
  • Sequentially accessed - such as those stored on tape
  • Directly accessed - such as those stored on hard drive
  • Used for storing current data or used for archive / summary
  • Based on text or binary data
  • Built from a series of single items of data
  • Built from a series of delimited items of data
As we are working with Python, there are two types of files we can use - text files and binary (pickle) files. You will be given examples of both, including opening, reading, writing, appending, updating and closing the files in the form of scripts.
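
To give you an idea of what to expect, here is a minimal sketch of writing, appending and reading with both types of file. It is not one of the scripts from the archives: the file names and data are made up, and the files are saved in a temporary folder so nothing in your userspace is touched.

```python
import os
import pickle
import tempfile

folder = tempfile.mkdtemp()                      # a scratch folder for this demo
text_path = os.path.join(folder, "notes.txt")
pickle_path = os.path.join(folder, "record.pickle")

# Text file: write, then append, then read back
with open(text_path, "w", encoding="utf-8") as text_file:
    text_file.write("First line\n")
with open(text_path, "a", encoding="utf-8") as text_file:
    text_file.write("Second line\n")
with open(text_path, "r", encoding="utf-8") as text_file:
    lines = text_file.read().splitlines()

# Binary (pickle) file: note the "b" in the mode
record = [154, "Mills", "Mark"]
with open(pickle_path, "wb") as binary_file:
    pickle.dump(record, binary_file)
with open(pickle_path, "rb") as binary_file:
    loaded = pickle.load(binary_file)

print(lines)
print(loaded)
```

Using with means Python closes each file for us, even if something goes wrong part way through.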

Before you start this section, create a blank word processed document with a suitable header and footer. Use screenshots and written explanation to document what you have done and what you have found out. 

  • Open up the Python programming environment, IDLE.

  • Download the Text files.zip archive and decompress it into your userspace. Rubber Duck the scripts before running them. Explain how each script works in your word processed document and include a copy of the script for your notes.

  • Download the Binary files.zip archive and decompress it into your userspace. Rubber Duck the scripts before running them. Explain how each script works in your word processed document and include a copy of the script for your notes.

Now print out your word processed document for your notes


Estimating file sizes

File size is clearly related to the amount of data which the file contains. Often, the actual file size is different from the value you calculate ...
  • it could be larger due to metadata
  • it could be smaller due to compression
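
The way the characters are encoded makes a difference too. The sketch below uses a made-up piece of text and compares a simple estimate with the number of bytes Python actually produces in two common Unicode encodings.

```python
text = "Hello, world!\n" * 10               # 10 lines of 14 characters each

characters = len(text)                      # 140 characters in total
estimate = characters * 2                   # estimate at 2 bytes per character
utf16_size = len(text.encode("utf-16-le"))  # 2 bytes for each of these characters
utf8_size = len(text.encode("utf-8"))       # 1 byte for each of these characters

print(characters, estimate, utf16_size, utf8_size)   # 140 280 280 140
```

So the same text can take up 280 bytes or 140 bytes depending on the encoding, before any metadata or compression is considered.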

Task 6.1 Calculating uncompressed file sizes

Calculate the approximate file size in bytes of the following files. Hint : Assume one character takes up 2 bytes (as in the UTF-16 encoding of Unicode).
  1. A plain text file containing a new script for a play. There are 600 lines, each averaging 80 characters per line.
  2. A binary pickle file containing 20 records, each of which is made up of 5 fields, each of length 20 characters.
  3. A binary executable containing 150,000 lines of code, each one of which is an average of 25 characters long. What is inherently wrong with your answer?
OUTCOME : Calculations of file sizes, including workings out.


Activity 7 Data security (60)

Data security can mean two things ...
  • Keeping data private from unauthorised access
  • Keeping data safe from accidental or deliberate loss
Keeping data private from unauthorised access

Keeping your data safe must start with physical methods. When talking of hacking and data theft, it's often easy to forget that it's easier to put a padlock on the server room door than it is to implement some highfalutin proxy firewall system.

Task 7.1 Lock it up

Read the following articles about physical computer security methods ...
... and watch this YouTube video ...


Now create a single slide presentation to summarise what you have learnt containing no more than 50 words. Print this out full size and stick it in your notebooks.

OUTCOME : Single slide presentation summarising physical security methods.


Keeping data safe from accidental or deliberate loss

Backup and archiving are related but different. Look carefully at the following infographic about 'Backup'.

https://drive.google.com/file/d/0B83yXMOilskaUW01bWU1RzJPMm8/view?usp=drive_web

Task 7.2 Your own infographic

If you are not sure what an infographic is, spend a little time on Pinterest. You will probably need an account to view the content. Don't drift! If you can't use Pinterest, use Google Images instead.

Next, draft out your own infographic about Archiving on a piece of plain A4 paper. Don't use the computer to do this - paper helps you to gather your ideas.


OUTCOME : Beautiful infographic about Archiving.


Extension Activities 

How about these?
  • Choose one real world situation and datafy it! Write about what you have done.

  • Look back at Activity 8 (estimating file sizes) on the classwork activities sheet. Try actually making the first two files (the first one is a text file and the second one is a pickle (binary) file). Compare the actual file size to the one you calculated in class – can you explain any differences? Write about what you have done.

  • Tweet about data security to @EBAcomputing

What's next?

Before you hand your book in for checking, make sure you have completed all the work required and that your book is tidy and organised. Your book will be checked to make sure it is complete and you will be given a spicy grade for effort.

END OF TOPIC ASSESSMENT