s5cs03 the appearance and forms of data


Data floods into computer systems. It takes many forms; well, not that many actually. In this section, we take a look at the 'appearance' of data, how it is structured and how it is stored, along with a brief introduction to databases and file handling.

We are learning ...

What data looks like
About the fundamentals of databases
About the need for data security

So that we can ...

Describe where data comes from and what it looks like ...
- Data types (integer, real, boolean, character, string, datetime, record / dictionary, array)
- Typecasting
- Introduction to variable scope
- Pointer data type
- Big Data
Demonstrate an understanding of variables and constants
- The concept of a variable
- The concept of a constant
- Naming conventions for variables and constants
Describe abstract / user defined data structures ...
- General concept of a data structure
- Introduction to queues, stacks, lists, tuples, graphs, trees, hash tables, dictionaries, vectors
Know about the fundamentals of databases;
- Types of databases
- Key features of databases
Have practical experience of handling files (saved for later);
- Describe different access methods for files
- Handle simple text files
- Define a file in terms of records and fields (delimited) (NEW)
- Distinguish between master and transaction files (NEW)
- Describe serial, sequential / indexed sequential and direct (random) file access (NEW)
- Experience handling binary files
Justify the need for data security.

Activity 1
The hierarchy of wisdom

With the help of computers, humans organise raw data to give it meaning. Consider the following diagram. You may wish to sketch this in your notes. Where do we operate and where do computers operate? Give examples to illustrate your answer based on what you have discussed in class.

[Diagram: the hierarchy of wisdom - data, information, knowledge and wisdom]

Data is raw facts and figures without structure, context or meaning...
Information is data with structure, context and meaning...


Knowledge involves awareness of the implications of the information developed from experience...
Wisdom is the power to make decisions based on the knowledge.
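To make the difference between data and information concrete, here is a minimal Python sketch. The readings, the days and the idea that they are temperatures are all made up for illustration: the numbers on their own are just data, and they only become information once structure and context are added.

```python
# Raw data: just numbers with no structure, context or meaning
raw_data = [18, 21, 19]

# Information: the same numbers with context added
# (the days and the "temperature in °C" meaning are made-up examples)
days = ["Monday", "Tuesday", "Wednesday"]
information = dict(zip(days, raw_data))

for day, temperature in information.items():
    print(f"On {day} the temperature was {temperature} °C")

# Processing the information can reveal a pattern we could act on
warmest_day = max(information, key=information.get)
print("Warmest day:", warmest_day)
```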

Task 1.1 Where does data come from?

Computers handle data all the time, but ...

What are its sources?
How is it selected?
What are its destinations?
How is it exchanged?
What does it look like?


Spend 10 minutes discussing these questions with your partners. Record your ideas on a copy of "Summary Mind Map" which your teacher will give you. You may be asked to engage in a discussion afterwards.

OUTCOME : Completed mind map document

Checkpoint

Activity 2
Big Data

Big data is a broad term for data sets so large, complex and often unstructured that traditional data processing applications are inadequate. New methods of analysing large datasets have had to be developed in order to identify trends and patterns which help businesses and researchers to gain insight into the behaviour of populations who are generating data at an unprecedented rate.

The FOUR V's of Big Data

Task 2.1 TED!

I love TED - not the bear. Watch the following TED video and take some notes in the back of your exercise books to help you with the rest of the task.

Kenneth Cukier: Big data is better data (15:15)

Using the following web resources and the diagram of "The FOUR V's of Big Data" above (from the IBM website), produce an engaging presentation about "Big Data - what it is and why it is important". Make sure your presentation does not include lots of text - it should be predominantly image based, with no more than 10 words per slide.

What is Big Data? from SAS.com

OUTCOME : Print out your presentation 3 slides per page (as a handout) and annotate the handout with some notes to explain what each slide shows.

Checkpoint

Big data is created all the time in many different areas of life, for example ...

Scientific research
Retail
Banking
Government
Mobile phone networks
Security
Real-time data collection
The Internet

Task 2.2 Examples of big data sources

For each of the sources of big data stated in the list above, describe ...

a) where the data might come from or what type of system might generate it and
b) what it might look like.

In your notebooks : Present your ideas in a colourful way.

OUTCOME : Colourful presentation of examples of big data sources

Checkpoint

Activity 3
Data Types

Computers need to categorise data on a fundamental level. There are 8 (or so) different fundamental datatypes which computers generally need to be able to deal with ...

Integer / whole numbers
Real / decimal / fixed point / float / floating point
Boolean
Currency
Time / Date
Character
String
Pointer
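Here is one way these types can look in Python. It is a sketch, not a definitive mapping: Python has no separate character or currency type, so a one-character string and the standard library's `decimal.Decimal` stand in for them, and Python does not normally expose pointers, so the standard `ctypes` module is used to make one. The variable names and values are made up.

```python
import ctypes
from datetime import datetime
from decimal import Decimal

whole_number = 42                           # Integer
real_number = 3.1415927                     # Real / floating point
is_logged_in = True                         # Boolean
price = Decimal("19.99")                    # Currency (Decimal stores exact pence)
lesson_start = datetime(2024, 2, 14, 9, 30) # Time / Date
initial = "M"                               # Character (a string of length 1 in Python)
greeting = "Hello"                          # String
count = ctypes.c_int(5)
count_pointer = ctypes.pointer(count)       # Pointer: holds the address of 'count'

# type() reports which data type Python is using for each value
for value in (whole_number, real_number, is_logged_in, price, lesson_start, initial, greeting):
    print(type(value).__name__, value)

# Following the pointer gives back the value it points to
print(count_pointer.contents.value)

# Why currency is usually not stored as a float
print(0.1 + 0.2 == 0.3)                                  # False: floats are approximate
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3")) # True: decimals are exact
```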

Task 3.1 Eight Fold Learning

Take a piece of paper and fold it into 8. Write the name of each datatype at the top of each square. Use that section to collect a definition and examples of each type from another person in the room. Get them to sign the square to prove you have done it. This activity is intended to engage you in discussion with other students! Talk to them!

Now answer the following question...

"There are 8 main data types. State the name of each one, write a definition and give 1 example of real world data which could be represented in each one."

OUTCOME : A completed 8-section sheet and a table in your notes.

Checkpoint

Variables and constants (an introduction to terminology)

Data in these categories is usually stored in the computer using a variable. A variable is a named area of memory which is used to store the data. The computer is either told or works out what type the data is, and this tells it which operations can be performed with it. The value of a variable can change while the program runs, hence the name.

A constant is also a named area of memory where data is stored but, unlike a variable, its value cannot be changed. Hence the name.
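In Python, a minimal sketch of the difference looks like this. Note that Python does not actually enforce constants: writing the name in UPPER_CASE is a convention, and `typing.Final` is only a hint for checking tools, so it is up to the programmer not to change the value. The names and the 20% rate are made up.

```python
from typing import Final

# A constant: its value should never change while the program runs
VAT_RATE: Final = 0.2

# A variable: its value changes as the program runs
basket_total = 0.0
for item_price in [2.50, 4.00, 1.25]:
    basket_total = basket_total + item_price   # the variable is updated each time

total_with_vat = basket_total * (1 + VAT_RATE)
print(basket_total, total_with_vat)
```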

Naming variables and constants

Variables and constants are named using identifiers. An identifier is a unique name which is used to, well, identify the area of memory where the data is stored. Identifiers must follow the rules of the programming language, such as containing no spaces or symbols and not starting with a digit. They are often written in camelCase and should describe the meaning of the data they identify.
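Python can check some of these rules itself. This sketch uses the built-in `str.isidentifier()` method and the standard `keyword` module to test a few candidate names; the names in the list are made up. It only checks the language's rules, not whether a name is meaningful, which is still your job.

```python
import keyword

def is_valid_name(name):
    """Return True if name could be used as a Python variable or constant name."""
    # isidentifier() rejects spaces, most symbols and a leading digit;
    # reserved words such as 'for' or 'True' are not allowed either
    return name.isidentifier() and not keyword.iskeyword(name)

candidates = ["customerName", "customer name", "2ndPlace", "total$", "for", "MAX_SIZE"]
for name in candidates:
    print(f"{name!r}: {'valid' if is_valid_name(name) else 'not valid'}")
```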

Variable scope

In programming languages, variable scope refers to the parts of the program where a variable can be seen and used. Local (sometimes called private) variables are only available inside the subroutine where they are declared, whereas global variables are available to every part of the program.
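A minimal Python sketch of local and global scope follows; the names are made up. Python calls a variable created inside a subroutine (function) "local", and a function needs the `global` keyword before it can change a global variable.

```python
score = 0   # global variable: can be seen by every part of the program

def add_bonus():
    global score   # lets this subroutine change the global variable
    bonus = 10     # local variable: only exists inside add_bonus()
    score = score + bonus

add_bonus()
print(score)

try:
    print(bonus)   # outside the subroutine, 'bonus' cannot be seen
except NameError as error:
    print("Error:", error)
```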

Typecasting

Generally, a variable holds one specific type of data. Sometimes, depending on the value it contains, it can be typecast (converted) into another type. For instance, the string "12" can be typecast into the integer 12, but the string "twelve" cannot.
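In Python, typecasting is done by calling built-in functions such as `int()`, `float()` and `str()`. This sketch shows a few conversions and what happens when a value cannot be converted; the values are made up.

```python
age_text = "12"
age = int(age_text)          # string "12" -> integer 12
print(age + 1)               # arithmetic now works

price = float("3.50")        # string -> real (float)
label = "Age: " + str(age)   # integer -> string so it can be joined to text
print(price, label)

print(int(7.9))              # float -> int drops the fractional part

try:
    int("twelve")            # not every value can be typecast
except ValueError as error:
    print("Cannot convert:", error)
```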

Task 3.2 Flashcards - an exercise in comprehension

Write definition flashcards in your own words for the following terms ...

Variable
Constant
Name / Identifier
Variable Scope
Typecasting

OUTCOME : List of definitions in your notes

Checkpoint

Activity 4
Data Structures

At this stage, you need to be aware of the existence of the following different data structures. You will have lots of practice creating and managing these during the rest of the course - don't worry!

Before you start this section, create a blank word processed document with a suitable header and footer. Use screenshots and written explanation to document what you have done and what you have found out.


1
Python!!

Open up the Python programming environment, IDLE.

2
At the prompt

Practise creating a simple list of data and accessing items in that list using their index. Type the following commands, pressing the Enter key after each one and twice after the last one.

>>> myList = [12, 3.1415927, "Hello", True]
>>> print(myList[0])
>>> print(myList[3])
>>> for item in myList:
        print(item)


Record what you have learned in your word processed document.

3
At the prompt

A record is a set of related data, identified by a key value. Ideally, we should use a dictionary or even a database for this but in this case, we'll store a set of data using a simple list. Try creating the following record in Python by typing the following line of code and pressing the Enter key.

>>> customer = [154,"Mills","Mark","3","Acacia Avenue","Smallville","SM42BB","0121-2324222"]


As you see, this is a normal list but all the data is related to one 'customer', me (although the address is made up). The first field is the Key Value and it's unique to this record. Using list indices, try retrieving ...

Customer forename
Postcode
House number

You would find this very hard if you hadn't first typed in the record or at least known its structure - this is why dictionaries or databases are much better to work with for this type of data storage.
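For comparison, here is a sketch of the same customer record stored as a Python dictionary. The field names (`customer_id`, `forename` and so on) are assumptions chosen for illustration. With a list you have to remember which position each field is in; with a dictionary you ask for each field by name.

```python
# The customer record as a dictionary: fields are found by name (key)
customer_record = {
    "customer_id": 154,          # the key value, unique to this record
    "surname": "Mills",
    "forename": "Mark",
    "house_number": "3",
    "street": "Acacia Avenue",
    "town": "Smallville",
    "postcode": "SM42BB",
    "phone": "0121-2324222",
}

print(customer_record["forename"])
print(customer_record["postcode"])
print(customer_record["house_number"])
```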

Record what you have learned in your word processed document.

4
Search a CSV file

Finally, download the search-a-csv-file.zip archive, decompress it into your Documents folder and Rubber Duck the script before running it (a rubber duck is a plastic duck: instead of explaining their code out loud to a person, programmers talk it through with a duck on their desk, and putting the problem into words often reveals the solution). Explain how the script works in your word processed document and include a copy of the script in your notebooks.
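The contents of search-a-csv-file.zip are not reproduced here, so the following is only a hypothetical sketch of how such a script might work, using Python's standard `csv` module. The column names and sample rows are made up, and `io.StringIO` stands in for a real file so the example runs on its own; with a real file you would pass in something like `open("customers.csv", newline="")` instead (that filename is also an assumption).

```python
import csv
import io

# Made-up sample data standing in for a CSV file on disk
sample_csv = """id,surname,forename,town
154,Mills,Mark,Smallville
155,Jones,Asha,Bigtown
156,Patel,Sam,Smallville
"""

def search_csv(csv_file, field, target):
    """Return every row (as a dictionary) whose field matches target."""
    reader = csv.DictReader(csv_file)   # uses the first line as the field names
    matches = []
    for row in reader:                  # check each record in turn
        if row[field].lower() == target.lower():
            matches.append(row)
    return matches

results = search_csv(io.StringIO(sample_csv), "town", "Smallville")
for row in results:
    print(row["forename"], row["surname"])
```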

Now print out your word processed document for your notes

Task 4.1 Lists

Arrays (lists) are very powerful data structures. Come up with 10 lists that you could find in everyday life and describe them.


Now create your lists in the Python programming language. Provide evidence that you have done this using screenshots and written explanation.

OUTCOME : A list of lists!

Checkpoint

Remember that lists can be one, two, three or more dimensional. If you can't remember much about lists / arrays, you might want to take a look back at the Stage 4 Lesson called ''.
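As a reminder of what "two-dimensional" means, here is a short Python sketch of a list of lists; the seating-plan names are made up. The first index picks the row and the second picks the item within that row.

```python
# A 2D list: a seating plan with 3 rows of 4 seats (made-up names)
seating_plan = [
    ["Amir", "Bea", "Cal", "Dee"],
    ["Eli", "Fay", "Gus", "Hana"],
    ["Ivy", "Jon", "Kai", "Lea"],
]

print(seating_plan[1][2])        # row 1, seat 2 (indices start at 0)

for row_number, row in enumerate(seating_plan):
    print(row_number, row)

seating_plan[2][0] = "Max"       # items in a list can be changed
```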

Advanced data structures

As you will learn about these in more depth in later sections of the course, we will only introduce them here.
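Before you meet them properly, here is a rough Python sketch of how a few of these structures can be represented with built-in types and the standard `collections.deque`. These are simple stand-ins, not full implementations: a stack is a list where you only add and remove at the end, a queue takes items from the front, a tuple cannot be changed, a dictionary is Python's built-in hash table, and a graph can be stored as a dictionary of neighbours. The data is made up.

```python
from collections import deque

# Stack: last in, first out (LIFO), using a list
stack = []
stack.append("page1")
stack.append("page2")
print(stack.pop())           # removes "page2", the most recently added item

# Queue: first in, first out (FIFO), using a deque
queue = deque()
queue.append("job1")
queue.append("job2")
print(queue.popleft())       # removes "job1", the item that has waited longest

# Tuple: a fixed group of values that cannot be changed
point = (3, 4)
try:
    point[0] = 5
except TypeError:
    print("Tuples cannot be changed")

# Dictionary: fast key -> value lookups (Python's built-in hash table)
capitals = {"France": "Paris", "Japan": "Tokyo"}
print(capitals["Japan"])

# Graph: each node maps to the list of nodes it is connected to
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
print(graph["A"])
```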

Task 4.2 Advanced data structures

Download and complete the worksheet, advanced-data-structures.docx.

OUTCOME : Complete worksheet, Advanced data structures.docx

Checkpoint

Activity 5
Databases - an introduction

At some point, even more complex data structures like stacks and queues just aren't enough. When the quantity of data exceeds that which we can easily store in one data structure and relationships between data begin to become clear, the only way to go is "The Database". A database is a ...

Persistent, organised store of related data

Task 5.1 Database research

You will have to do some research into databases using the Quackit website and write some of your own notes. I will be checking! Make sure you get definitions for the following terms…

Flat file database,
Relational database,
Primary key,
Foreign key,
Entity relationship model

...and create a WordItOut of 30 words from the notes you have made including the words in the list above.

OUTCOME : Notes about databases and a Wordle / Word Cloud of 30 relevant words.

Checkpoint

The data stores in a database are called tables or relations. The concepts that they model are called entities. The relationship between these entities is represented by a data model or Entity Relationship Diagram.
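Python ships with the `sqlite3` module, which is one way to try these ideas without installing anything. This sketch is an illustration under made-up assumptions (the `customer` and `orders` tables and their fields are hypothetical): each table models an entity, `customer_id` is the primary key of `customer`, and the same field in `orders` is a foreign key linking each order to one customer. The database is held in memory, so unlike a real database file it is not persistent.

```python
import sqlite3

connection = sqlite3.connect(":memory:")   # use a filename instead to make it persistent
connection.execute("PRAGMA foreign_keys = ON")  # SQLite only checks foreign keys when this is on
cursor = connection.cursor()

# Each table (relation) models one entity
cursor.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,   -- primary key: unique for each customer
        surname TEXT,
        forename TEXT
    )""")
cursor.execute("""
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer(customer_id),  -- foreign key
        item TEXT
    )""")

cursor.execute("INSERT INTO customer VALUES (154, 'Mills', 'Mark')")
cursor.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, 154, "Keyboard"), (2, 154, "Mouse")])
connection.commit()

# The foreign key lets us join the two tables back together
cursor.execute("""
    SELECT customer.forename, orders.item
    FROM customer JOIN orders ON customer.customer_id = orders.customer_id
    ORDER BY orders.order_id""")
rows = cursor.fetchall()
print(rows)
connection.close()
```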

Task 5.2 Barry Williams rules, OK!

A database is a persistent, organised store of related data.

1
Visit the Databases.biz website;
2
Make sure you are in the 'Data models' section;
3
Choose one interesting data model;
4
Find the 'Entity Relationship Diagram' for the data model;
5
Print it out;
6
Annotate the data model to explain what it is for and how it relates to the real world.

OUTCOME : Data-model, printed and annotated.

Checkpoint

Activity 6
Files


Data can be stored permanently in files. Files can be...

Sequentially accessed - such as those stored on tape
Directly accessed - such as those stored on hard drive
Used for storing current data or used for archive / summary
Based on text or binary data
Built from a series of single items of data
Built from a series of delimited items of data
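The difference between sequential and direct access can be sketched in Python with fixed-length records in a binary file. This is an illustration under assumptions: the 10-byte record size, the file name `records.dat` and the names stored are all made up. Sequential access starts at the beginning and reads every record in turn; direct access uses `seek()` to jump straight to a record's position, which only works because every record is the same length.

```python
import os
import tempfile

RECORD_SIZE = 10   # every record is padded to exactly 10 bytes
path = os.path.join(tempfile.mkdtemp(), "records.dat")   # made-up file name

# Write the records as fixed-length entries
with open(path, "wb") as file:
    for name in ["Amir", "Bea", "Cal", "Dee"]:
        file.write(name.encode("ascii").ljust(RECORD_SIZE))   # pad with spaces

def read_sequentially(path, target):
    """Read records one after another from the start until target is found."""
    with open(path, "rb") as file:
        position = 0
        while True:
            record = file.read(RECORD_SIZE)
            if not record:                     # reached the end of the file
                return None
            if record.decode("ascii").rstrip() == target:
                return position
            position += 1

def read_directly(path, record_number):
    """Jump straight to one record using its position in the file."""
    with open(path, "rb") as file:
        file.seek(record_number * RECORD_SIZE)
        return file.read(RECORD_SIZE).decode("ascii").rstrip()

print(read_sequentially(path, "Cal"))   # has to read Amir and Bea first
print(read_directly(path, 3))           # goes straight to record 3
```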

As we are working with Python, we will use two types of file: text files and binary (pickle) files. You will be given examples of both, including opening, reading, writing, appending, updating and closing the files, in the form of scripts.
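The scripts in the archives are not reproduced here, but a minimal sketch of the basic text file operations in Python might look like this. The file name `notes.txt` and its contents are made up. Each `with open(...)` block closes the file automatically when it finishes, and the mode letter chooses what happens: `"w"` writes (replacing any existing contents), `"a"` appends to the end and `"r"` reads.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")   # made-up file name

# Write: creates the file, or replaces its contents if it already exists
with open(path, "w", encoding="utf-8") as file:
    file.write("Line one\n")
    file.write("Line two\n")

# Append: adds to the end without removing what is already there
with open(path, "a", encoding="utf-8") as file:
    file.write("Line three\n")

# Read: go through the file one line at a time
with open(path, "r", encoding="utf-8") as file:
    lines = [line.rstrip("\n") for line in file]
print(lines)

# Update: simple text files are usually updated by reading, changing, then rewriting
lines[1] = "Line two (edited)"
with open(path, "w", encoding="utf-8") as file:
    file.write("\n".join(lines) + "\n")
```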

Before you start this section, create a blank word processed document with a suitable header and footer. Use screenshots and written explanation to document what you have done and what you have found out.


1
Python 🐍

Open up the Python programming environment, IDLE.

2
Text files

Download the text-files.zip archive and decompress it into your userspace. Rubber Duck the scripts before running them. Explain how each script works in your word processed document and include a copy of each script for your notes.

3
Binary (Pickle) files

Download the Binary files.zip archive and decompress it into your userspace. Rubber Duck the scripts before running them. Explain how each script works in your word processed document and include a copy of each script for your notes.
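Similarly, here is a minimal sketch of saving and loading Python data with the standard `pickle` module. The file name `customers.pkl` and the records are made up. Pickle files must be opened in binary mode (`"wb"` and `"rb"`); `pickle.dump()` writes an object to the file and `pickle.load()` reads it back. Only load pickle files you trust, because loading one can run code.

```python
import os
import pickle
import tempfile

path = os.path.join(tempfile.mkdtemp(), "customers.pkl")   # made-up file name

customers = [
    {"customer_id": 154, "surname": "Mills", "town": "Smallville"},
    {"customer_id": 155, "surname": "Jones", "town": "Bigtown"},
]

# Write the whole list to a binary file
with open(path, "wb") as file:
    pickle.dump(customers, file)

# Read it back: the list of dictionaries is rebuilt exactly
with open(path, "rb") as file:
    loaded = pickle.load(file)
print(loaded[1]["surname"])

# Update: change the data in memory, then dump it again
loaded[1]["town"] = "Smallville"
with open(path, "wb") as file:
    pickle.dump(loaded, file)
```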

Now print out your word processed document for your notes

Estimating file sizes

File size is clearly related to the amount of data which the file contains, obvs. Often, the actual filesize is different from the value you calculate...

it could be larger due to metadata
it could be smaller due to compression

Task 6.1 Calculating uncompressed file sizes

Calculate the approximate file size in bytes of the following files. Hint : assume one character takes up 2 bytes (as most characters do in the UTF-16 Unicode encoding).

1
A plain text file containing a new script for a play. There are 600 lines, each averaging 80 characters per line.
2
A binary pickle file containing 20 records, each of which is made up of 5 fields, each of length 20 characters.
3
A binary executable containing 150,000 lines of code, each one of which is an average of 25 characters long. What is inherently wrong with your answer?
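The calculation behind the hint can be written as a small Python function. This is a sketch under the task's own assumption of 2 bytes per character, and the example numbers (100 lines of 50 characters) are made up so they don't give away the task answers. Python's `str.encode()` also shows why the encoding matters: in UTF-16 most characters take 2 bytes, while in UTF-8 ordinary English letters take only 1.

```python
BYTES_PER_CHARACTER = 2   # the task's assumption (UTF-16-style Unicode)

def estimate_size(lines, characters_per_line, bytes_per_character=BYTES_PER_CHARACTER):
    """Approximate size in bytes = lines x characters per line x bytes per character."""
    return lines * characters_per_line * bytes_per_character

size = estimate_size(100, 50)
print(size, "bytes")                     # 100 x 50 x 2

# The bytes needed per character depend on the encoding
print(len("Hello".encode("utf-16-le")))  # 2 bytes for each of the 5 letters
print(len("Hello".encode("utf-8")))      # 1 byte for each of the 5 letters
```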

OUTCOME : Calculations of file sizes, including workings out.

Checkpoint

Activity 7
Data security

Data security can mean two things ...

Keeping data private from unauthorised access
Keeping data safe from accidental or deliberate loss

Keeping data private from unauthorised access

Keeping your data safe must start with physical methods. When talking of hacking and data theft, it's often easy to forget that it's easier to put a padlock on the server room door than it is to implement some highfalutin proxy firewall system.

Task 7.1 Lock it up

Read the following articles about physical computer security methods...

10 physical security measures every organization should take from Techrepublic
Physical security from Techtarget
Security controls from Redhat

...and watch this YouTube video...

Security and Data Protection in a Google Data Center

Now create a single slide presentation to summarise what you have learnt containing no more than 50 words. Print this out full size and stick it in your notebooks.

OUTCOME : Single slide presentation summarising physical security methods.

Checkpoint

Keeping data safe from accidental or deliberate loss

Backup and archiving are similar, related but different. Look carefully at the following infographic about 'Backup'.

Back it up!

Task 7.2 Your own infographic

If you are not sure what an infographic is, spend a little time on Pinterest. You will probably need an account to view the content, and don't drift off task! If you can't use Pinterest, use Google Images instead.

Next, draft out your own infographic about Archiving on a piece of plain A4 paper. Don't use the computer to do this - paper helps you to gather your ideas.


OUTCOME : Beautiful infographic about Archiving.

Checkpoint

Extension Activities

How about these?

Choose one real world situation and datafy it! Write about what you have done.

Look back at Activity 8 (estimating file sizes) on the classwork activities sheet. Try actually making the first two files (the first one is a text file and the second one is a pickle (binary) file). Compare the actual file size to the one you calculated in class – can you explain any differences? Write about what you have done.
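If you try this in Python, `os.path.getsize()` reports a file's actual size in bytes. This sketch is only a starting point under stated assumptions: the file name and the small numbers (10 lines of 80 characters) are made up, and the file is written as UTF-8 with `\n` line endings, so each ordinary character takes 1 byte and every line has one extra byte for its newline. Both of those are reasons an actual size can differ from an estimate.

```python
import os
import tempfile

def make_text_file(path, lines, characters_per_line):
    """Write a plain text file of repeated 'x' characters and return its size in bytes."""
    # newline="\n" stops Windows from writing two-byte "\r\n" line endings
    with open(path, "w", encoding="utf-8", newline="\n") as file:
        for _ in range(lines):
            file.write("x" * characters_per_line + "\n")
    return os.path.getsize(path)

path = os.path.join(tempfile.mkdtemp(), "play_script.txt")   # made-up file name
actual = make_text_file(path, 10, 80)
estimate = 10 * 80 * 2       # the classwork estimate at 2 bytes per character
print("Estimated:", estimate, "bytes  Actual:", actual, "bytes")
```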

What's next?
Before you hand your book in for checking, make sure you have completed all the work required and that your book is tidy and organised. Your book will be checked to make sure it is complete and you will be given a spicy grade for effort.

END OF TOPIC ASSESSMENT

Last modified: February 14th, 2024
The Computing Café works best in landscape mode.