s5cs44 databases
This page is mainly about s5cs44 databases
Saving data in a file is OK but you still have to store, retrieve, amend and search for the data yourself. There is a better way luckily which we've touched on before - that database. Hang on though, what about unstructured data? Oh, no, enter BIG DATA!
We are learning ...
About practical file handling techniques
About data modelling
About the concept and features of relational databases
How to use SQL to define, retrieve, update, insert and delete data
About the functions of the database management system (DBMS)
So that we can ...
Have practical experiences handle files; (NEW SECTION < What does data look like)
- Describe different access methods for files
- Handle simple text files
- Define a file in terms of records and fields (delimited) (NEW SECTION)
- Distinguish between master and transaction files (NEW SECTION)
- Describe serial, sequential / indexed sequential and direct (random) file access (NEW SECTION)
- Experience handling binary files
Produce a data model from given data requirements for a simple scenario involving multiple entities
Produce entity relationship diagrams for a data model in standard format
- entity(PK,FK,NKA,...)
Explain the concept of a relational database
- Data consistency
- Data redundancy
- Data independence
Define terms associated with databases
- Entity
- Attribute
- Primary key
- Composite key
- Foreign key
- Multi-level indices
Understand why databases are normalised
- Normalise relations to third normal form
Use SQL to :
- Define relations
- Retrieve, update, insert and delete data from multiple tables in a relational database
- Manage issues of referential integrity
Describe the function of a database management system
Understand the benefits and drawbacks of client server databases
- Simultaneous access for multiple clients
- Update anomalies due to concurrent access and methods of resolution (record locks, serialisation, timestamp ordering, commitment ordering)
- ACID (Atomicity, Consistency, Isolation, Durability)
Show familiarity with
- Fact-based models for representing big data
- Graph schema for capturing the structure of a dataset (nodes, edges and properties)
- Use of distributed systems for managing big data.


Activity 1
What is a database?
A database ...
Is an organised, persistent collection of data;
Is a dynamic data structure, not a flat one like a 'flat file' such as a CSV or TSV file.
Databases can be ...
Organised / sorted in different ways;
Displayed / printed in a variety of styles and layouts;
Searched for specific information to produce complex results;
Updated by adding / changing / deleting items in often quite complex ways;
Backed up easily and automatically;
Local or distributed;
Stored / moved easily.


Task 1.1 What use are databases?
The database is quite possibly the most important large scale computational 'device' that exists in the world today. Everything that you do in your life depends, on some level, on a database. Whether it's shopping, learning, relaxing or socialising, database are key (pun intended).
In your notebook : Create a mindmap of as many real world activities that a database could be involved in. You might want to read the information on this webpage to help you.
OUTCOME : List of database applications.

A database is actually a collection of objects. Some of the objects hold data, some define its structure, some aid sorting and searching the data and some determine how it is displayed.


Activity 2
Data models
A data model is a method of describing the data, its structure, the way it is interrelated and the constraints that apply to its organisation. A conceptual data model is a representation of the data requirements of an organisation constructed in a way that is independent of any software used to construct the database.

Data models are generally constructed based on rules first proposed by Peter Chen in 1976, where the data model views the world as distinct entities, described by attributes and relationships between them ...
Entity : An object, person, event or thing of interest to an organisation about which data has to be stored.
Attribute : A relevant property or characteristic of an entity by which it can be identified and described.
Relationship : A link or association between entities based on their attributes.

Data models are represented using Entity Relationship Diagrams, or ERDs.

Task 2.1 Generating a data model
Think about a school. Maybe it's the school you are in now or one that you went to in the past. What 'makes' up the school? Think about all the physical things and all the conceptual things as well. These are the entities of the school data model. Each entity must have a series of attributes which describe it and a series of ways in which they are related to each other.

In your notebooks : Write a list of all the Entities you could find in a school and describe their attributes and the relationships that exist between them.
OUTCOME : Written description of a school based on entities, attributes and relationships.

Entity occurrences
Any entities you identify in a complex system like this are always singular, i.e. 'Pupil', not 'Pupils'. You will see this written incorrectly in many places online and in textbooks (shame). The entity describes the characteristics of one instance of the entity - a so called entity occurrence.

Importantly, there has to be a way of uniquely identifying each entity occurrence, in this case by the pupils name. In practice, something more complicated is often used, like an ID or Code (which would allow the pupil to change his / her name without compromising the 'uniqueness' of the entity occurrence).
More on relationships
There are three main types of relationship that can exist between entities ...
One to One (1:1)
One to Many (1:M)
Many to Many (M:M)
... plus all three relationships can be optional (zero or zero or many) or mandatory (one or one or many). Since the relationships are binary (i.e. have two sides), that gives us 36 different combinations. The relationships are shown using special arrows, one side of which are shown in the diagram.

The entities themselves are shown using boxes with at least the entity name and often with the attributes as well, though we'll look at that in a second. Each binary relationship can be 'read' from Left To Right (L2R) or Right To Left (R2L). Consider the following examples before attempting the task which follows.


Task 2.2 What relationships, brother?
In your notebooks : Sketch entity relationships for the following binary pairs.
Customer and order
Husband and wife (in western society)
Product and barcode
Network user and network permission
Film and film star
Household and main residence
GP and registered patient
Customer and product
Artist and album
Author and book
Student and course
OUTCOME : Simple, binary entity relationship diagrams for some common situations.

More about attributes
Data about each entity occurrence is stored as values in attributes within the entity itself. There are three 'types' of attribute which we use to describe both the entity occurrence and define the relationship between each one.
Primary Key Attributes (PK) uniquely identify each entity occurrence.
Foreign Key Attributes (FK) are found on the 'many' (or 'one' in a one-to-one) side of a relationship and always take the same values as the primary key attribute in a linked entity. They define the relationship.
Non-Key Attributes describe characteristics of the entity.

I will!
You choose a primary key based on any unique characteristic of the entity or, if one doesn't exist, you make one up. Primary keys can be single attributes or combinations of more than one attribute - whatever is simpler to work with. No matter what, however ...
The Primary Key must uniquely identify each entity occurrence and cannot be repeated!

Fingerprints are unique and can be used as a primary key for humans
Primary key attributes are often underlined whereas foreign key attributes are italicised however, I'm likely to simple add the suffix [PK] for the primary key and [FK] for the foreign key, if that's OK?

What practical significance does this have? Read through the following examples, one at a time ...




Task 2.3 Notes?
In your notebooks : Make some notes on the significance of Primary, Foreign and Non-Key attributes in your notebooks. Use the examples you have just seen to help you.
OUTCOME : Notes on the importance of Primary, Foreign and Non-Key attributes.

Resolving the many to many relationship problem
As we have just seen, many to many relationships are bad because they are difficult to implement without breaking the rule of the Primary Key - that it should be unique.
I suppose that, in the example we looked at, we could have a combined primary key of
TeacherID
and PupilID
in both entities which would allow one entity occurrence for every teacher - pupil partnership but that would also mean that each entity occurrence would have a copy of the teacher forename and surname and the pupil forename and surname which introduces a whole raft of problems around data consistency and unproductive maintenance (see later).
To fix this, we introduce an associative entity which sits between the two other entities and holds details of a single occurrence of the relationship between the two entities. For instance ...

Here, the associative entity,
Taught
, holds each instance of a Teacher teaching a pupil and, through the inclusion of a separated TaughtID
, allows a teacher to teach a pupil multiple times (maybe in different classes). This model also allows an extra non-key attribute, Notes
, to be used to keep notes on each Taught entity occurrence, such as which class they were in or how they are progressing, for instance.
Task 2.4 Resolving the M:M Relationship
In your notebooks : You should have identified a number of many to many relationships from Task 2.2. Your task is to resolve them by introducing a suitable associative entity.
OUTCOME : Resolved many to many relationships.

Investigating complex relationships
Since life is complex, real life data models tend to be complex as well, often involving many entities and complex relationships between them. Before we move on, it's time to have a look at some complex relationships courtesy of Barry Williams!

Task 2.5 Database Answers
In your notebooks : Visit the Database Answers website data models section and marvel at how many there are (over 1500!). Choose one, print it out and write about it in your notebooks.
OUTCOME : Annotated complex data model


Activity 3
Practical database design

At about the same time as Peter Chen proposed the Entity Relationship Model, a computer scientist working for IBM called Edgar F. Codd invented a practical realisation of this theory when he designed the first the Relational Data Base Management System or RDBMS.
Both the Entity Relationship model and the RDBMS originally served to separate the data from the processes which managed it.
A simple, flat data structure
Let's consider a simple example of data storage - a customer list. A tabular representation of the customers is correctly called a relation (but often a table) and is a realisation of the customer entity which we have already met (so many italics).

A relation
In order for us to be able to refer to the relation, we must give it a unique name within the relation. Each record (more accurately called a tuple) must have a unique value which can be used to identify it also which we refer to as the primary key. We can not define the structure of our relation using standardised table notation as follows if we call the relation customer.
customer ( ID, name, address, dob, phoneNumber )
This relation name refers to the entity in the data model. Notice that the table name is emboldened and the primary key is underlined. This structure definition is the heading and the data in the table is the body. Remember that a tuple is a row of data in the table (similar to a record) and an attribute can be represented as a named column in the table.

Task 3.1 Worksheet time!
On the worksheet : Download and complete the worksheet 'Table Terminology' which requires you to match up, well, table terminology. You should find that this links up nicely with the conceptual data modelling work we have already done.
OUTCOME : Completed Table Terminology.docx worksheet

Note : if you do much work with Access databases, you will often use Leszynski / Reddick table notation to represent the names and structures of the database objects. Even though I used to do this when I worked with Access, I've chosen not to name the objects in this fashion to reduce (!) the complexity of the topic. However, feel free to prefix your relation names if you wish!
Last modified: February 14th, 2024