Lesson 3-13-3: Cleaning the Data
Handling Missing Values and Inconsistent Formatting
Have you ever tried to solve a mystery, but the clues were smudged, ripped, or just didn't make sense? That's often what it's like for a data scientist! They get huge digital "clue files" (datasets) to solve big problems, but the clues are often messy. In this lesson, you'll become a Data Detective. You'll learn how to spot the messy clues: the missing information, the typos, and the odd-one-out data points. More importantly, you'll learn the spreadsheet tools to clean them up, so your clues are as reliable as possible before you start solving the mystery. This is a vital first step on the Data Science & Analytics pathway!
Learning Outcomes
The Building Blocks (Factual Knowledge)
Recall that real-world data is often "messy".
Describe what is meant by missing values, inconsistent formatting, and outliers in a dataset.
The Connections and Theories (Conceptual Knowledge)
Explain why data cleaning is a crucial step for ensuring accurate analysis.
Analyse how incomplete or biased training data can lead to unfair outcomes in an AI system.
Digital Skill Focus: Sorting, filtering, and using 'Find and Replace' in spreadsheets.
The Skills and Methods (Procedural Outcomes)
Apply spreadsheet tools like sorting and filtering to identify inconsistencies and outliers in a dataset.
Use spreadsheet tools like 'Find and Replace' to correct inconsistent data.
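For anyone curious how these spreadsheet moves look outside a spreadsheet, here is a minimal Python sketch of filtering, find and replace, and sorting. Everything in it is made up for illustration: the sample rows, the names, and the helper `tidy_town` are assumptions, not part of the lesson's dataset.

```python
# Hypothetical sample data: each row is (name, town, age).
# None marks a missing value, like an empty spreadsheet cell.
rows = [
    ("Alex", "Dublin", 14),
    ("Ben", "dublin ", 15),   # same town, typed inconsistently
    ("Cara", "Cork", None),   # missing age
    ("Dan", "Cork", 150),     # suspicious age: a possible outlier
]

# Filtering: keep only the rows where at least one value is missing.
missing = [row for row in rows if any(value is None for value in row)]


def tidy_town(town):
    """Find-and-replace style fix (hypothetical helper): trim spaces, fix capitals."""
    return town.strip().title()


# Apply the fix to every row so each town is written one way only.
cleaned = [(name, tidy_town(town), age) for name, town, age in rows]

# Sorting: order the known ages from largest to smallest,
# so unusual values rise to the top where they are easy to spot.
by_age = sorted(
    (row for row in cleaned if row[2] is not None),
    key=lambda row: row[2],
    reverse=True,
)

print("Rows with missing values:", missing)
print("Towns after tidying:", sorted({town for _, town, _ in cleaned}))
print("Largest age, worth checking:", by_age[0])
```

Sorting only brings the odd value to the top. Deciding whether it is a typo or a real value is still the detective's job.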
CONTENT
Last modified: October 2nd, 2025