Normalization (1NF, 2NF, 3NF)
Normalization is the process of organizing a database to reduce redundancy (repeated data) and dependency problems. It involves splitting large tables into smaller, well-structured ones linked by keys.
Learning Objectives
- 11.4.1.5 Define connections (1NF, 2NF, 3NF)
Conceptual Anchor
The Filing Cabinet Analogy
Imagine a messy filing cabinet where each folder duplicates information (student name in every grade folder). Normalization is like reorganizing the cabinet: put student info in ONE folder, grade info in ANOTHER, and link them with a student ID number. Less duplication, fewer errors.
Rules & Theory
Why Normalize?
| Problem | Without Normalization | With Normalization |
|---|---|---|
| Data Redundancy | Same data stored multiple times | Data stored once |
| Update Anomaly | Change in one place but not another | Change once, reflected everywhere |
| Insert Anomaly | Can't add data without unrelated data | Can add independently |
| Delete Anomaly | Deleting one thing removes other data | Deletion is isolated |
The Three Normal Forms
| Form | Rule | In Simple Words |
|---|---|---|
| 1NF | No repeating groups; each cell has a single atomic value | "No lists in cells" |
| 2NF | 1NF + no partial dependencies (all non-key fields depend on the WHOLE PK) | "Depends on the whole key" |
| 3NF | 2NF + no transitive dependencies (non-key fields don't depend on other non-key fields) | "Depends on nothing but the key" |
Memory Trick
"Every non-key attribute must depend on the key (1NF), the whole key (2NF), and nothing but the key (3NF) — so help me Codd!"
Step-by-Step Example
UNNORMALIZED (UNF):
┌──────────┬────────┬──────────────────────────────────┐
│ StudentID│ Name │ Courses │
├──────────┼────────┼──────────────────────────────────┤
│ S001 │ Ali │ Maths, Physics, CS │ ← list in cell!
│ S002 │ Dana │ CS, English │ ← list in cell!
└──────────┴────────┴──────────────────────────────────┘1NF — Remove repeating groups (one value per cell):
┌──────────┬────────┬──────────┐
│ StudentID│ Name │ Course │
├──────────┼────────┼──────────┤
│ S001 │ Ali │ Maths │
│ S001 │ Ali │ Physics │ ← "Ali" repeated!
│ S001 │ Ali │ CS │ ← redundancy
│ S002 │ Dana │ CS │
│ S002 │ Dana │ English │ ← "Dana" repeated!
└──────────┴────────┴──────────┘
PK = (StudentID + Course) ← composite key2NF — Remove partial dependencies:
"Name" depends only on StudentID, not on the full PK
Students Table: Enrolment Table:
┌──────────┬────────┐ ┌──────────┬──────────┐
│ StudentID│ Name │ │ StudentID│ Course │
├──────────┼────────┤ ├──────────┼──────────┤
│ S001 │ Ali │ │ S001 │ Maths │
│ S002 │ Dana │ │ S001 │ Physics │
└──────────┴────────┘ │ S001 │ CS │
│ S002 │ CS │
│ S002 │ English │
└──────────┴──────────┘3NF — Remove transitive dependencies:
Suppose Enrolment also had TeacherName (which depends on Course, not the PK):
Before 3NF: After 3NF:
┌─────────┬────────┬────────────┐ Enrolment: Courses:
│ StudentID│ Course │ TeacherName│ ┌────────┬──────┐ ┌──────┬───────────┐
│ S001 │ Maths │ Mr Smith │ │StudentID│Course│ │Course│TeacherName│
│ S001 │ CS │ Ms Lee │ └────────┴──────┘ └──────┴───────────┘
│ S002 │ CS │ Ms Lee │ (FK to both)
└─────────┴────────┴────────────┘
TeacherName depends on Course (non-key → non-key) = transitive!Common Pitfalls
Skipping Normal Forms
You cannot jump to 3NF without achieving 1NF and 2NF first. Normalization is sequential — each form builds on the previous one.
Over-Normalizing
In practice, going beyond 3NF can make queries very complex with many JOINs. 3NF is usually sufficient for most databases.
Tasks
State the rules for 1NF, 2NF, and 3NF.
Normalize the following table to 3NF: OrderID, CustomerName, CustomerAddress, ProductName, ProductPrice, Quantity.
Explain what an "update anomaly" is and how normalization prevents it.
Self-Check Quiz
Q1: What does 1NF require?
Q2: What is a partial dependency?
Q3: What is the memory trick for the 3 normal forms?