Normalization

Normalization is a core concept in Database design that helps you organize data efficiently in a relational database. The goal is to reduce redundancy (duplicate data) and improve data integrity (accuracy and consistency).

1. What is Normalization?

Normalization is the process of structuring a database so that:

Each piece of data is stored only once
Relationships between data are clearly defined
Updates, inserts, and deletes don't create inconsistencies

2. Why is Normalization Important?

Normalization is important because it helps maintain a database that is efficient, consistent, and easy to manage.

Reduces Data Redundancy: Imagine storing a customer's address every time they place an order. If the address changes, you'd have to update it in multiple places! Normalization helps you store information in one place and refer to it from other places, minimizing repetition.
Improves Data Integrity: Less redundancy means less chance of inconsistencies. If you update an address in one place, it's updated everywhere it's needed.
Prevents Anomalies: Normalization helps prevent issues like:
- Insertion Anomalies: Difficulty adding new data because you're missing related information.
- Update Anomalies: Having to update the same information in multiple rows.
- Deletion Anomalies: Accidentally losing valuable information when you delete something seemingly unrelated.
Easier to Understand and Maintain: A normalized database is generally more logically structured and easier to understand, query, and modify.

3. First Normal Form (1NF): Atomic Values

All columns contain atomic values (i.e., indivisible values).
Each row is unique (i.e., no duplicate rows).
Each column has a unique name.

Example problem:

StudentID	StudentName	Courses
1001	John	DBMS, OS, CN
1002	Alice	DBMS, OS

Convert to 1NF:

StudentID	StudentName	Courses
1001	John	DBMS
1001	John	OS
1001	John	CN
1002	Alice	DBMS
1002	Alice	OS

4. Second Normal Form (2NF): Remove Partial Dependencies

Must already be in 1NF
No partial dependency (non-key attributes depend on only part of a composite key)

Every non-key column must depend on the whole primary key.

Assume:

Primary Key = (StudentID, Course)

StudentID	StudentName	Courses
1001	John	DBMS
1001	John	OS
1001	John	CN
1002	Alice	DBMS
1002	Alice	OS

StudentName depends only on StudentID not on (StudentID, Course).

This is a partial dependency.

Convert to 2NF:

Students:

StudentID	StudentName
1001	John
1002	Alice

Enrollments:

StudentID	Courses
1001	DBMS
1001	OS
1001	CN
1002	DBMS
1002	OS

5. Third Normal Form (3NF): Remove Transitive Dependencies

Must already be in 2NF
No transitive dependency (non-key attributes depend on other non-key attributes)

Suppose we have:

StudentID	DepartmentID	DepartmentName
1001	D1	Computer Science
1002	D2	Electronics

Here:

StudentID → DepartmentID DepartmentID → DepartmentName

Therefore:

StudentID → DepartmentName

This is a transitive dependency.

Convert to 3NF:

Students:

StudentID	DepartmentID
1001	D1
1002	D2

Departments:

DepartmentID	DepartmentName
D1	Computer Science
D2	Electronics

6. Boyce-Codd Normal Form (BCNF): Stronger Version of 3NF

A stricter version of 3NF
Every determinant must be a candidate key

Example

A university has:

Teacher	Subject	Room
Smith	DBMS	R101
Smith	OS	R101
Jones	CN	R102

Rules:

Teacher → Room Subject → Teacher

Candidate Keys: Subject because a subject has one teacher.

Problem

Teacher determines Room, but Teacher is not a candidate key.

This violates BCNF.

Convert to BCNF:

Teachers:

Teacher	Room
Smith	R101
Jones	R102

Courses:

Subject	Teacher
DBMS	Smith
OS	Smith
CN	Jones

7. Normalization Summary

Normal Form	Goal	Problem Solved
1NF	Ensure all attributes contain atomic (single) values	Removes repeating groups and multi-valued attributes
2NF	Ensure non-key attributes depend on the entire primary key	Removes partial dependencies
3NF	Ensure non-key attributes depend only on the key	Removes transitive dependencies
BCNF	Ensure every determinant is a candidate key	Removes remaining anomalies not handled by 3NF

Normalization is the process of organizing data in a database to reduce redundancy, eliminate data anomalies, and improve data integrity by dividing data into related tables and defining proper relationships between them.

StudentID	StudentName	Courses
1001	John	DBMS, OS, CN
1002	Alice	DBMS, OS

StudentID

StudentName

Courses

1001

John

DBMS, OS, CN

1002

Alice

DBMS, OS

StudentID	StudentName	Courses
1001	John	DBMS
1001	John	OS
1001	John	CN
1002	Alice	DBMS
1002	Alice	OS

StudentID

StudentName

Courses

1001

John

DBMS

1001

John

1001

John

1002

Alice

DBMS

1002

Alice

StudentID	StudentName	Courses
1001	John	DBMS
1001	John	OS
1001	John	CN
1002	Alice	DBMS
1002	Alice	OS

StudentID

StudentName

Courses

1001

John

DBMS

1001

John

1001

John

1002

Alice

DBMS

1002

Alice

StudentID	StudentName
1001	John
1002	Alice

StudentID

StudentName

1001

John

1002

Alice

StudentID	Courses
1001	DBMS
1001	OS
1001	CN
1002	DBMS
1002	OS

StudentID

Courses

1001

DBMS

1001

1002

DBMS

1002

StudentID	DepartmentID	DepartmentName
1001	D1	Computer Science
1002	D2	Electronics

StudentID

DepartmentID

DepartmentName

1001

Computer Science

1002

Electronics

StudentID	DepartmentID
1001	D1
1002	D2

StudentID

DepartmentID

1001

1002

DepartmentID	DepartmentName
D1	Computer Science
D2	Electronics

DepartmentID

DepartmentName

Computer Science

Electronics

Teacher	Subject	Room
Smith	DBMS	R101
Smith	OS	R101
Jones	CN	R102

Teacher

Subject

Room

Smith

DBMS

R101

Smith

R101

Jones

R102

Teacher	Room
Smith	R101
Jones	R102

Teacher

Room

Smith

R101

Jones

R102

Subject	Teacher
DBMS	Smith
OS	Smith
CN	Jones

Subject

Teacher

DBMS

Smith

Jones

Normal Form	Goal	Problem Solved
1NF	Ensure all attributes contain atomic (single) values	Removes repeating groups and multi-valued attributes
2NF	Ensure non-key attributes depend on the entire primary key	Removes partial dependencies
3NF	Ensure non-key attributes depend only on the key	Removes transitive dependencies
BCNF	Ensure every determinant is a candidate key	Removes remaining anomalies not handled by 3NF

Normal Form

Goal

Problem Solved

1NF

Ensure all attributes contain atomic (single) values

Removes repeating groups and multi-valued attributes

2NF

Ensure non-key attributes depend on the entire primary key

Removes partial dependencies

3NF

Ensure non-key attributes depend only on the key

Removes transitive dependencies

BCNF

Ensure every determinant is a candidate key

Removes remaining anomalies not handled by 3NF