Lesson 29 of the SQL Optimization Course: Optimize SQL Table Design for Peak Performance | SQLFlash

Introduction

For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.

This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.

🧑‍💻 Target audience:

  • DBA
  • Database developers
  • Students

We will use MySQL as the demonstration database.


Understanding Database Normalization Fundamentals

Database normalization is a foundational concept in relational database design that ensures data integrity while minimizing redundancy. By adhering to a series of progressive rules called normal forms, developers can create efficient table structures optimized for scalability and performance. Below, we explore each normal form using practical examples to demonstrate their real-world applications.


1. First Normal Form (1NF): Ensuring Atomicity

Objective: Eliminate repeating groups and ensure each column contains atomic (indivisible) values.

Example:
A poorly designed employee table storing salaries as JSON:

1
2
3
4
5
6
7
CREATE TABLE employee (
    employee_number VARCHAR(64),
    employee_name VARCHAR(64),
    salary JSON,          -- Violates 1NF (non-atomic)
    dept VARCHAR(64),
    dept_desc TEXT
);

Fix: Split multi-value columns into separate atomic fields:

1
2
3
ALTER TABLE employee 
ADD COLUMN base_salary DECIMAL(10,2),
ADD COLUMN extra_salary DECIMAL(10,2);


2. Second Normal Form (2NF): Eliminating Partial Dependencies

Objective: All non-key attributes must depend on the entire primary key (for composite keys).

Example:
A denormalized student table with partial dependencies:

1
2
3
4
5
6
7
CREATE TABLE student (
    student_id INT,
    course_id INT,
    student_name VARCHAR(50),
    course_credits INT,
    PRIMARY KEY (student_id, course_id)
);

Fix: Split into normalized tables:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
-- Students table (no partial dependencies)
CREATE TABLE students (
    student_id INT PRIMARY KEY,
    student_name VARCHAR(50)
);

-- Courses table (no partial dependencies)
CREATE TABLE courses (
    course_id INT PRIMARY KEY,
    course_credits INT
);

-- Enrollment bridge table
CREATE TABLE enrollments (
    student_id INT,
    course_id INT,
    grade VARCHAR(2),
    PRIMARY KEY (student_id, course_id),
    FOREIGN KEY (student_id) REFERENCES students(student_id),
    FOREIGN KEY (course_id) REFERENCES courses(course_id)
);


3. Third Normal Form (3NF): Removing Transitive Dependencies

Objective: Non-key attributes must depend only on the primary key, not other non-key attributes.

Example:
A students table with transitive dependency:

1
2
3
4
5
6
CREATE TABLE students (
    student_id INT PRIMARY KEY,
    department_name VARCHAR(50),
    department_address VARCHAR(100),  -- Depends on department_name, not student_id
    department_phone VARCHAR(15)
);

Fix: Extract transitive dependencies into separate tables:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
-- Normalized students table
CREATE TABLE students (
    student_id INT PRIMARY KEY,
    department_id INT,
    FOREIGN KEY (department_id) REFERENCES departments(department_id)
);

-- Separate departments table
CREATE TABLE departments (
    department_id INT PRIMARY KEY,
    department_name VARCHAR(50),
    department_address VARCHAR(100),
    department_phone VARCHAR(15)
);

4. Beyond 3NF: BCNF and 4NF

While 3NF suffices for most use cases, advanced normalization may be required for complex schemas:

  • BCNF (Boyce-Codd Normal Form): Eliminates dependencies on non-superkey attributes in composite keys.
  • 4NF (Fourth Normal Form): Addresses multi-valued dependencies where multiple independent values relate to a single key.

When to Denormalize

While normalization improves data integrity, over-normalization can degrade performance. Common denormalization scenarios include:

  1. Frequent Joins: Precompute aggregated values (e.g., total_sales).
  2. Read-Heavy Workloads: Duplicate data in reporting tables.
  3. High Transaction Volumes: Simplify write operations by reducing joins.

👋 See you in the next lesson.

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

How to use SQLFlash in a database?

Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!.