Introduction
In modern data-driven applications, efficient data querying stands as one of the core challenges in ensuring system performance. This is especially true when dealing with massive amounts of data: even seemingly minor differences in SQL syntax can lead to exponentially increased resource consumption or even trigger cascading system bottlenecks. Take common grouped aggregation queries as an example—developers often rely on the HAVING clause to filter aggregated results. However, when filtering conditions do not depend on aggregate functions (e.g., filtering directly based on grouped columns), this approach may harbor significant performance pitfalls. Through performance comparison tests and execution plan analysis, this article fully demonstrates the optimization benefits of HAVING condition pushdown.
I. Test Data Generation (Generating 5 Million Records in MySQL)
1. Create a Test Table
1
2
3
4
5
6
7
| -- Create the employees table
CREATE TABLE employees (
id INT AUTO_INCREMENT PRIMARY KEY,
department_id INT NOT NULL, -- Department ID (randomly distributed between 1-20)
name VARCHAR(8) NOT NULL, -- Employee Name (random 8-character string)
age INT NOT NULL -- Employee Age (random value between 20-60)
) ENGINE=InnoDB;
|
2. Bulk Insert 5 Million Records
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
| -- Enable recursive CTE and adjust the generation limit
SET SESSION cte_max_recursion_depth = 1000000;
-- Generate 5 million records
INSERT INTO employees (department_id, name, age)
WITH RECURSIVE seq AS (
SELECT 0 AS n
UNION ALL
SELECT n + 1 FROM seq WHERE n < 4999999
)
SELECT
FLOOR(RAND(n) * 20) + 1 AS department_id,
SUBSTRING(MD5(RAND(n)), 1, 8) AS name,
FLOOR(RAND(n + 100000) * 41) + 20 AS age
FROM seq;
|
1
2
| -- Create an index on department_id
CREATE INDEX idx_department ON employees(department_id);
|
1. Original SQL Query
1
2
3
4
| SELECT department_id, COUNT(*) AS employee_count
FROM employees
GROUP BY department_id
HAVING department_id > 5;
|
2. Optimized SQL Query
Rewriting Strategy with SQLFlash Insights:

The rewritten SQL is as follows:
1
2
3
| SELECT department_id, COUNT(*) AS employee_count
FROM employees
WHERE department_id > 5 GROUP BY department_id;
|
View detailed report: https://sqlflash.ai/app/sqlResult?shareid=dfe1a9dc-b583-4f1c-aa71-9d2ca41cd40b
1. Test Environment
- Database Version: MySQL 8.0.18
- Hardware Configuration: 4-core CPU / 8GB RAM / SSD storage
- Dataset: 5 million records across 20 departments (approximately 250,000 records per department)
2. Test Results
Metric | Traditional Method (HAVING) | Optimized Method (WHERE) | Performance Improvement |
---|
Execution Time | 0.58 seconds | 0.48 seconds | 17% |
Rows Scanned | 5 million rows | 3.75 million rows | 25% |
1. SQLFlash Analysis
Compared with the original SQL, the rewritten SQL is logically clearer. By moving the filtering to the WHERE clause, it avoids grouping and aggregating operations on the large number of rows where department_id ≤ 5, thereby reducing unnecessary processing and improving overall computational efficiency. Without changing the meaning of the result set, the new execution plan can filter out non-matching rows earlier, optimizing resource consumption and significantly reducing query overhead. Meanwhile, the structure of the new SQL is more concise and readable.
- Full table scan grouping: All 5 million rows of data participate in grouping calculations, generating grouping results for 20 departments.
- Redundant filtering operations: Departments with department_id ≤ 5 are filtered out only after grouping is completed, wasting computational resources.
3. Core Advantages of the Optimized Writing Style
- Index-accelerated filtering: Directly locating valid data (departments 6–20) via WHERE department_id > 5, with only 3.75 million rows entering the grouping phase.
- Reduced temporary table computation: MySQL creates temporary tables by default when using GROUP BY. Reducing the data volume lowers memory and disk overhead.
4. Execution Plan Comparison (Using EXPLAIN)
Original SQL Execution Plan:
1
2
3
4
5
| +----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+-------------+
| 1 | SIMPLE | employees | NULL | index | idx_department | idx_department | 4 | NULL | 4987392 | 100.00 | Using index |
+----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+-------------+
|
Optimized SQL Execution Plan:
1
2
3
4
5
| +----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+--------------------------+
| 1 | SIMPLE | employees | NULL | range | idx_department | idx_department | 4 | NULL | 2493696 | 100.00 | Using where; Using index |
+----+-------------+-----------+------------+-------+----------------+----------------+---------+------+---------+----------+--------------------------+
|
V. Summary
Through HAVING condition pushing optimization combined with index acceleration, the following can be achieved:
- 17% performance improvement: Execution time decreased from 0.58 seconds to 0.48 seconds.
- Efficient resource utilization: Reduced temporary table memory and disk usage.
- Clear and maintainable code: Filter logic is front-loaded, lowering complexity for subsequent extensions.
Final recommendation: When handling large-data grouping in MySQL, prioritize moving non-aggregate conditions to WHERE and creating necessary indexes.
Recommended reading
What is SQLFlash?
SQLFlash is your AI-powered SQL Optimization Partner.
Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.
How to use SQLFlash in a database?