Introduction
In the era of big data, real-time data processing performance optimization has become the core challenge in system architecture design. This report specifically addresses the performance bottlenecks of window functions in high-frequency TopN query scenarios within monitoring systems. Through empirical testing with tens of millions of alert records, we reveal three critical pain points in traditional full-table sorting approaches: I/O pressure from full data scans, memory contention caused by massive sorting buffers, and computational resource waste from complex execution plans.
Leveraging MySQL 8.0’s window function features and indexing mechanisms, we propose a “pre-fetch and post-processing” optimization paradigm. Through SQL logic rewriting and coordinated index strategies, we reduced Top3 query response times from 8 seconds to millisecond-level (10ms), achieving two orders of magnitude performance improvement. This report not only provides directly reusable optimization patterns but also establishes a boundary condition determination model for TopN query optimization, offering methodological guidance for SQL tuning across different business scenarios. All experimental data were validated in standardized testing environments to ensure optimization solutions’ universality and reliability.
I. Business Scenario Simulation
Requirements Background
A monitoring system requires real-time display of “latest 3 alert records with global sequence numbers”. Original window function implementation faced performance bottlenecks, now validated through SQL rewriting.
Data Characteristics
- Warning table with timestamp field (10M+ records)
- Requires absolute accuracy (no duplicate skipping)
- High-frequency queries (dozens per minute)
II. Test Environment Setup
Table Creation Statement
1
2
3
4
5
6
| -- Original table (no indexes)
CREATE TABLE t (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
alert_time DATETIME(6) NOT NULL COMMENT 'Alert Time',
content VARCHAR(255) NOT NULL
) ENGINE=InnoDB;
|
Data Generation Method
1
2
3
4
5
6
7
8
9
| -- Generate 5M test records
INSERT INTO t (alert_time, content)
SELECT
NOW(6) - INTERVAL FLOOR(RAND()*86400 * 365) SECOND,
CONCAT('Alert-', UUID())
FROM
information_schema.columns c1,
information_schema.columns c2
LIMIT 5000000;
|
III. SQL Optimization
Original Test Query
1
2
3
4
5
6
7
8
9
| -- Original SQL (full table sort)
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY alert_time DESC) AS rownumber,
id,
alert_time,
content
FROM t
) dt WHERE rownumber <= 3;
|
Optimized Implementation with SQLFlash

1
2
3
4
5
6
7
8
9
10
11
12
13
14
| WITH t_topN AS (
SELECT id, alert_time, content
FROM t
ORDER BY alert_time DESC
LIMIT 3
)
SELECT * FROM (
SELECT
ROW_NUMBER() OVER (ORDER BY alert_time DESC) AS rownumber,
id,
alert_time,
content
FROM t_topN
) dt;
|
Add supporting index:
1
| ALTER TABLE t ADD INDEX idx_at (alert_time);
|
View Full Report
Testing Methodology
- Hot cache tests (3 executions averaged)
- Monitoring metrics:
- Execution time (seconds)
- Scanned rows (Handler_read_next)
Metric | Original SQL | Optimized SQL | Optimized SQL (Indexed) |
---|
Cached execution time | 8s | 1.4s | 0.01s |
Scanned rows | 4,975,244 | 4,975,244 | 3 |
According to SQLFlash’s analysis, the main improvements from this optimization are:
- Using CTE with LIMIT 3 to conduct significant result set contraction before applying ROW_NUMBER() function
- Compared to original SQL, this avoids window function sorting and numbering on full 1,000,000 rows, reducing CPU and memory usage
- Final top 3 results can be determined immediately after data retrieval completion, reducing overall computational overhead and achieving performance improvement
V. Execution Plan Analysis
Original Query Plan
1
2
3
4
5
6
7
8
9
10
| EXPLAIN FORMAT=TREE
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY alert_time DESC) AS rownumber, id, alert_time, content
FROM t
) dt WHERE rownumber <= 3;
-> Window aggregate
-> Sort: t.alert_time DESC (cost=503747.65 rows=4975244)
-> Table scan on t
└─➔ Full table scan with filesort
|
Key Issues:
- Missing index causing full table scan
- Built a 5,000,000-row sorting buffer in memory
- Generated temporary tables to store intermediate results
Optimized Query Plan
1
2
3
4
5
6
7
8
9
10
| EXPLAIN FORMAT=TREE
WITH t_topN AS (...)
SELECT * FROM (...);
-> Window aggregate
-> Sort: t_topN.alert_time DESC
-> Table scan on t_topN (cost=2.76 rows=3)
└─➔ Materialized CTE (3 rows)
-> Limit: 3 row(s)
-> Index scan on t using idx_at(reverse) (cost=0.00 rows=3)
|
Optimization Highlights:
- Leverage index to directly locate latest 3 records
- Window function calculations limited to 3 rows
- Avoid temporary table generation completely
VI. Optimization Boundaries
Valid Optimization Case
1
2
3
4
5
6
7
8
9
10
11
12
13
14
| -- Original
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY score DESC) AS rn
FROM students
) t WHERE rn <= 5;
-- Optimized
WITH top_students AS (
SELECT score FROM students ORDER BY score DESC LIMIT 5
)
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY score DESC) AS rn
FROM top_students
) t;
|
Invalid Optimization Cases
- Dynamic N-value:
1
2
3
4
5
| -- Fails: N from subquery
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY sales DESC) AS rn
FROM products
) t WHERE rn <= (SELECT top_num FROM config);
|
- Non-consecutive range:
1
2
3
4
5
| -- Fails: BETWEEN clause
SELECT * FROM (
SELECT ROW_NUMBER() OVER (ORDER BY create_time) AS rn
FROM orders
) t WHERE rn BETWEEN 100 AND 103;
|
VII. Technical Principles
​Index Skip Scan​
- Leverages B+Tree ordered index to directly locate TOP N records
​Early Data Processing​
- Completes core filtering during data retrieval phase
​
Memory Optimization​
- Reduces sorting operations from 10M to 3 rows
​
Execution Plan Stability​
- Complex O(n log n) plan simplified to O(1) complexity
VIII. Conclusion
SQLFlash optimizes TopN queries by:
- Limiting data early via CTE/LIMIT in initial query phase
- Eliminating full-table scans through index utilization
- Reducing window function computation to 3 rows
- Maintaining query accuracy while achieving 800x performance gains
Recommended reading
What is SQLFlash?
SQLFlash is your AI-powered SQL Optimization Partner.
Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.
How to use SQLFlash in a database?