MySQL LIKE Optimization: 100x Faster Queries!

Database performance optimization often hinges on effective indexing strategies, but even indexed LIKE queries can bottleneck under high data volumes. This case study demonstrates how SQLFlash transforms inefficient LIKE prefix queries into high-performance range scans, achieving 100x speed improvements through index-aware SQL rewriting. We dive into B+ tree mechanics, execution plan analysis, and practical optimization patterns for AI-driven database scaling.
|
|
|
|
|
|
The baseline LIKE query hits indexes but suffers from row-by-row validation:
|
|
We tried to rewrite the SQL by SQLFlash.
Rewrite using range conditions to leverage index ordering:
|
|
Metric | Original Query | Optimized Query | Improvement |
---|---|---|---|
Execution Time | 1.1s | 0.04s | 96% |
Rows Scanned | 4.98M | 18.8K | 99.5% |
Execution Plan Type | ALL | Range | - |
Test Environment: MySQL 8.0.18, 8GB InnoDB Buffer Pool
Based on the explanation provided by SQLFlash, in this case, the LIKE prefix query is optimized by rewriting username LIKE 'user_0123%'
as username >= 'user_0123' AND username < 'user_0124'
. This optimization can effectively leverage the ordered characteristics of the character index and reduce full table scans.
LIKE: type: range
+ Row-by-row verification of LIKE conditions
ββRange: Directly targets the precise interval with >=x AND <y
|
|
The equivalence between the two query patterns is established on the dual guarantees of index orderliness and string dictionary order rules:
The B+ tree index of the username
field strictly adheres to lexicographical order. For fixed-format data with the user_
prefix followed by 8 digits, its index sorting is mathematically equivalent to numerical sorting of the numeric portion. For example:
user_00000001 < user_00000002 < ... < user_01239999 < user_01240000
When using LIKE 'user_0123%'
, it is effectively equivalent to:
username >= 'user_0123' AND username < 'user_0124'
This is because the maximum possible value for 'user_0123%'
is 'user_01239999...'
(assuming fixed-length padding), while the next lexicographical starting point is 'user_0124'
. This rewrite achieves precise interval closure through string comparison operators, maintaining mathematical equivalence to the LIKE pattern.
Under common collations like utf8_general_ci
, the lexicographical order of digits 0-9
is identical to their numerical order. This ensures that string comparisons on the username
field are isomorphic to numerical comparisons of its numeric segments. As a result, range queries avoid any positional misalignment or missing records.
Original LIKE Query | Rewritten Range Query | |
---|---|---|
Index Usage Method | Prefix Scan + Sequential Verification of LIKE Conditions | Directly Locate the Continuous Index Range |
Filtering Stage | The storage engine layer returns all prefix - matching rows, and the Server layer filters them for the second time | The storage engine layer accurately filters through indexing, without secondary verification |
Scanning Granularity | Traverse all leaf nodes starting with user_0123 | Only read the nodes in the [user_0123, user_0124] range |
This difference ultimately leads to order - of - magnitude differences in the I/O overhead, CPU verification cost, and execution path length between the two. The characteristic that the result sets are completely identical makes the optimization reliable.
This case highlights how SQL semantics-aware rewriting unlocks hidden performance potential:
β’ Same index β 100x faster results
β’ Eliminates full scans through range optimization
β’ Maintains result accuracy through mathematical equivalence
Adopt this pattern to transform LIKE queries into high-performance operations.
SQLFlash is your AI-powered SQL Optimization Partner.
Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.
Join us and experience the power of SQLFlash today!.