For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.
This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.
🧑💻 Target audience:
DBA
Database developers
Students
We will use MySQL as the demonstration database.
This article explores how MySQL calculates table statistics, which are crucial for the cost-based optimizer to generate efficient execution plans. Inaccurate statistics can lead to suboptimal or even incorrect execution plans.
1. Non-Persistent vs. Persistent Statistics
1.1 Non-Persistent Statistics
Statistics are not saved to disk and are recalculated frequently in real-time.
Every table access triggers a recalculation of its statistics.
For frequent queries on large tables, recalculating statistics each time can be resource-intensive.
1.2 Persistent Statistics
Saves a table’s statistics at a specific moment to disk.
Avoids recalculating statistics with each query.
If a table isn’t updated frequently or hasn’t reached MySQL’s recalculation threshold, statistics are retrieved directly from disk.
Statistics remain available even after a MySQL service restart.
Can be configured globally or per table.
2. Controlling Statistics Calculation
MySQL Server controls whether to automatically calculate statistics distribution and whether to use persistent or non-persistent storage.
innodb_stats_auto_recalc: Automatically recalculates persistent statistics when data changes exceed 10% (default: enabled).
When a table’s data changes by more than 10%, MySQL checks if the last update was over 10 seconds ago. If not, the table is queued for later recalculation. If so, recalculation happens immediately, and the timestamp is updated. This timeout is hardcoded but can be adjusted in some MySQL variants like Percona.
2.2 Other Parameters
innodb_stats_include_delete_marked: Determines if deleted rows are included in statistics updates (default: disabled).
innodb_stats_persistent_sample_pages: Number of random pages sampled for index distribution updates (default: 20).
innodb_stats_transient_sample_pages: Number of sample pages for non-persistent statistics (default: 8).
innodb_stats_on_metadata: Automatically updates statistics when accessing certain metadata (default: disabled).
SQLFlash is your AI-powered SQL Optimization Partner.
Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.