Lesson 12 of the SQL Optimization Course: How MySQL Calculates Table Statistics? | SQLFlash

Introduction

For relational databases, the design of tables and SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this time, specifically targeting these two major knowledge areas, we’ll conduct a detailed analysis for you, peeling back the layers.

This Series uses plain and understandable language and selects a large number of examples to elaborate on the subtleties for you.

🧑‍💻 Target audience:

  • DBA
  • Database developers
  • Students

We will use MySQL as the demonstration database.

This article explores how MySQL calculates table statistics, which are crucial for the cost-based optimizer to generate efficient execution plans. Inaccurate statistics can lead to suboptimal or even incorrect execution plans.

1. Non-Persistent vs. Persistent Statistics

1.1 Non-Persistent Statistics

  • Statistics are not saved to disk and are recalculated frequently in real-time.
  • Every table access triggers a recalculation of its statistics.

For frequent queries on large tables, recalculating statistics each time can be resource-intensive.

1.2 Persistent Statistics

  • Saves a table’s statistics at a specific moment to disk.
  • Avoids recalculating statistics with each query.
  • If a table isn’t updated frequently or hasn’t reached MySQL’s recalculation threshold, statistics are retrieved directly from disk.
  • Statistics remain available even after a MySQL service restart.
  • Can be configured globally or per table.

2. Controlling Statistics Calculation

MySQL Server controls whether to automatically calculate statistics distribution and whether to use persistent or non-persistent storage.

2.1 Key Parameters

  • innodb_stats_persistent: Enables/disables persistent statistics (default: enabled).
  • innodb_stats_auto_recalc: Automatically recalculates persistent statistics when data changes exceed 10% (default: enabled).

When a table’s data changes by more than 10%, MySQL checks if the last update was over 10 seconds ago. If not, the table is queued for later recalculation. If so, recalculation happens immediately, and the timestamp is updated. This timeout is hardcoded but can be adjusted in some MySQL variants like Percona.

2.2 Other Parameters

  • innodb_stats_include_delete_marked: Determines if deleted rows are included in statistics updates (default: disabled).
  • innodb_stats_persistent_sample_pages: Number of random pages sampled for index distribution updates (default: 20).
  • innodb_stats_transient_sample_pages: Number of sample pages for non-persistent statistics (default: 8).
  • innodb_stats_on_metadata: Automatically updates statistics when accessing certain metadata (default: disabled).

3. When to Adjust Sample Pages

Consider adjusting innodb_stats_persistent_sample_pages:

  1. If query plans are inaccurate.
  2. If ANALYZE TABLE becomes too slow.

4. Automatic Statistics Updates

Non-persistent statistics update automatically in scenarios like:

  1. First table access.
  2. After one-sixteenth of the table has been updated since last statistics calculation.
  3. MySQL client default option --auto-rehash is enabled.

5. Table Statistics Storage

Statistics are stored in metadata tables:

  • mysql.innodb_table_stats: Contains table-level statistics.
  • mysql.innodb_index_stats: Contains index-level statistics.

Example of table statistics for ytt_sample_persist:

1
2
3
4
5
6
7
8
mysql> SELECT n_rows, clustered_index_size, sum_of_other_index_sizes 
       FROM innodb_table_stats 
       WHERE database_name = 'ytt' AND table_name = 'ytt_sample_persist';
+--------+----------------------+--------------------------+
| n_rows | clustered_index_size | sum_of_other_index_sizes |
+--------+----------------------+--------------------------+
| 356960 |                15162 |                     4113 |
+--------+----------------------+--------------------------+

Real data count:

1
2
3
4
5
6
mysql> SELECT COUNT(*) FROM ytt_sample_persist;
+----------+
| COUNT(*) |
+----------+
|   406644 |
+----------+

Updating statistics manually:

1
2
3
4
5
6
mysql> ANALYZE TABLE ytt_sample_persist;
+------------------------+---------+----------+----------+
| Table                  | Op      | Msg_type | Msg_text |
+------------------------+---------+----------+----------+
| ytt.ytt_sample_persist | analyze | Status   | OK       |
+------------------------+---------+----------+----------+

Updated statistics:

1
2
3
4
5
6
7
8
mysql> SELECT n_rows, clustered_index_size, sum_of_other_index_sizes 
       FROM innodb_table_stats 
       WHERE database_name = 'ytt' AND table_name = 'ytt_sample_persist';
+--------+----------------------+--------------------------+
| n_rows | clustered_index_size | sum_of_other_index_sizes |
+--------+----------------------+--------------------------+
| 387202 |                16380 |                     4562 |
+--------+----------------------+--------------------------+

6. Index Statistics Example

Index statistics for ytt_sample_persist:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
mysql> SELECT index_name, stat_name, stat_value, sample_size, stat_description 
       FROM innodb_index_stats 
       WHERE database_name = 'ytt' AND table_name = 'ytt_sample_persist' AND index_name = 'PRIMARY';
+------------+--------------+------------+-------------+-----------------------------------+
| index_name | stat_name    | stat_value | sample_size | stat_description                  |
+------------+--------------+------------+-------------+-----------------------------------+
| PRIMARY    | n_diff_pfx01 |      14137 |          20 | i1                                |
| PRIMARY    | n_diff_pfx02 |      75398 |          20 | i1,i2                             |
| PRIMARY    | n_diff_pfx03 |     387202 |          20 | i1,i2,i3                          |
| PRIMARY    | n_leaf_pages |      15708 |        NULL | Number of leaf pages in the index |
| PRIMARY    | size         |      16380 |        NULL | Number of pages in the index      |
+------------+--------------+------------+-------------+-----------------------------------+

Understanding table statistics helps optimize SQL queries. If execution plans are suboptimal, outdated statistics might be the cause, requiring manual updates.

Abstract

What MySQL technical topics would you like to learn more about? Let us know in the comments!

👋 See you in the next lesson.

What is SQLFlash?

SQLFlash is your AI-powered SQL Optimization Partner.

Based on AI models, we accurately identify SQL performance bottlenecks and optimize query performance, freeing you from the cumbersome SQL tuning process so you can fully focus on developing and implementing business logic.

How to use SQLFlash in a database?

Ready to elevate your SQL performance?

Join us and experience the power of SQLFlash today!.