Lesson 23 of the SQL Optimization Course: Index Cardinality and Selectivity | SQLFlash2

Introduction

For relational databases, the design of tables and the way SQL is written are particularly crucial. It wouldn’t be an exaggeration to say that they account for 90% of performance. So this series targets these two knowledge areas specifically and analyzes them in detail, peeling back the layers one at a time.

This series uses plain, understandable language and a large number of examples to explain the subtleties.

🧑‍💻 Target audience:

  • DBAs
  • Database developers
  • Students

We will use MySQL as the demonstration database.


Understanding Index Cardinality

Definition:
Index cardinality refers to the estimated number of unique values in an index. High cardinality means unique values dominate (e.g., primary keys), while low cardinality indicates many duplicate values (e.g., status flags).

MySQL relies on cardinality to determine optimal query execution plans. Outdated or inaccurate cardinality values can lead to suboptimal performance, such as full table scans instead of index usage.
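
The statistics MySQL keeps are estimates, so it can help to compare them with exact figures on a table small enough to scan. The sketch below assumes the orders table and status column used in Scenario 1 later in this lesson, and it uses a common working definition of selectivity (distinct values divided by total rows), which this lesson does not define explicitly.

```sql
-- Exact distinct count and selectivity for one column.
-- Assumption: orders.status is the table/column from Scenario 1.
-- This scans the table, so run it on a replica or a sample for large tables.
SELECT
    COUNT(DISTINCT status)            AS distinct_values,
    COUNT(*)                          AS total_rows,
    COUNT(DISTINCT status) / COUNT(*) AS selectivity   -- closer to 1 = more selective
FROM orders;
```

A value near 1 suggests almost every row has its own value (as with a primary key); a value near 0 suggests a handful of values repeated many times (as with a status flag).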


Factors Influencing Cardinality

  1. Data Distribution:

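    • Columns with only a few distinct values (e.g., gender flags) have low cardinality, and skewed data makes the estimate less representative of any single value.
    • Columns with many distinct, evenly spread values (e.g., timestamps) have high cardinality.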
  2. Update Frequency:

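    • InnoDB recalculates statistics automatically: with persistent statistics (the default) after more than 10% of a table’s rows change, and with non-persistent statistics after about 1/16 of the table has been modified.
    • Run ANALYZE TABLE to refresh the statistics manually when they are stale; SHOW INDEX displays the current values.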
  3. Composite Indexes:

    • Cardinality is calculated for each leading prefix of the index (e.g., for an index on (col1, col2), first for col1, then for (col1, col2), and so on).
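
The update and composite-index points above can be sketched as follows. The schema and table names (your_db, your_table) are placeholders, as in the query that follows in this lesson; the per-prefix reading of the cardinality column reflects how InnoDB usually reports it and is worth confirming on your own server.

```sql
-- Refresh the optimizer statistics for one table.
ANALYZE TABLE your_db.your_table;

-- Inspect how InnoDB maintains statistics on this server
-- (for example innodb_stats_persistent and innodb_stats_auto_recalc).
SHOW VARIABLES LIKE 'innodb_stats%';

-- Composite indexes: one row per key part, ordered by position.
-- For key part N, cardinality is the estimate for the prefix (col1 .. colN).
SELECT index_name, seq_in_index, column_name, cardinality
FROM information_schema.statistics
WHERE table_schema = 'your_db' AND table_name = 'your_table'
ORDER BY index_name, seq_in_index;
```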
-- View index cardinality via INFORMATION_SCHEMA  
SELECT table_schema, table_name, index_name, cardinality  
FROM information_schema.statistics  
WHERE table_schema = 'your_db' AND table_name = 'your_table'  
ORDER BY cardinality DESC;

Example Output

Case Study: Query Optimization

Scenario 1: Poor Selectivity

Problem: Filtering on a low-cardinality column (status) leads to full table scans.

SELECT * FROM orders WHERE status = 'completed';  
-- Execution plan shows "type: ALL" (full table scan)

Solution: Avoid indexing low-cardinality columns unless combined with high-cardinality fields.
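
One way to apply this, as a sketch only: pair the low-cardinality column with a high-cardinality one in a composite index. The customer_id column and the index name are hypothetical and do not come from the lesson; substitute whatever selective column your queries actually filter on.

```sql
-- Assumption: orders has a high-cardinality customer_id column (hypothetical).
-- The selective column leads; status narrows the matches further.
CREATE INDEX idx_orders_customer_status ON orders (customer_id, status);

-- Check the plan: the goal is an index lookup instead of "type: ALL".
EXPLAIN
SELECT * FROM orders
WHERE customer_id = 42 AND status = 'completed';
```

This only helps queries that filter on the leading column as well; a query on status alone would still face the original problem.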

Scenario 2: Range Queries

Problem: Range queries on dates may ignore an index when the optimizer estimates that the range covers a large share of the table's rows.

SELECT COUNT(*) FROM logs WHERE event_date BETWEEN '2023-01-01' AND '2023-01-31';
-- No index used due to estimated high row access

Workaround: Use covering indexes or partitioning for large date ranges.
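
A minimal sketch of the covering-index option, assuming the logs table also has a user_id column (hypothetical, not from the lesson) that the query needs alongside event_date. An index is covering when it contains every column the query reads, so MySQL can answer from the index without visiting the table rows.

```sql
-- Assumption: logs.user_id exists (hypothetical column for illustration).
CREATE INDEX idx_logs_date_user ON logs (event_date, user_id);

-- Both selected columns live in the index, so no row lookups are needed.
EXPLAIN
SELECT event_date, user_id
FROM logs
WHERE event_date BETWEEN '2023-01-01' AND '2023-01-31';
-- When the index covers the query, the Extra column typically includes "Using index".
```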

Best Practices

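Example output of the information_schema.statistics query shown earlier:

TABLE_SCHEMA | TABLE_NAME | INDEX_NAME | CARDINALITY
your_db      | users      | PRIMARY    | 10000
your_db      | users      | idx_email  | 5000
your_db      | orders     | idx_status | 50
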
  1. Prioritize High-Cardinality Columns:

    • Use them as leading columns in composite indexes.
    • Example: CREATE INDEX idx_user_activity ON user_logs (user_id, action_type).
  2. Avoid Over-Indexing:

    • Low-cardinality indexes waste storage and slow down writes.
  3. Monitor Cardinality Trends:

    • Regularly check for skewed data distribution or outdated statistics.
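
For routine monitoring, a query along these lines can list each index next to a rough selectivity estimate. It is a sketch under assumptions: your_db is a placeholder schema name, and both cardinality and table_rows are estimates for InnoDB, so the ratio is only a guide (it can even exceed 1).

```sql
-- Estimated selectivity per index key part, least selective first.
-- NULLIF avoids division by zero for empty tables.
SELECT s.table_name,
       s.index_name,
       s.seq_in_index,
       s.column_name,
       s.cardinality,
       t.table_rows,
       s.cardinality / NULLIF(t.table_rows, 0) AS est_selectivity
FROM information_schema.statistics AS s
JOIN information_schema.tables AS t
  ON t.table_schema = s.table_schema
 AND t.table_name   = s.table_name
WHERE s.table_schema = 'your_db'
ORDER BY est_selectivity;
```

Indexes that stay near the top of this list over time are candidates for review under the over-indexing rule above; a sudden change in the ratio may point to stale statistics and a need to run ANALYZE TABLE.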

👋 See you in the next lesson.
