Lesson 17 of the SQL Optimization Course: Understanding Hash Tables | SQLFlash

Introduction

For relational databases, how tables are designed and how SQL is written are particularly crucial. It would not be an exaggeration to say that they account for 90% of performance. This series therefore targets these two areas and analyzes them in detail, layer by layer.

This series uses plain, understandable language and a large number of examples to explain the subtleties.

🧑‍💻 Target audience:

  • DBAs
  • Database developers
  • Students

We will use MySQL as the demonstration database.


1. Arrays

Arrays are a fundamental data structure, storing elements sequentially and allowing access via an index. They are present in various programming languages and databases, including MySQL.

One-dimensional array: arr1 = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

Index | Value | Access Method
0     | 10    | arr1[0] = 10
1     | 20    | arr1[1] = 20
2     | 30    | arr1[2] = 30
3     | 40    | arr1[3] = 40
4     | 50    | arr1[4] = 50
5     | 60    | arr1[5] = 60
6     | 70    | arr1[6] = 70
7     | 80    | arr1[7] = 80
8     | 90    | arr1[8] = 90
9     | 100   | arr1[9] = 100

Here’s an example of a one-dimensional integer array in MySQL:

mysql> SET @a = '[10, 20, 30, 40, 50, 60, 70, 80, 90, 100]';
mysql> SELECT @a AS "array", JSON_LENGTH(@a) AS "array_size";
+-------------------------------------------+------------+
| array                                     | array_size |
+-------------------------------------------+------------+
| [10, 20, 30, 40, 50, 60, 70, 80, 90, 100] | 10         |
+-------------------------------------------+------------+
1 row in set (0.00 sec)

Arrays can also be multidimensional. In MySQL, a two-dimensional string array can be represented as a nested JSON array, for example [["mysql", "db2"], ["oracle", "mongodb"]].

Advantages of Arrays:

  • Fast access to elements via index with a time complexity of O(1).

Disadvantages of Arrays:

  • Insertion and deletion operations can be slow as they may require shifting elements.
  • Arrays require a contiguous block of memory, leading to potential space waste during resizing.
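
The trade-off above can be seen in a minimal Python sketch. It assumes a Python list as a stand-in for the array, and the helper name `insert_with_shift` is made up for illustration. Reading by index touches one slot, while inserting at the front moves every existing element.

```python
# A Python list stands in for the array arr1 from the text.
arr1 = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]

# Index access: a single step, regardless of the array length.
fifth = arr1[4]

def insert_with_shift(arr, position, value):
    """Insert value at position by shifting later elements right, one by one.

    Returns the number of elements that had to be moved.
    """
    arr.append(None)                      # grow the array by one slot
    moved = 0
    for i in range(len(arr) - 1, position, -1):
        arr[i] = arr[i - 1]               # shift one element to the right
        moved += 1
    arr[position] = value
    return moved

moved = insert_with_shift(arr1, 0, 5)
print(fifth, moved)   # 50 10
```

Inserting at position 0 of a 10-element array moves all 10 elements, which is why the cost of insertion grows with the array size.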

2. Dictionaries

Dictionaries are similar to arrays but use arbitrary strings as keys instead of numeric indices.

In MySQL, dictionaries can be represented using JSON:

mysql> SET @a = '{"10":"mysql","20":"db2","30":"oracle","40":"mongodb"}';
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT JSON_KEYS(@a);
+--------------------------+
| json_keys(@a)            |
+--------------------------+
| ["10", "20", "30", "40"] |
+--------------------------+
1 row in set (0.00 sec)

mysql> SET @x1 = JSON_EXTRACT(@a, '$."10"');
Query OK, 0 rows affected (0.01 sec)

mysql> SET @x2 = JSON_EXTRACT(@a, '$."20"');
Query OK, 0 rows affected (0.00 sec)

mysql> SET @x3 = JSON_EXTRACT(@a, '$."30"');
Query OK, 0 rows affected (0.00 sec)

mysql> SET @x4 = JSON_EXTRACT(@a, '$."40"');
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT @x1 AS "dict['10']", @x2 AS "dict['20']", @x3 AS "dict['30']", @x4 AS "dict['40']";
+------------+------------+------------+------------+
| dict['10'] | dict['20'] | dict['30'] | dict['40'] |
+------------+------------+------------+------------+
| "mysql"    | "db2"      | "oracle"   | "mongodb"  |
+------------+------------+------------+------------+
1 row in set (0.00 sec)
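
For comparison, the same lookups can be sketched in Python, where the built-in `dict` plays the role of the dictionary. This is only an illustration outside MySQL; the variable names mirror the session above.

```python
import json

# The same JSON document that the MySQL session stores in @a.
a = json.loads('{"10":"mysql","20":"db2","30":"oracle","40":"mongodb"}')

keys = list(a.keys())   # counterpart of JSON_KEYS(@a)
x1 = a["10"]            # counterpart of JSON_EXTRACT(@a, '$."10"')
x4 = a["40"]            # counterpart of JSON_EXTRACT(@a, '$."40"')

print(keys)             # ['10', '20', '30', '40']
print(x1, x4)           # mysql mongodb
```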

3. Linked Lists

A linked list is a linear data structure where each element (node) contains a value and a pointer to the next element. Unlike arrays, linked lists do not require contiguous memory allocation.

Advantages of Linked Lists:

  • They do not require contiguous memory, making them flexible for memory usage.
  • Insertion and deletion are fast: once the node at the target position is known, they only relink pointers, with a time complexity of O(1).

Disadvantages of Linked Lists:

  • Accessing elements is slow, requiring sequential traversal from the beginning, with a time complexity of O(N).
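
The following is a minimal sketch of a singly linked list in Python. The class and method names (`Node`, `LinkedList`, `push_front`, `get`) are chosen for illustration. It shows why inserting at the head is a constant-time pointer change while reading the element at a given position requires walking the list.

```python
class Node:
    """One element of the list: a value plus a pointer to the next node."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node

class LinkedList:
    def __init__(self):
        self.head = None

    def push_front(self, value):
        """O(1): the new node simply points at the old head."""
        self.head = Node(value, self.head)

    def get(self, index):
        """O(N): walk from the head; also return how many hops were needed."""
        node, steps = self.head, 0
        while node is not None and steps < index:
            node = node.next
            steps += 1
        if node is None:
            raise IndexError(index)
        return node.value, steps

lst = LinkedList()
for v in (30, 20, 10):
    lst.push_front(v)          # list is now 10 -> 20 -> 30

value, steps = lst.get(2)
print(value, steps)            # 30 2
```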

4. Hash Tables

Hash tables are data structures that allow direct access to values using a key. They typically use an array to store values, with the index determined by a hash function applied to the key.
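
As a rough sketch of this idea in Python, the key can be turned into an array index with a deterministic hash. The choice of `zlib.crc32`, the slot count of 8, and the stored value are assumptions made for illustration, not something the lesson prescribes.

```python
import zlib

SLOTS = 8                       # size of the backing array (arbitrary choice)
table = [None] * SLOTS

def slot_for(key):
    """Map a string key to an array index with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % SLOTS

def put(key, value):
    # Note: a different key that hashes to the same slot would overwrite this entry.
    table[slot_for(key)] = (key, value)

def get(key):
    entry = table[slot_for(key)]
    if entry is not None and entry[0] == key:
        return entry[1]
    return None

put("mysql", "relational")
print(get("mysql"), get("oracle"))   # relational None
```

Because each slot holds a single entry here, two keys that map to the same index cannot coexist, which is the collision problem.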

Issues with Hash Tables

  1. Value Storage: Storing values directly in the array can be inefficient, especially if there are many values or if the values are large.
  2. Write Efficiency: Writing values directly into the array can be inefficient, because inserting into or growing an array may require moving existing elements.
  3. Hash Collisions: Different keys may produce the same index, leading to collisions.

4.1 Solutions to Hash Table Issues

To address the first two issues, hash tables can combine arrays with linked lists, a method known as “chaining.” In this approach, the array stores pointers to linked lists that hold the values. This leverages the fast read performance of arrays and the fast write performance of linked lists. Chaining also handles the third issue: keys that collide on the same index are simply stored in the same linked list.
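
A minimal sketch of chaining in Python is shown here. The class names, the use of `zlib.crc32` as the hash function, the slot count of 4, and the extra "50" entry are all assumptions for illustration. Each array slot holds the head of a linked list, and a new entry is prepended to the list for its slot.

```python
import zlib

class Entry:
    """One node of a bucket's singly linked list."""
    def __init__(self, key, value, next_entry=None):
        self.key, self.value, self.next = key, value, next_entry

class ChainedHashTable:
    def __init__(self, slots=4):
        self.buckets = [None] * slots     # array of linked-list heads

    def _index(self, key):
        return zlib.crc32(key.encode("utf-8")) % len(self.buckets)

    def put(self, key, value):
        i = self._index(key)
        node = self.buckets[i]
        while node is not None:           # update in place if the key exists
            if node.key == key:
                node.value = value
                return
            node = node.next
        # Otherwise prepend a new node: an O(1) linked-list write.
        self.buckets[i] = Entry(key, value, self.buckets[i])

    def get(self, key):
        node = self.buckets[self._index(key)]
        while node is not None:           # walk only this bucket's chain
            if node.key == key:
                return node.value
            node = node.next
        return None

data = {"10": "mysql", "20": "db2", "30": "oracle", "40": "mongodb", "50": "redis"}
t = ChainedHashTable(slots=4)
for k, v in data.items():
    t.put(k, v)

print(t.get("30"), t.get("60"))   # oracle None
```

With five keys and only four slots, at least two keys must share a bucket, yet every key remains retrievable because colliding entries sit in the same chain.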

4.2 Hash Function Considerations

Selecting a good hash function is crucial for minimizing collisions and ensuring efficient hash table performance. Considerations include:

  • Data Distribution: The hash function should distribute keys uniformly across the hash table to minimize collisions.
  • Efficiency: The hash function should be fast to compute, ideally with a time complexity close to O(1).

A poor hash function can lead to frequent collisions and degrade hash table performance, because lookups then spend their time walking long linked lists. In such cases, a balanced tree such as an AVL tree can replace a bucket's linked list once it grows too long.
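
The effect of the hash function on distribution can be sketched in Python. The keys, the bucket count of 64, and both candidate hash functions below are made up for illustration. Using the key length as a hash sends every key to the same bucket, whereas a function that mixes all the bytes, such as `zlib.crc32`, spreads them out.

```python
import zlib
from collections import Counter

def longest_chain(keys, hash_fn, slots):
    """Return the size of the fullest bucket for the given hash function."""
    counts = Counter(hash_fn(k) % slots for k in keys)
    return max(counts.values())

# 1000 made-up keys, all exactly 8 characters long.
keys = [f"user{n:04d}" for n in range(1000)]

# Poor choice: the key length is identical for every key.
poor = longest_chain(keys, len, 64)

# Better choice: CRC32 depends on every byte of the key.
better = longest_chain(keys, lambda k: zlib.crc32(k.encode("utf-8")), 64)

print(poor)            # 1000: every key lands in one bucket
print(better < poor)   # True
```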

Advantages of Hash Tables:

  • Fast insertions and lookups with an average time complexity of O(1), making them efficient for small datasets.

Disadvantages of Hash Tables:

  • The data volume must be estimated in advance so the table can be sized appropriately: too small a table causes more collisions, while too large a table wastes memory.
  • Finding a suitable hash function to minimize collisions can be challenging.

5. MySQL Hash Indexes

MySQL’s hash indexes are built on the hash table structure. The indexed field serves as the key, and the hash function computes the index to point to the corresponding row. Understanding hash tables is essential for grasping MySQL’s adaptive hash indexes and hash joins.
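
The idea can be modeled in a few lines of Python. This is a conceptual sketch, not MySQL's actual implementation: the `rows` data and the `build_hash_index` and `find_by_name` helpers are hypothetical, and a Python `dict` stands in for the hash table. The index maps each value of the indexed field to the positions of the matching rows, so an equality lookup reaches the rows without scanning the table.

```python
from collections import defaultdict

# Hypothetical table: each tuple is (id, name).
rows = [
    (10, "mysql"),
    (20, "db2"),
    (30, "oracle"),
    (40, "mongodb"),
]

def build_hash_index(rows, column):
    """Map each value of the indexed column to the positions of its rows."""
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index

name_index = build_hash_index(rows, column=1)

def find_by_name(name):
    """Equality lookup: hash the key, then fetch only the matching rows."""
    return [rows[p] for p in name_index.get(name, [])]

print(find_by_name("oracle"))   # [(30, 'oracle')]
```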

By understanding hash tables and hash indexes, you can better optimize MySQL queries and improve database performance.

👋 See you in the next lesson.
