返回文章列表

【MySQL】高性能的索引策略-前缀索引

2024年8月16日
MySQL

正确地创建和使用索引是实现高性能查询的基础。高效地选择和使用索引有很多种方式,下面将介绍如何高效的使用索引。

独立的列

我们经常会遇到无法使用已有的索引。如果查询中的列不是独立的,则MySQL就不会使用索引。独立的列是指索引列不能是表达式的一部分,也不能是函数的参数。 例如,下面这个查询无法使用user_id列的索引:

SELECT user_id FROM test.blog WHERE user_id + 1 = 5;

可以清楚的看到where的表达式等价于user_id = 4,但是MySQL无法自动解析这个方程式。我们应该养成简化WHERE条件的习惯,始终将索引列单独放在比较符号的一侧。

前缀索引和索引选择性

有时候需要索引很长的字符列,这会让索引变得大且慢。一个策略就是前面提到的模拟哈希索引,但有时候这样还不够,现在来介绍一下前缀索引(对列的开始部分字符进行索引);这样可以大大节约索引空间,从而提高索引效率,但这样也会降低索引的选择性(不重复的索引值和数据表的记录数的比值),值越高查询效率越高。唯一索引的选择性为1,这是最好的索引选择性,性能也是最好的。一般情况下某个列前缀的选择性也是足够高的,足以满足查询性能。对于BLOB、TExT或者很长的VARCHAR类型的列,必须使用前缀索引,因为MySQL不允许索引这些列的完整长度。索引选择合适的前缀很够必要。下面来做个例子: 偷懒,直接使用了MySQL自带的sakila.city数据表,下面对这张表进行一些操作;

CREATE TABLE sakila.city_demo(city VARCHAR(50) NO NULL);
INSERT INTO sakila.city_demo(city) SELECT city FROM sakila.city;
INSERT INTO sakila.city_demo(city) SELECT city FROM sakila.city_demo;  //(执行5次);
UPDATE sakila.city_demo SET city = (SELECT city FROM sakila.city ORDER BY RAND() LIMIT 1);

现在有个这些测试数据,首先,我们找到最常见的城市列表:

mysql> SELECT COUNT(*) AS cnt, city FROM sakila.city_demo GROUP BY city ORDER BY cnt DESC LIMIT 10;
+-----+----------------------------+
| cnt | city                       |
+-----+----------------------------+
|  54 | London                     |
|  49 | Luzinia                    |
|  48 | Akron                      |
|  48 | Elista                     |
|  47 | Patiala                    |
|  46 | Pingxiang                  |
|  45 | San Felipe de Puerto Plata |
|  45 | Alessandria                |
|  44 | Varanasi (Benares)         |
|  44 | Nam Dinh                   |
+-----+----------------------------+

现在开始先从3个前缀字母开始:

mysql> select count(*) as cnt, left(city, 3) as pref from sakila.city_demo group by pref order by cnt desc limit 10;
+-----+------+
| cnt | pref |
+-----+------+
| 462 | San  |
| 196 | Cha  |
| 177 | Sal  |
| 168 | Sou  |
| 141 | Bat  |
| 139 | Man  |
| 138 | al-  |
| 136 | Tan  |
| 130 | Hal  |
| 127 | Sha  |
+-----+------+

可以看到每个前缀比原来的城市出现的次数更多,然后我们增加前缀长度,直到这个前缀的选择性接近完整列的选择性。经过实验后发现前缀长度为7是比较合适:

mysql> select count(*) as cnt, left(city, 7) as pref from sakila.city_demo group by pref order by cnt desc limit 10;
+-----+--------+
| cnt | pref   |
+-----+--------+
|  84 | Santa  |
|  81 | San Fe |
|  63 | Valle  |
|  54 | London |
|  53 | Santia |
|  49 | Luzini |
|  48 | Akron  |
|  48 | Elista |
|  47 | Patial |
|  46 | Pingxi |
+-----+--------+

计算合适的前缀的长度还有另外的一个办法就是计算完整列的选择性,并使前缀的选择性接近于完整列的选择性。下面显示如何计算完整列的选择性:

mysql> select count(distinct city)/count(*) from sakila.city_demo;
+-------------------------------+
| count(distinct city)/count(*) |
+-------------------------------+
|                        0.0312 |
+-------------------------------+

然后可以在一个查询中针对不同前缀长度进行计算,下面显示如何计算不同前缀长度的选择性:

mysql> SELECT COUNT(DISTINCT LEFT(city, 3))/COUNT(*) AS pref3,
    -> COUNT(DISTINCT LEFT(city, 4))/COUNT(*) AS pref4,
    -> COUNT(DISTINCT LEFT(city, 5))/COUNT(*) AS pref5,
    -> COUNT(DISTINCT LEFT(city, 6))/COUNT(*) AS pref6,
    -> COUNT(DISTINCT LEFT(city, 7))/COUNT(*) AS pref7 FROM sakila.city_demo;
+--------+--------+--------+--------+--------+
| pref3  | pref4  | pref5  | pref6  | pref7  |
+--------+--------+--------+--------+--------+
| 0.0239 | 0.0293 | 0.0305 | 0.0309 | 0.0310 |
+--------+--------+--------+--------+--------+

可以看出前缀达到7的时候选择性提升的幅度已经很小了,而且和完整列的选择性也很接近。 找打合适的前缀长度,下面来看如何创建前缀索引和创建前后的差别:

  • 未创建前缀索引:
mysql> EXPLAIN SELECT * FROM sakila.city_demo WHERE city = 'London';
+----+-------------+-----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref  | rows  | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
|  1 | SIMPLE      | city_demo | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 19241 |    10.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
  • 创建索引
mysql> ALTER TABLE sakila.city_demo ADD KEY(city(7));
Query OK, 0 rows affected (0.19 sec)
Records: 0  Duplicates: 0  Warnings: 0
  • 创建索引后分析
mysql> EXPLAIN SELECT * FROM sakila.city_demo WHERE city = 'London';
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
| id | select_type | table     | partitions | type | possible_keys | key  | key_len | ref   | rows | filtered | Extra       |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | city_demo | NULL       | ref  | city          | city | 23      | const |   54 |   100.00 | Using where |
+----+-------------+-----------+------------+------+---------------+------+---------+-------+------+----------+-------------+

注:前缀索引是一种能使索引更小,更快的有效办法,但另一方面也有其缺点:MySQL无法使用前缀索引做ORDER BY 和GROUP BY 。