インデックスを用いた、SQLのパフォーマンス改善の覚え書き

Contents

はじめに
環境
インデックスの作成
- 単一インデックスの作成
- 複合インデックスの作成
インデックス使用時の注意点
おわりに
参考

はじめに

本記事ではインデックスを用いたパフォーマンス改善の方法についてまとめます。

環境

DB は MySQL を使用します。
また検証にはこちらのデータセットを利用しました。

  mysql> select version();
  +-----------+
  | version() |
  +-----------+
  | 8.0.30    |
  +-----------+

インデックスの作成

単一インデックスの作成

絞り込み対象のカラムにインデックスを作成することで、インデックススキャンになり処理が速くなります。
ただし、where 句で10~20%程度まで絞り込めない場合はパフォーマンスが向上しない場合があります。

explain analyze select * from employees where first_name = 'Georgi';
-- -> Filter: (employees.first_name = \'Georgi\')  (cost=30179.05 rows=29947) (actual time=0.072..115.057 rows=253 loops=1)
--     -> Table scan on employees  (cost=30179.05 rows=299468) (actual time=0.066..89.027 rows=300024 loops=1)

-- インデックスを作成する
create index idx_employees_first_name on employees(first_name);
explain analyze select * from employees where first_name = 'Georgi';
-- -> Index lookup on employees using idx_employees_first_name (first_name=\'Georgi\')  (cost=88.55 rows=253) (actual time=0.567..1.421 rows=253 loops=1)

同様に group by や order by を用いる場合は、対象のカラムにインデックスを作成することで高速化が可能です。
ただし基本的に group by や order by には時間がかかるので、where 句で十分に絞り込みを行うことが重要になります。

explain analyze select first_name, count(*) from employees group by first_name;
-- -> Group aggregate: count(0)  (cost=60125.85 rows=299468) (actual time=1.588..114.008 rows=1275 loops=1)
--     -> Covering index scan on employees using idx_employees_first_name  (cost=30179.05 rows=299468) (actual time=1.259..75.920 rows=300024 loops=1)

複合インデックスの作成

複数のカラムでの絞り込みを行うクエリを高速化する際は、複合インデックスの使用を検討します。
ただし、絞り込み条件に OR を使用する場合は、インデックスを使用することができません。

create index idx_employees_first_name_birth_date on employees(first_name, birth_date);

-- andの場合は複合インデックスを使用できる
explain analyze select * from employees where first_name = 'Georgi' and birth_date like '1953-%';
-- -> Filter: (employees.birth_date like \'1953-%\')  (cost=66.06 rows=28) (actual time=0.659..1.674 rows=19 loops=1)
--     -> Index lookup on employees using idx_employees_first_name (first_name=\'Georgi\')  (cost=66.06 rows=253) (actual time=0.653..1.550 rows=253 loops=1)

-- orの場合は複合インデックスを使用できない
explain analyze select * from employees where first_name = 'Georgi' or birth_date like '1953-%';
-- -> Filter: ((employees.first_name = \'Georgi\') or (employees.birth_date like \'1953-%\'))  (cost=30179.05 rows=33476) (actual time=0.083..148.895 rows=23091 loops=1)
--     -> Table scan on employees  (cost=30179.05 rows=299468) (actual time=0.077..83.083 rows=300024 loops=1)

インデックス使用時の注意点

インデックスはカラムの値に対して作成される。そのため以下のような場合はフルスキャンが行われます。

-- where句の絞り込みに関数を使用した場合
explain analyze select * from employees where upper(first_name) = 'GEORGI';
-- -> Filter: (upper(employees.first_name) = \'GEORGI\')  (cost=30179.05 rows=299468) (actual time=0.101..117.035 rows=253 loops=1)
--     -> Table scan on employees  (cost=30179.05 rows=299468) (actual time=0.091..82.334 rows=300024 loops=1)

like 句を用いたクエリの場合、前方一致の場合はインデックスを使用することができるが、後方一致にはインデックスが使用されません。
そのため後方一致での検索が必要な場合は、逆の値を格納したカラムを別途作成してそのカラムに対して前方一致で検索する必要があります。

-- 前方一致
explain analyze select * from employees where first_name like 'G%';
-- -> Index range scan on employees using idx_employees_first_name_birth_date over (\'G\' <= first_name <= \'G????????????????????????????????????????????????????\'), with index condition: (employees.first_name like \'G%\')  (cost=13919.66 rows=30932) (actual time=0.068..56.282 rows=16489 loops=1)

-- 後方一致
explain analyze select * from employees where first_name like '%i';
-- -> Filter: (employees.first_name like \'%i\')  (cost=30179.05 rows=33271) (actual time=0.098..116.484 rows=30787 loops=1)
--    -> Table scan on employees  (cost=30179.05 rows=299468) (actual time=0.092..80.317 rows=300024 loops=1)

おわりに

本記事ではインデックスを用いたパフォーマンス改善の方法についてまとめました。
この記事がどなたかの参考になれば幸いです。

参考

GitHub – datacharmer/test_db: A sample MySQL database with an integrated test suite, used to test your applications and database servers