postgresql 中的 like 查询优化方案

当时数量量比较庞大的时候，做模糊查询效率很慢，为了优化查询效率，尝试如下方法做效率对比

一、对比情况说明：

1、数据量100w条数据

2、执行sql

二、对比结果

								
									 explain analyze   SELECT 

									    c_patent, 

									    c_applyissno, 

									    d_applyissdate, 

									    d_applydate, 

									    c_patenttype_dimn, 

									    c_newlawstatus, 

									    c_abstract  

									 FROM 

									    public  .t_knowl_patent_zlxx_temp  

									 WHERE 

									    c_applicant   LIKE   '%本溪满族自治县连山关镇安平安养殖场%'  ;

1、未建索时执行计划：

"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=217.264..217.264 rows=0 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=212.355..212.355 rows=0 loops=3)

Filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)

Rows Removed by Filter: 333333

Planning time: 0.272 ms

Execution time: 228.116 ms"

2、btree索引

建索引语句

1	CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public .t_knowl_patent_zlxx(c_applicant varchar_pattern_ops);

执行计划

"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=208.253..208.253 rows=0 loops=1)

Workers Planned: 2

Workers Launched: 2

-> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=203.573..203.573 rows=0 loops=3)

Filter: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%'::text)

Rows Removed by Filter: 333333

Planning time: 0.116 ms

Execution time: 218.189 ms"

但是如果将查询sql稍微改动一下，把like查询中的前置%去掉是这样的

Index Scan using idx_public_t_knowl_patent_zlxx_applicant on t_knowl_patent_zlxx_temp (cost=0.55..8.57 rows =92 width=1278) (actual time =0.292..0.292 rows =0 loops=1)

Index Cond: (((c_applicant)::text ~>=~ '本溪满族自治县连山关镇安平安养殖场' ::text) AND ((c_applicant)::text ~<~ '本溪满族自治县连山关镇安平安养殖圻' ::text))

Filter: ((c_applicant)::text ~~ '本溪满族自治县连山关镇安平安养殖场%' ::text)

Planning time : 0.710 ms

Execution time : 0.378 ms

3、gin索引

创建索引语句( postgresql 要求在9.6版本及以上)

1 2	create extension pg_trgm; CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public .t_knowl_patent_zlxx USING gin (c_applicant gin_trgm_ops);

执行计划

Bitmap Heap Scan on t_knowl_patent_zlxx (cost=244.71..600.42 rows =91 width=1268) (actual time =0.649..0.649 rows =0 loops=1)

Recheck Cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%' ::text)

-> Bitmap Index Scan on idx_public_t_knowl_patent_zlxx_applicant (cost=0.00..244.69 rows =91 width=0) (actual time =0.647..0.647 rows =0 loops=1)

Index Cond: ((c_applicant)::text ~~ '%本溪满族自治县连山关镇安平安养殖场%' ::text)

Planning time : 0.673 ms

Execution time : 0.740 ms

三、结论

btree索引可以让后置% "abc%"的模糊匹配走索引，gin + gp_trgm可以让前后置% "%abc%" 走索引。但是gin 索引也有弊端，以下情况可能导致无法命中：

搜索字段少于3个字符时，不会命中索引，这是gin自身机制导致。

当搜索字段过长时，比如email检索，可能也不会命中索引，造成原因暂时未知。

补充：PostgreSQL LIKE 查询效率提升实验

一、未做索引的查询效率

作为对比，先对未索引的查询做测试

								
									 EXPLAIN ANALYZE   select   *   from   gallery_map   where   author =   '曹志耘'  ; 

									                QUERY PLAN              

									 ----------------------------------------------------------------------------------------------------------------- 

									    Seq Scan   on   gallery_map (cost=0.00..7002.32   rows  =1025 width=621) (actual   time  =0.011..39.753   rows  =1031 loops=1) 

									    Filter: ((author)::text =   '曹志耘'  ::text) 

									    Rows   Removed   by   Filter: 71315 

									    Planning   time  : 0.194 ms 

									    Execution   time  : 39.879 ms 

									 (5   rows  ) 

									 Time  : 40.599 ms 

									 EXPLAIN ANALYZE   select   *   from   gallery_map   where   author   like   '曹志耘'  ; 

									                QUERY PLAN              

									 ----------------------------------------------------------------------------------------------------------------- 

									    Seq Scan   on   gallery_map (cost=0.00..7002.32   rows  =1025 width=621) (actual   time  =0.017..41.513   rows  =1031 loops=1) 

									    Filter: ((author)::text ~~   '曹志耘'  ::text) 

									    Rows   Removed   by   Filter: 71315 

									    Planning   time  : 0.188 ms 

									    Execution   time  : 41.669 ms 

									 (5   rows  ) 

									 Time  : 42.457 ms 

									 EXPLAIN ANALYZE   select   *   from   gallery_map   where   author   like   '曹志耘%'  ; 

									                QUERY PLAN              

									 ----------------------------------------------------------------------------------------------------------------- 

									    Seq Scan   on   gallery_map (cost=0.00..7002.32   rows  =1028 width=621) (actual   time  =0.017..41.492   rows  =1031 loops=1) 

									    Filter: ((author)::text ~~   '曹志耘%'  ::text) 

									    Rows   Removed   by   Filter: 71315 

									    Planning   time  : 0.307 ms 

									    Execution   time  : 41.633 ms 

									 (5   rows  ) 

									 Time  : 42.676 ms

很显然都会做全表扫描

二、创建btree索引

PostgreSQL默认索引是btree

								
									 CREATE   INDEX   ix_gallery_map_author   ON   gallery_map (author); 

									 EXPLAIN ANALYZE   select   *   from   gallery_map   where   author =   '曹志耘'  ;   

									                   QUERY PLAN                 

									 ------------------------------------------------------------------------------------------------------------------------------------- 

									    Bitmap Heap Scan   on   gallery_map (cost=36.36..2715.37   rows  =1025 width=621) (actual   time  =0.457..1.312   rows  =1031 loops=1) 

									    Recheck Cond: ((author)::text =   '曹志耘'  ::text) 

									    Heap Blocks: exact=438 

									    -> Bitmap   Index   Scan   on   ix_gallery_map_author (cost=0.00..36.10   rows  =1025 width=0) (actual   time  =0.358..0.358   rows  =1031 loops=1) 

									      Index   Cond: ((author)::text =   '曹志耘'  ::text) 

									    Planning   time  : 0.416 ms 

									    Execution   time  : 1.422 ms 

									 (7   rows  ) 

									 Time  : 2.462 ms 

									 EXPLAIN ANALYZE   select   *   from   gallery_map   where   author   like   '曹志耘'  ; 

									                   QUERY PLAN                 

									 ------------------------------------------------------------------------------------------------------------------------------------- 

									    Bitmap Heap Scan   on   gallery_map (cost=36.36..2715.37   rows  =1025 width=621) (actual   time  =0.752..2.119   rows  =1031 loops=1) 

									    Filter: ((author)::text ~~   '曹志耘'  ::text) 

									    Heap Blocks: exact=438 

									    -> Bitmap   Index   Scan   on   ix_gallery_map_author (cost=0.00..36.10   rows  =1025 width=0) (actual   time  =0.560..0.560   rows  =1031 loops=1) 

									      Index   Cond: ((author)::text =   '曹志耘'  ::text) 

									    Planning   time  : 0.270 ms 

									    Execution   time  : 2.295 ms 

									 (7   rows  ) 

									 Time  : 3.444 ms 

									 EXPLAIN ANALYZE   select   *   from   gallery_map   where   author   like   '曹志耘%'  ; 

									                QUERY PLAN              

									 ----------------------------------------------------------------------------------------------------------------- 

									    Seq Scan   on   gallery_map (cost=0.00..7002.32   rows  =1028 width=621) (actual   time  =0.015..41.389   rows  =1031 loops=1) 

									    Filter: ((author)::text ~~   '曹志耘%'  ::text) 

									    Rows   Removed   by   Filter: 71315 

									    Planning   time  : 0.260 ms 

									    Execution   time  : 41.518 ms 

									 (5   rows  ) 

									 Time  : 42.430 ms 

									 EXPLAIN ANALYZE   select   *   from   gallery_map   where   author   like   '%研究室'  ; 

									                QUERY PLAN              

									 ----------------------------------------------------------------------------------------------------------------- 

									    Seq Scan   on   gallery_map (cost=0.00..7002.32   rows  =2282 width=621) (actual   time  =0.064..52.824   rows  =2152 loops=1) 

									    Filter: ((author)::text ~~   '%研究室'  ::text) 

									    Rows   Removed   by   Filter: 70194 

									    Planning   time  : 0.254 ms 

									    Execution   time  : 53.064 ms 

									 (5   rows  ) 

									 Time  : 53.954 ms

可以看到，等于、like的全匹配是用到索引的，like的模糊查询还是全表扫描

三、创建gin索引

								
									 CREATE   EXTENSION pg_trgm; 

									 CREATE   INDEX   ix_gallery_map_author   ON   gallery_map USING gin (author gin_trgm_ops); 

									 EXPLAIN ANALYZE   select   *   from   gallery_map   where   author   like   '曹%'  ;  

									                   QUERY PLAN                 

									 ------------------------------------------------------------------------------------------------------------------------------------- 

									    Bitmap Heap Scan   on   gallery_map (cost=19.96..2705.69   rows  =1028 width=621) (actual   time  =0.419..1.771   rows  =1031 loops=1) 

									    Recheck Cond: ((author)::text ~~   '曹%'  ::text) 

									    Heap Blocks: exact=438 

									    -> Bitmap   Index   Scan   on   ix_gallery_map_author (cost=0.00..19.71   rows  =1028 width=0) (actual   time  =0.312..0.312   rows  =1031 loops=1) 

									      Index   Cond: ((author)::text ~~   '曹%'  ::text) 

									    Planning   time  : 0.358 ms 

									    Execution   time  : 1.916 ms 

									 (7   rows  ) 

									 Time  : 2.843 ms 

									 EXPLAIN ANALYZE   select   *   from   gallery_map   where   author   like   '%耘%'  ;  

									                QUERY PLAN              

									 ----------------------------------------------------------------------------------------------------------------- 

									    Seq Scan   on   gallery_map (cost=0.00..7002.32   rows  =1028 width=621) (actual   time  =0.015..51.641   rows  =1031 loops=1) 

									    Filter: ((author)::text ~~   '%耘%'  ::text) 

									    Rows   Removed   by   Filter: 71315 

									    Planning   time  : 0.268 ms 

									    Execution   time  : 51.957 ms 

									 (5   rows  ) 

									 Time  : 52.899 ms 

									 EXPLAIN ANALYZE   select   *   from   gallery_map   where   author   like   '%研究室%'  ; 

									                   QUERY PLAN                 

									 ------------------------------------------------------------------------------------------------------------------------------------- 

									    Bitmap Heap Scan   on   gallery_map (cost=31.83..4788.42   rows  =2559 width=621) (actual   time  =0.914..4.195   rows  =2402 loops=1) 

									    Recheck Cond: ((author)::text ~~   '%研究室%'  ::text) 

									    Heap Blocks: exact=868 

									    -> Bitmap   Index   Scan   on   ix_gallery_map_author (cost=0.00..31.19   rows  =2559 width=0) (actual   time  =0.694..0.694   rows  =2402 loops=1) 

									      Index   Cond: ((author)::text ~~   '%研究室%'  ::text) 

									    Planning   time  : 0.306 ms 

									    Execution   time  : 4.403 ms 

									 (7   rows  ) 

									 Time  : 5.227 ms

gin_trgm索引的效果好多了

由于pg_trgm的索引是把字符串切成多个3元组，然后使用这些3元组做匹配，所以gin_trgm索引对于少于3个字符（包括汉字）的查询，只有前缀匹配会走索引

另外，还测试了btree_gin，效果和btree一样

注意：

gin_trgm要求数据库必须使用UTF-8编码

demo_v1 # \l demo_v1

List of databases

---------+-----------+----------+-------------+-------------+-------------------

以上为个人经验，希望能给大家一个参考，也希望大家多多支持服务器之家。如有错误或未考虑完全的地方，望不吝赐教。

原文链接：https://blog.csdn.net/qq_23986087/article/details/104021214

查看更多关于postgresql 中的 like 查询优化方案的详细内容...

声明：本文来自网络，不代表【好得很程序员自学网】立场，转载请注明出处：http://www.haodehen.cn/did229474

更新时间：2023-05-14 阅读：55次