When the quantity was relatively large, the efficiency of fuzzy query was very slow. In order to optimize the query efficiency, try the following methods to compare the efficiency.
1. Comparative situation description:
1. Data volume 100w pieces of data
2. Execute sql
2. Comparison results
explain analyze SELECT c_patent, c_applyissno, d_applyissdate, d_applydate, c_patenttype_dimn, c_newlawstatus, c_abstract FROM public.t_knowl_patent_zlxx_temp WHERE c_applicant LIKE '%Anping An Breeding Farm in Lianshanguan Town, Benxi Manchu Autonomous County%';
1. Implementation plan when no search is built:
"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=217.264..217.264 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=212.355..212.355 rows=0 loops=3) Filter: ((c_applicant)::text ~~ '%Anping An Breeding Farm in Lianshanguan Town, Benxi Manchu Autonomous County%'::text) Rows Removed by Filter: 333333 Planning time: 0.272 ms Execution time: 228.116 ms"
2. btree index
Indexing statement
CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public.t_knowl_patent_zlxx(c_applicant varchar_pattern_ops);
Execution plan
"Gather (cost=1000.00..83803.53 rows=92 width=1278) (actual time=208.253..208.253 rows=0 loops=1) Workers Planned: 2 Workers Launched: 2 -> Parallel Seq Scan on t_knowl_patent_zlxx (cost=0.00..82794.33 rows=38 width=1278) (actual time=203.573..203.573 rows=0 loops=3) Filter: ((c_applicant)::text ~~ '%Anping An Breeding Farm in Lianshanguan Town, Benxi Manchu Autonomous County%'::text) Rows Removed by Filter: 333333 Planning time: 0.116 ms Execution time: 218.189 ms"
But if you change the query SQL slightly, remove the pre-% in the like query.
Index Scan using idx_public_t_knowl_patent_zlxx_applicant on t_knowl_patent_zlxx_temp (cost=0.55..8.57 rows=92 width=1278) (actual time=0.292..0.292 rows=0 loops=1) Index Cond: (((c_applicant)::text ~>=~ 'An Ping'an Breeding Farm in Lianshanguan Town, Benxi Manchu Autonomous County'::text) AND ((c_applicant)::text ~<~ 'An Ping'an Breeding Cities in Lianshanguan Town, Benxi Manchu Autonomous County'::text)) Filter: ((c_applicant)::text ~~ 'Anping An Breeding Farm in Lianshanguan Town, Benxi Manchu Autonomous County'::text) Planning time: 0.710 ms Execution time: 0.378 ms
3. Gin index
Create index statements (postgresql requires version 9.6 and above)
create extension pg_trgm; CREATE INDEX idx_public_t_knowl_patent_zlxx_applicant ON public.t_knowl_patent_zlxx USING gin (c_applicant gin_trgm_ops);
Execution plan
Bitmap Heap Scan on t_knowl_patent_zlxx (cost=244.71..600.42 rows=91 width=1268) (actual time=0.649..0.649 rows=0 loops=1) Recheck Cond: ((c_applicant)::text ~~ '%Anping An Breeding Farm in Lianshanguan Town, Benxi Manchu Autonomous County%'::text) -> Bitmap Index Scan on idx_public_t_knowl_patent_zlxx_applicant (cost=0.00..244.69 rows=91 width=0) (actual time=0.647..0.647 rows=0 loops=1) Index Cond: ((c_applicant)::text ~~ '%Anping An Breeding Farm in Lianshanguan Town, Benxi Manchu Autonomous County%'::text) Planning time: 0.673 ms Execution time: 0.740 ms
3. Conclusion
btree index allows fuzzy matching of the post-% "abc%" to go through the index, and gin + gp_trgm allows % "%abc%" to go through the index. However, gin indexes also have disadvantages, and the following situations may cause hits to fail:
When the search field is less than 3 characters, the index will not be hit, which is caused by gin's own mechanism.
When the search field is too long, such as email search, the index may not be hit, and the reason is currently unknown.
Supplement: PostgreSQL LIKE query efficiency improvement experiment
1. Query efficiency without indexing
For comparison, test the unindexed query first
EXPLAIN ANALYZE select * from gallery_map where author = 'Cao Zhiyun'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.011..39.753 rows=1031 loops=1) Filter: ((author)::text = 'Cao Zhiyun'::text) Rows Removed by Filter: 71315 Planning time: 0.194 ms Execution time: 39.879 ms (5 rows) Time: 40.599 ms EXPLAIN ANALYZE select * from gallery_map where author like 'Cao Zhiyun'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=1025 width=621) (actual time=0.017..41.513 rows=1031 loops=1) Filter: ((author)::text ~~ 'Cao Zhiyun'::text) Rows Removed by Filter: 71315 Planning time: 0.188 ms Execution time: 41.669 ms (5 rows) Time: 42.457 ms EXPLAIN ANALYZE select * from gallery_map where author like 'Cao Zhiyun%'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.017..41.492 rows=1031 loops=1) Filter: ((author)::text ~~ 'Cao Zhiyun%'::text) Rows Removed by Filter: 71315 Planning time: 0.307 ms Execution time: 41.633 ms (5 rows) Time: 42.676 ms
Obviously, they will do full table scans
2. Create a btree index
PostgreSQL default index is btree
CREATE INDEX ix_gallery_map_author ON gallery_map (author); EXPLAIN ANALYZE select * from gallery_map where author = 'Cao Zhiyun'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on gallery_map (cost=36.36..2715.37 rows=1025 width=621) (actual time=0.457..1.312 rows=1031 loops=1) Recheck Cond: ((author)::text = 'Cao Zhiyun'::text) Heap Blocks: exact=438 -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..36.10 rows=1025 width=0) (actual time=0.358..0.358 rows=1031 loops=1) Index Cond: ((author)::text = 'Cao Zhiyun'::text) Planning time: 0.416 ms Execution time: 1.422 ms (7 rows) Time: 2.462 ms EXPLAIN ANALYZE select * from gallery_map where author like 'Cao Zhiyun'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on gallery_map (cost=36.36..2715.37 rows=1025 width=621) (actual time=0.752..2.119 rows=1031 loops=1) Filter: ((author)::text ~~ 'Cao Zhiyun'::text) Heap Blocks: exact=438 -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..36.10 rows=1025 width=0) (actual time=0.560..0.560 rows=1031 loops=1) Index Cond: ((author)::text = 'Cao Zhiyun'::text) Planning time: 0.270 ms Execution time: 2.295 ms (7 rows) Time: 3.444 ms EXPLAIN ANALYZE select * from gallery_map where author like 'Cao Zhiyun%'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.015..41.389 rows=1031 loops=1) Filter: ((author)::text ~~ 'Cao Zhiyun%'::text) Rows Removed by Filter: 71315 Planning time: 0.260 ms Execution time: 41.518 ms (5 rows) Time: 42.430 ms EXPLAIN ANALYZE select * from gallery_map where author like '% Research Room'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=2282 width=621) (actual time=0.064..52.824 rows=2152 loops=1) Filter: ((author)::text ~~ '% Research Room'::text) Rows Removed by Filter: 70194 Planning time: 0.254 ms Execution time: 53.064 ms (5 rows) Time: 53.954 ms
You can see that whether the full match of equals and like is used for indexing, like fuzzy query or full table scanning
3. Create a gin index
CREATE EXTENSION pg_trgm; CREATE INDEX ix_gallery_map_author ON gallery_map USING gin (author gin_trgm_ops); EXPLAIN ANALYZE select * from gallery_map where author like 'Cao%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on gallery_map (cost=19.96..2705.69 rows=1028 width=621) (actual time=0.419..1.771 rows=1031 loops=1) Recheck Cond: ((author)::text ~~ 'Cao%'::text) Heap Blocks: exact=438 -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..19.71 rows=1028 width=0) (actual time=0.312..0.312 rows=1031 loops=1) Index Cond: ((author)::text ~~ 'Cao%'::text) Planning time: 0.358 ms Execution time: 1.916 ms (7 rows) Time: 2.843 ms EXPLAIN ANALYZE select * from gallery_map where author like '%Win%'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Seq Scan on gallery_map (cost=0.00..7002.32 rows=1028 width=621) (actual time=0.015..51.641 rows=1031 loops=1) Filter: ((author)::text ~~ '%Win%'::text) Rows Removed by Filter: 71315 Planning time: 0.268 ms Execution time: 51.957 ms (5 rows) Time: 52.899 ms EXPLAIN ANALYZE select * from gallery_map where author like '%Learning Room%'; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- Bitmap Heap Scan on gallery_map (cost=31.83..4788.42 rows=2559 width=621) (actual time=0.914..4.195 rows=2402 loops=1) Recheck Cond: ((author)::text ~~ '%Learning Room%'::text) Heap Blocks: exact=868 -> Bitmap Index Scan on ix_gallery_map_author (cost=0.00..31.19 rows=2559 width=0) (actual time=0.694..0.694 rows=2402 loops=1) Index Cond: ((author)::text ~~ '%Learning Room%'::text) Planning time: 0.306 ms Execution time: 4.403 ms (7 rows) Time: 5.227 ms
The gin_trgm index has much better effect
Since the index of pg_trgm cuts the string into multiple 3 tuples and then uses these 3 tuples to match, the gin_trgm index only has the prefix matching to the index for queries with less than 3 characters (including Chinese characters).
In addition, btree_gin was also tested, the effect is the same as btree
Notice:
gin_trgm requires that the database must be encoded using UTF-8
demo_v1 # \l demo_v1 List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges ---------+-----------+----------+-------------+-------------+------------------- demo_v1 | wmpp_user | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
The above is personal experience. I hope you can give you a reference and I hope you can support me more. If there are any mistakes or no complete considerations, I would like to give you advice.