SoFunction
Updated on 2025-04-08

Implementation method of using PostgreSQL database for full-text search in Chinese

How to use zhparser

Install the corresponding plug-in

1 Install pg_trgm

Used to perform text similarity and fuzzy searches. It is based on the trigram algorithm, allowing you to perform fuzzy matching, similarity searches, and indexes on text data. The command is as follows

CREATE EXTENSION pg_trgm

2 Install zhparser

Specially used to process Chinese text data. It allows PostgreSQL to perform full-text search on Chinese text, including word segmentation, stemming extraction, stop word processing and other operations, in order to improve the search accuracy and performance of Chinese text data. The command is as follows

CREATE EXTENSION zhparser

Create a search configuration based on zhparser

CREATE TEXT SEARCH CONFIGURATION chinese_zh (PARSER = zhparser)

This command creates a name calledchinese_zhChinese text search configuration. This configuration is usedzhparserparser, indicating that it will be used to analyze and process Chinese text data.

ALTER TEXT SEARCH CONFIGURATION chinese_zh ADD MAPPING FOR n,v,a,i,e,l WITH simple

This command will specify the vocabulary category (n, v, a, i, e, l) with the existingsimpleConfiguration for mapping. This means that for these vocabulary categories, thesimpleConfiguration rules for text analysis and processing.

Conduct search query

select ts_debug('chinese_zh', 'The Spring and Autumn Period and the Warring States Period were the era of thought bursting in Chinese history, and a hundred schools of thought contend for the debate');

Execution results:

"(t,""time, morphemes"",Spring and Autumn,{},,)"
"(u,""auxiliary, auxiliary word"",、,{},,)"
"(t,""time, morphemes"",Warring States,{},,)"
"(v,""verb, verb"",yes,{simple},simple,{yes})"
"(n,""noun, noun"",China,{simple},simple,{China})"
"(n,""noun, noun"",history,{simple},simple,{history})"
"(n,""noun, noun"",Thoughts,{simple},simple,{Thoughts})"
"(v,""verb, verb"",think,{simple},simple,{think})"
"(v,""verb, verb"",Burst,{simple},simple,{Burst})"
"(u,""auxiliary, auxiliary word"",of,{},,)"
"(n,""noun, noun"",era,{simple},simple,{era})"
"(u,""auxiliary, auxiliary word"",,,{},,)"
"(v,""verb, verb"",Appear,{simple},simple,{Appear})"
"(v,""verb, verb"",It's,{simple},simple,{It's})"
"(n,""noun, noun"",Hundred Schools of Thought,{simple},simple,{Hundred Schools of Thought})"
"(u,""auxiliary, auxiliary word"",of,{},,)"
"(n,""noun, noun"",Phenomenon,{simple},simple,{Phenomenon})"
select to_tsvector('chinese_zh', 'The Spring and Autumn Period and the Warring States Period were the era of thought bursting in Chinese history, and a hundred schools of thought contend for the debate');

Execution results:

"'Shangs of Thoughts':4 'China':2 'Law':9 'appearance':8 'History':3 'Think':5 'Time':7 'Yes':1 'Phenomenon':11 'Hundred Schools of Thoughts':10 'Breaking out':6"

You can see that the Spring and Autumn Period and the Warring States Period did not appear in it, here we putt, time, time morphemeAdd the map to

ALTER TEXT SEARCH CONFIGURATION chinese_zh ADD MAPPING FOR t WITH simple;

Execute the above statement and get the following result

"'Shangs of Thoughts':6 'China':4 'Long':11 'appearance':10 'History':5 'Thinking':7 'Warring States':2 'Times':9 'Spring and Autumn':1 'Yes':3 'Phenomenon':13 'Hundred Schools of Thoughts':12 'Breaking Out':8"

It can be seen that the Spring and Autumn Period and the Warring States Period have appeared in the results of text vectors. Let's search for spring and autumn to see

select to_tsquery('Warring States') @@ to_tsvector('chinese_zh', 'The Spring and Autumn Period and the Warring States Period were the era of thought bursting in Chinese history, and a hundred schools of thought contend for the debate');

true

Add a custom dictionary (txt)

select to_tsquery('Twenty-Four Session Valley') @@  to_tsvector('chinese_zh', 'The Spring and Autumn Period and the Warring States Period were the era of thought bursting in Chinese history, and a hundred schools of thought occurred in the Twenty-Fourth Valley');
false

There is no such thing here. I carefully observe the vectorized analysis of this sentence above and solve the case. The word "Twenty-Four Section Valley" is not used as a whole.

'Super Thought':6 'China':4 'It':11 'Appear':10 'history':5 'think':7

Create a customer dictionary in the following location/usr/share/postgresql/14/tsearch_data/kkdict.: (must be in this directory)

#word TF  IDF ATTR
Twenty-four valley  1 1 n

Then modify/var/lib/postgresql/data/, add the following line at the end:zhparser.extra_dicts = 'kkdict.'

select ts_debug('chinese_zh', 'The Spring and Autumn Period and the Warring States Period were the era of thought bursting in Chinese history, and a hundred schools of thought occurred in the Twenty-Fourth Valley');
(n,"noun, noun",Twenty-four valley,{simple},simple,{Twenty-four valley})

It can be seen that it has now become a noun.

Create an index

Here we use the to_tsvector function method, and do not add a column of tsvector fields separately.

create table testing(
  title text
  );
  insert into testing values('The Spring and Autumn Period and the Warring States Period were the era of thought bursting in Chinese history, and a hundred schools of thought contend for the debate');
  insert into testing values('The Spring and Autumn Period and the Warring States Period were the era of thought bursting in Chinese history, and a hundred schools of thought occurred in the Twenty-Fourth Valley');
create index ind_testing on testing using gin (to_tsvector('chinese_zh', title));

In this way, you can use Chinese search in the query. Note that when you don’t have enough data, it will not be indexed, which will be direct and convenient.

select * from testing where to_tsquery('chinese_zh', 'Twenty-Four Session Valley') @@ to_tsvector('chinese_zh', title);

Summarize

The above is the detailed content of the implementation method of using Chinese full text search in PostgreSQL database. For more information about PostgreSQL's full text search in Chinese, please pay attention to my other related articles!