Share PostgreSQL application tips and examples

There is no need for title party here. This article will summarize some not very common but more commonly used and useful SQL statements in Postgres, starting from application requirements and scenarios, and conduct simple explanations and analysis. The main purpose of these techniques and operations is to simplify operations, process data more efficiently, or improve development efficiency.

This article is not a systematic material, it can be regarded as a collection of contents that you feel is necessary to share. Since this part of the content and organization may be relatively scattered, this article will be updated and improved for a long time.

Why Postgres

Mainly, some people are always confused about this issue. The author’s understanding is simply like and appropriate (only for the author and the application scenario). Of course, there is no flawless or useless technology in the world. If you cannot understand this, you may not have to read down. But in addition to its own functions and characteristics, any technology or product will reflect the ideas and values upheld by some builders. Understanding and identifying these will build a deeper fit with users.

Let’s start with history. PG originated from Berkeley's Postgres software package (Relational Database Model Design and Implementation, 1986) project. It has a relatively independent and long-term technological development process. It has now developed into the world's most advanced and most complete open source relational database system (you know after you have a deep understanding of it, you will know that this is not a brag), and has been widely used in the open source community and information technology industry. As a long-term and stable open source software system, PG has many derivative systems, such as GreenPlum, and many technology companies such as Alibaba and Huawei have developed related derivative products based on or refer to PG.

PG was positioned as an enterprise-level database system since its birth, focusing on rigorous design, robust system, ease of use and rich functions. PG supports SQL standards most complete among all database systems. The PG technology community is also very active, has very good openness, and has relatively fast technology development, so it can provide rich and advanced functional features as soon as possible, with rich documents and materials, which can help programmers quickly develop and improve their applications.

In the entire technology stack, PG is also a good member of a complete and open ecological environment. PG can run on all mainstream operating systems and can be easily ported to other architectures and operating systems through open source. PG's comprehensive ecological environment provides support for all mainstream languages and development environments. In the nodejs system that I currently use, it can be ported and integrated very conveniently. Whether it is database connection, parameterized statements and execution, query result processing, exception processing, etc., it is very convenient and intuitive.

Modify the operation to return the record (returning)

PG provides the function of returning data when modifying records, which is very useful in scenarios where operation results need to be judged, especially when SQL execution operations and calling languages operate together. This operation statement is also very simple, use return.

insert into students(id,name) values ('id1','name1') returing id,name;
update students set status = 1 where id = 'id1' returning id,name, status;
delete from students where id = 'id1' returning id,name ;

Copy the table structure (createLike)

PG provides new words in the create table to help copy a table, and also includes some extension options. It is often used in data management.

create table new (
    like old
    including defaults
    including constraints
    including indexes
);

Insert or update (upsert)

In business application development, we often encounter the data processing requirement of "insert if not, update if there is." The traditional way is to query first. If there is a record, execute the update statement, otherwise execute the insert statement. In the case of clear rules, such operations are obviously inefficient because multiple client processing logic and server interactions are required. At this time, the insert onconflict mode can be used for processing, thereby simplifying code execution and supporting batch processing of data.

The sample code and key points are as follows:

insert into students (id,name,status) values ( 1, 's1', 10)
on conflict(id) do update set (name,status) = (, ) 
returning id,name,status;

Essentially, it is an insertion operation, but encounters some logical conflict and then performs subsequent processing logic
A conflicting criterion needs to be determined, usually with primary keys, or unique indexes (or composite indexes)
You can choose to perform update operations or choose not to do processing such as do nothing
When updating records, you can choose to use excluded logical records to perform data operations.
The existence of records can be judged through return

In PG15, a Merge into function similar to Oracle (also called Merge) can also be used to achieve this kind of business requirement.

query insert

--  Copy records from a table
insert into students_bak
select *, 100 as flag from students;

Inserting a query result set into a table is a very common operation. It should be noted that the number and type of the inserted and selected fields must match, and type conversion may sometimes be required.

Join update/delete

Joint update means that updating data is not simply based on a condition, but may be based on a query result, and the association relationship between it and this result needs to be referenced. This is very common in actual business scenarios. If this function is not supported well, multiple queries, processing and operations may be required on the client.

The author feels that although there is no logical problem, there are quite a big difference in the implementation and operation of Oracle and PG. PG's support for this operating mode is obviously better and more stable.

--  Copy records from a table
insert into students_bak
select *, 100 as flag from students;
-- Update and delete records based on queries and associations
with I as (select idhash id2 from students where region = 1)
update students set status = 1 from I where id2 = idhash

CTE and virtual tables (with as)

Using ready-made data and information, constructing a standard SQL record set, and then performing subsequent operations, such as joint query, update operations, etc., is a relatively common operation method. In Postgres, CTE (Common Table Expression common table expression, usually also refers to the With word) is often used for processing.

Let's take a look at a simple example

with 
A(id,name) as (values ('id1','name1'),('id2','name2'),('id3','name3')),
B as (select row_number() over () rn, A.* from A)
select * from B;

This statement can use a set of data to create a virtual table. You can see how to define the structure of a virtual table, and you can use multiple virtual tables or record sets.

In fact, the use of CTE is very flexible and powerful. It can connect multiple CTEs, use links, combine records and modify operations, etc. Reasonable use can help developers organize data processing processes, filter and prepare data in advance, and operate different data collections at the same time, etc. Of course, if it is a relatively large data set or a relatively complex operation, you need to pay attention to the efficiency of its execution when writing SQL.

Subquery

In relatively simple cases (such as only nesting one layer), subqueries can be used directly.

select iorder, split_part(l, ',', n) city from 
(select 'lz,wh,cd' l , generate_series(1, 3) n) V;
select iorder, vl[vi] city from 
( SELECT generate_series(1,3) iorder, string_to_array('lz,wh,cd', ',') vl) V ;

In both examples above, row_number is not used, but generate_series and subqueries are used to construct a sorted record set from an array in string form.

Use field numbers instead of field names

In party functions and sorting methods, field numbers can sometimes be used to replace field nouns, which can be used to simplify SQL statements.

select region, count(1) rcount from students group by 1 order by 2 desc;

This statement can use the first field to perform the aggregation count calculation, and the record set is inverted using the count results.

bit operations and calculations

PG provides standard bit operators that can be used directly in SQL without using extended functions. Using this feature in some scenarios can greatly simplify code and execution. Generally, in actual use, this type of operation is used around labeled state management, and it is nothing more than the following operations.

Set certain states or combinations
To the original state, add a state (regardless of whether this state is originally available or not)
On the original state, remove a state (regardless of whether this state was originally available or not)
Find records with certain states or combinations of states (using status tags as query conditions)
Find records that do not have status or status combinations
Find records that have certain states but not others

The following SQL statements can implement these requirements:

-- Update Status Field，Add a state2
update students set status = status | 2 where id = 1;  
-- Update Status Field，Add two states(2,4)At the same time，Remove a state1
update students set status = status | (2+4) - (status &amp; 1) where id = 1; 
-- There is a status in the search2Records of
select id from students where status &amp; 2 = 2; 
-- In the search status, only2Records of(If there is8Type of state)
select id from students where status &amp; (2**8-1) = 2; 
-- No in the search status2Records of
select id from students where status &amp; 2 = 0; 
-- There is a status in the search2and1Records of
select id from students where status &amp; 3 = 3; 
-- There is a status in the search2，or1Records of
select id from students where status &amp; 3 &gt; 0; 
-- There is no search status2，None, either1Records of
select id from students where status &amp; 3 = 0;

Note:

1 The single state here is usually a simple 2^n integer, and the compound state is its arithmetic sum, which can be represented. There are 32 states when using ordinary positive integers.

2 To remove a state, you cannot use - or xor directly. In fact, it is a composite operation using - and &, that is, first use & to determine whether this state exists, and then subtract this result.

Aggregation filter

PG provides filter methods, which can use certain conditions during aggregation calculations, so that a single SQL statement and a record can be used to implement query of multiple conditional gatherings. If you do not operate this way, you may need to use Unionall and other statements to merge the results of multiple conditional queries. According to PG documents, this operation may only be performed once on the record set when executed, which is relatively efficient.

The following examples can help us understand this:

select region,
count(1) iall, 
count(1) filter (where status & 1 = 1) i1, 
count(1) filter (where status & 2 = 2) i2,
count(1) filter (where status & 3 = 3) i3,
count(1) filter (where status & 3 > 0) i31,
count(1) filter (where status & 3 = 0) i32
from students group by 1 ;

The function design and use of this filter filter is very clever, but I personally think where in the filter condition statement can be completely omitted.

Repeat record processing

There are two types of duplicate records here, one is logical duplicate records. If the ID number is the same due to incorrect input, it needs to be searched and processed during data maintenance:

-- CTE count
with I as (
select name n2,icount from (select name, count(1) icount from students group by 1 )C 
where icount > 1)
select name, idnumber from students join I on name = n2 order by 1;
-- having filter
select name, count(1) from students group by 1 having count(1) > 1;
-- row_number
select * from (
select idnumber,name, row_number() over(partition by idnumber order by name ) rn from students) S where rn > 1;

The above statements can query these duplicate records, and there are two methods: aggregate query counting and window function filtering. The former has better performance, but if there are sorting conditions to determine duplicate information or need to obtain details directly, the latter can also be used.

Another type is complete repetition, such as accidentally inserting two identical records in the database operation. If you want to clear one of them, using regular logic operations is invalid because all conditions are the same. At this time, you can use the ctid in postgres to handle it.

delete from students where ctid not in (
select ctid from students where idnumber = 'xxx' limit 1
) and idnumber = 'xxx' ;

The above statement should keep only one repetitive student.

ctid is the "system field" of PG. The author thinks it can be understood as the location of this record on disk and can be used as an absolute unique identifier for a record in the system. Of course, this information may change under different system states, so it should be used as a special treatment plan and be used with caution.

RandomQuery

Using order by random(), you can randomly return the query results during query:

select idnumber, name from students order by random() limit 10;

Cryptography extensions and simple applications (pgcrypto)

PG provides cryptography-related functions and operation methods through extensions. Here is a brief list of commonly used functions and methods.

create extension pgcrypto;
-- sha256summary
SELECT encode(digest('China', 'sha256'), 'hex');
-- hmac256
SELECT encode(hmac('China', 'key','sha256'), 'hex');
-- Store password
UPDATE ... SET pswhash = crypt('new password', gen_salt('md5'));
-- Check password The return value is a boolean value
SELECT (pswhash = crypt('entered password', pswhash)) AS pswmatch FROM ... ; -- false
SELECT (pswhash = crypt('new password', pswhash)) AS pswmatch FROM ... ; --true
-- Generate a salt
select gen_salt('md5');
-- encrypt and decrypt
with 
V(passwd,otext) as (values ('t0pSecret'::bytea,'china')),
E as (select encrypt(otext::bytea, passwd,'aes') econtent from V),
D as (select encode(econtent,'base64') b64, 
convert_from(decrypt(econtent, passwd,'aes'),'utf8') dtext from V,E)
select ,, D.* from V,D;

To use this extension, you must first install it. After the installation is completed, pgcrypto provides a series of custom functions to perform cryptography operations. Like ordinary languages, the standard format of pgcrypot operation data is bytea (byte array), so when using it, you need to pay attention to encoding and format conversion.

encode: Encode, convert bytea to a specified format such as hex
digest: summary function, supports sha1 and sha256
hmac: summary message verification code
crypto: A password storage and verification method implemented by pg, without decryption process, only matching verification
gen_salt: random salt generation, but it seems to be just the format set by pgcryto
encrypt/decrypt: Standard AES encryption and decryption, but it seems that there are not many optional settings, so it is simple to use
PGP cryptography-related functions, this part is more complex, has a lot of content, and is not widely used. I will not discuss it here

Array operation (Array)

PG can easily perform array-related operations. When using it, it should be noted that the array index of PG starts from 1, and in addition, you need to specify the type when defining the array. Reasonable use of arrays can simplify program development and database maintenance. For example, you can use an array to record relevant operation timelines, such as creation, modification, completion time, etc. You can also record data of array types such as labels.

-- Add an array field
alter table students add column otimes integer[]; 
-- Insert data
insert into students (id,otimes[1]) values (1, 200);
-- Query data
select id,name from students where otimes[1] = 100;
-- useanyandallfunction
select id,name from students where any(otimes) &gt; 100;
select id,name from students where all(otimes) &gt; 100;
-- Extend array ||Operator, array_catmethod
select array[1, 2, 3] || 4 as element_append;
select array_cat('{1, 2}', ARRAY[3, 4]) as concatenated_arrays;
-- Delete elements Array，Location
select array_remove(ARRAY[1,2,3,2,5], 2) as removed_2s;
-- Replace elements Array，Find value，New value
select array_replace(ARRAY[1,2,3,2,5], 2, 10) as two_becomes_ten;
-- 填充Array Fill value，Fill count，开始Location
select array_fill(90,array[5],array[3])

JSON Operation (JSON)

PG has built-in JSON support, which provides great convenience for web application development. This part has a lot of content, so if you have the opportunity to write an article to explain it.

Strings, arrays and records interchange (Agg)

Methods that can be used include string_to_array, unnest, string_agg, array_agg, etc. It can also be used in aggregate scenarios.

-- string 2 array to row
select  unnest(string_to_array('a,b,c',','));
-- array agg, string agg
with A as (select unnest(string_to_array('a,b,c',',')) c)
select array_agg(), string_agg(,';') from A;

string_to_array: Use delimiter to convert strings into arrays
unnest: Convert array to record
string_agg: Use delimiter to link records and convert them to string
array_agg: Convert records to array

Record set with line number (Row Number)

Sometimes it is necessary to force two record sets to merge horizontally, and additional associated fields may be required, such as using a row number to achieve this. The following statement can add a row number field to a record set, thereby constructing a new record set.

with 
S as (select name from students limit 100),
N as (select row_number() over() rn,name from S)
select * from N;

Custom Order

Generally speaking, in relational database systems, it is very simple to sort record sets, and only the column values need to be calculated as the ordering basis. But these sorting follows a fixed sorting rule.

Sometimes you will encounter the order that you need to define your own sorting, and this rule has a bit conflict with the default text or array sorting rules, so you need to use some special processing methods, and there are several ideas below.

UnionALL forcibly output record sets in order
Add a sorting field, add a sorting field, and set and maintain the sorting values. This method can handle any sorting requirements
The previous idea is to use calculation columns
case when sets the sorting order, use case when to return the sorting number to express the sorting rules, and then sort them with this
Virtual table association, write the rules used for sorting to a virtual record set with sorted values. When querying, the recordset is associated with the sorting number and sorted using the sorting number inside.

Computed Column

Versions above PG12 support computed columns, also known as generated columns, and are defined as follows:

CREATE TABLE Students (
  Id INTEGER PRIMARY KEY,
  FirstName VARCHAR(50),
  LastName VARCHAR(50),
  FullName VARCHAR(101) GENERATED ALWAYS AS (FirstName || ' ' || LastName) STORED
);

Like views, the rational planning and use of computed columns may optimize business logic, simplify programming, and reduce data operations and maintenance.

The syntax of PG's calculated column definition is fixed, that is, only ALWAYS and STORED, entity fields are supported. Theoretically, the use of computed columns is exactly the same as that of ordinary columns. There are still some limitations in the use of PG's calculated columns, such as the inability to nest other calculated columns, the subquery that references the current row cannot be used, it cannot be used as a partition field, it cannot be used as a primary key, etc.

Time Int (TimeInt)

PG provides a series of time functions and reserved words for processing time.

-- Timestamp seconds
select extract(epoch from current_timestamp)::integer; 
-- Time Type
select current_timestamp,now(),current_time, current_date;
-- Convert timestamps to formatted strings
SELECT TO_CHAR(to_timestamp((28070418 + 480) * 60), 'YYYY-MM-DD HH:MI');

In actual project use, when formatting time, you may need to pay attention to the system's time zone, because the default is UTC.

Table Inherits

PG supports inheritance relationships between tables, similar to inheritance of object-oriented programming. If it is planned and used reasonably, programming and data management can be simplified. The following examples can help us understand the implementation and use of table inheritance:

-- City Information Table
CREATE TABLE cities (
    name            text,
    population      float,
    altitude        int     -- foot
);
-- Provincial capital city inherits from cities，More attributes of the province
CREATE TABLE capitals (
    state char(2)
) INHERITS (cities); 
insert into cities (name,population, altitude) 
values ('Suzhou', 100,50),('Mianyang', 100,50),('Shenzhen', 100,50),('Luoyang', 100,50);
insert into capitals (name,population, altitude, state) 
values ('Guangzhou', 2100,50,'gd'),('Chengdu', 1600,500,'sc');
select * from cities;
select * from capitals;
-- All cities，Identify whether the capital is in existence
select C.*, nullif(,null) capOfState 
from cities C left join capitals P on  = ;

By default, queries include inherited tables. If you want to specify a query range, you can use the only keyword. In addition, there are some limitations on data modification and index use of inherited tables. If you do want to use the business structure of the inheritance table, you need to confirm understanding of these limitations and carefully evaluate the conflicts with business requirements.

Having

The HAVING sentence can be used when filtering grouped results. It is usually used with the GROUP BY clause to filter grouped results that meet certain criteria. The author understands that Having is the word "where" in the group by. It may be considered that there is a logical conflict with the where of ordinary queries, so separate keywords and processing methods are used.

SELECT country, COUNT(*) AS total_customers
FROM customers GROUP BY country
HAVING COUNT(*) > 5;

Exists

exists is to use a subquery to use its query result as one of the query conditions for the main query, as follows:

SELECT * FROM orders
WHERE EXISTS (
  SELECT 1 FROM customers
  WHERE orders.customer_id = customers.customer_id
);

By using exists reasonably, you can set preselected conditions to reduce the query alternative result set and improve query efficiency. Or establish an association between queries.

It should be noted that the exists subquery and the main query can have no logical relationship. For each result of the main query result, the subquery will be checked. Therefore, if it is an associated query, you need to be particularly careful of its usage conditions, otherwise there may be serious performance problems.

COALESCE, NULLIF

PG provides the function colaesce, which accepts multiple parameters and returns the first value that is not empty. It is often used in the application scenario where "use default values if the field content is empty".

SELECT COALESCE(null, 5, null, 10); -- return 5

The word "COALESCE" consists of the root co- (meaning or together), and the Latin word "alescere" (meaning length and growth). Therefore, its original meaning can be understood as combining multiple elements or options into one, or combining multiple parts into one whole. In the field of computer science, "COALESCE" is a common term used to describe operations that combine multiple values into one value. The COALESCE function in PostgreSQL is also used in this sense, and is similar to the concept of "merging" multiple values into one value, which can merge multiple parameters and return the first non-NULL value.

There is also a more commonly used function nullif in PG. It can compare the parameters of two inputs. If they are equal, they will return null; otherwise, they will return the first value.

SELECT NULLIF(10, 10); -- return NULL
SELECT NULLIF('abc', 'def'); -- return 'abc'

Common scenarios for nullif include checking 0, checking default values, error handling, data replacement, etc.

case when

In fact, the case When provided by PG is quite powerful. i.e. it can work in simple ways and search ways. Developers can flexibly choose according to their needs.

-- Simple way
CASE
  WHEN condition1 THEN result1
  WHEN condition2 THEN result2
  ...
  ELSE result
END
-- Search Method
CASE expression
  WHEN value1 THEN result1
  WHEN value2 THEN result2
  ...
  ELSE result
END

Default query result (Default Row)

The following statement can return a default record when the result set is empty. There may be no need for special processing logic on the client side.

SELECT column1, column2, ... FROM table WHERE condition
UNION ALL
SELECT default_value1, default_value2, ...
LIMIT 1

But the problem here is that no matter whether this query has a result or not, a default record will be output? Maybe you can add exists to the subsequent result set to check, which is obviously more troublesome. If you really have this requirement, you can consider it.

Determine whether the query result is empty (Query Empty)

If you only care about whether the records that meet the query conditions exist, the easiest and most efficient way should be:

SELECT true from students where ...
union all 
select false limit 1;

Window Function

PG provides a series of window functions, which greatly facilitates the development of various statistical queries. This part of the content is very much, and the author will write a separate article to discuss it. Here I will briefly explain the key points.

First of all, we need to understand the difference between window functions and aggregate functions. Although they are commonly used statistical functions, aggregate functions can only handle simple data classification and related calculations. For example, the ranking recorded in the partition is not handled by the aggregate function because it does not care about the relationship between individuals and groups.

The window function can not only partition the data (the origin of the name of the window function), but also process statistically based on the partition. More importantly, it can more carefully analyze the relationship between individuals and groups, and directly attach statistical attributes to the original individual records (the aggregate function will lose individual information), so as not to lose detailed information, providing a more powerful possibility of data analysis and presentation. Of course, we can also imagine that such operations will inevitably cause some performance problems, because there are more data and details to be processed, so it needs to be considered and balanced in actual use.

Window functions and statistical related functions are also different. Those functions mainly process a set of data that can be statistically calculated. Window functions are used to process a set of records. Of course, they can be used in combination to achieve the purpose of grouping statistical analysis.

Window functions can be written directly in ordinary query statements (without using group by sentences). The standard form of this function series is:

f() over (partition by ... order by ... )

Partitioning and sorting can use multiple fields, or return other fields of the record at the same time.

Commonly used window functions include:

row_number: There will be no duplicate values in the partition according to the sorting rules.
rank, density_rank, percent_rank: Ranking in the partition according to the sorting rules. Note that this ranking can be tied, with interrupts or non-interruptions in the middle, which is often used as a ranking of results; density_rank is a close ranking (the ranking is tied and not skipped); percent_rank is a percentage ranking.
firstvalue,lastvalue: the first and last records in the partition, specifying the value of the field
cume_dist: The cumulative distribution value of the current row in the partition
lag, lead: The value of the specified offset before or after the current record in the current partition
ntile: Returns the number of the shard to which the current line belongs after the partition is evenly sharded;
nth_value: Get the value of the specified field and the specified position in the partition

Delay simulation (Sleep)

I don't know where it will be used, maybe it is the timing or delayed execution in the stored procedure:

pg_sleep(3); // Sleep for 3 seconds

Delayed reference (Lateral)

The author is also quite vague about the understanding of this sentence and has not found a particularly useful application. Let's take a look at an example first

select pledged_usd, avg_pledge_usd, amt_from_goal, duration, (usd_from_goal / duration) 
as usd_needed_daily from kickstarter_data, 
lateral (select pledged / fx_rate as pledged_usd) pu 
lateral (select pledged_usd / backers_count as avg_pledge_usd) apu 
lateral (select goal / fx_rate as goal_usd) gu 
lateral (select goal_usd - pledged_usd as usd_from_goal) ufg 
lateral (select (deadline - launched_at)/86400.00 as duration) dr;

Someone explained this: "lateral keyword allows access to fields defined after from, and references fields defined before this". The reason for this design is probably because the order of SQL statement query execution is from and join. Generally, the subsequent fields cannot be processed, so a keyword is required to be declared here. In some scenarios, the flexibility of writing SQL statements is improved.

The above is the detailed content of PostgreSQL application skills and examples. For more information about PostgreSQL applications, please follow my other related articles!