About the matching and intersection issues of PostgreSQL JSONB

PostgreSQL has been supporting JSONB for more than ten years. Over the past decade, the community has provided many powerful features for JSONB. Personally, the most commonly used match operations are actually@> 。

If you regard JSON data as an abstract syntax tree (AST), this operator determines whether the right parameter is a subgraph of the left parameter.

There should have been a diagram here, but there is a temporary data set to process it on weekends, so there is no time to find the right tool. To give a few simple examples, the following example gets true, which should be easy to understand:

select '{"a": 1, "b": 2, "c": 3}'::jsonb @> '{"b":2}' ;
--------------
t

And it can also match more complex cases, and the following example is true:

select '{"a": 1, "b": 2, "c": {"value": 3}}'::jsonb @> '{"c":{"value": 3}}';
 ?column?
----------
 t
(1 row)

The following example may be a little confused by new users, but it actually fits this rule well:

select '{"a": 1, "b": 2, "c": {"value": 3}}'::jsonb @> '{"c":{}}';
 ?column?
----------
 t
(1 row)

But it should be noted that the following example result is false:

select '{"a": 1, "b": 2, "c": {"value": 3}}'::jsonb @> '{"c":[]}';
 ?column?
----------
 f
(1 row)

This is not difficult to understand.{}and[]Not equal.

The following example is more interesting:

select '{"a": 1, "b": 2, "c": {"value": [1, 2, 3]}}'::jsonb @> '{"c":{"value": [2]}}';
 ?column?
----------
 t
(1 row)

It should be noted here that when comparing whether a JSON array matches another, it does not require the order of the two arrays to be equal, as long as the right side is the real subset of the left side:

select '{"a": 1, "b": 2, "c": {"value": [1, 2, 3]}}'::jsonb @> '{"c":{"value": [2]}}';
 ?column?
----------
 t
(1 row)
 
select '{"a": 1, "b": 2, "c": {"value": [1, 2, 3]}}'::jsonb @> '{"c":{"value": [5, 2]}}';
 ?column?
----------
 f
(1 row)
 
select '{"a": 1, "b": 2, "c": {"value": [1, 2, 3]}}'::jsonb @> '{"c":{"value": [3, 2]}}';
 ?column?
----------
 t
(1 row)

This rule fits PostgreSQL's inverted index, PostgreSQL's gin index, JSONB field type and matching operation @> have become a very powerful combination. In the past few years, I have been accustomed to adding a meta field of type JSONB to some important business tables and indexing them

create index idx_xxx_meta on xxx using(gin);

It should be noted that the create index syntax when specifying the index type.

Such a design can solve many traditionally difficult problems. For example, I can put a tag list for each entry, and taking an entry with a few tags is a simple matching query:

select xxx from data_table where meta @> '{"tags": ["tag1", "tagx", "tagy"]}'

Because of the help of gin indexing, the performance of this search is sufficient for conventional Internet applications.

Even my colleagues in the CSDN NLP group have dug out new usages. In a table that stores the tree node, we save a meta field, which has a path list, storing the path of the current field in the tree. Each of its items is{"id": node_id, "title": something}With such a structure, when we search for all child nodes under a certain node, including its intergenerational child nodes, we only need to execute such a query:

select xxx from tree_node where meta @> '{"path": [{"id": node_id}]}'

Of course, this matching operation also has its limitations. It will be matched successfully only if the real sub-graph on the right is the left. For example, if I want to find a situation where the tags list contains any of my search terms (i.e., there is a non-null intersection between the two), this method will not work. At this time we need another operator?|

select '["tag1", "tag2", "tag3"]'::jsonb ?| '{tag2, tag3}';
 ?column?
----------
 t
(1 row)
 
select '["tag1", "tag2", "tag3"]'::jsonb ?| '{tag2, tag3, tag5}';
 ?column?
----------
 t
(1 row)
 
select '["tag1", "tag2", "tag3"]'::jsonb ?| '{tag5}';
 ?column?
----------
 f
(1 row)

Pay attention to these examples. First of all, the operator on the right is no longer jsonb, but must betext[], and secondly, it is actually checking the key value - that is, the value that can be stored through the gin index:

select '{"tag1":1, "tag2":2, "tag3":3}'::jsonb ?| '{tag5}';
 ?column?
----------
 f
(1 row)
 
select '{"tag1":1, "tag2":2, "tag3":3}'::jsonb ?| '{tag3}';
 ?column?
----------
 t
(1 row)
 
select '{"tag1":1, "tag2":2, "tag3":3}'::jsonb ?| '{tag3, tag1}';
 ?column?
----------
 t
(1 row)

PostgreSQL supports JSON and JSONB for more than ten years, and each version is actively enhancing its JSON data processing capabilities. Even though I have actively explored and learned in the past decade, I have no comprehensive understanding. This intersection operation was also noticed recently during the work of the NLP group.

This is the end of this article about the matching and intersection of PostgreSQL JSONB. For more related PostgreSQL JSONB content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!