Preface:
Regarding database paradigm, I have heard about it from time to time, but I have not been able to understand it in detail. Generally database books or database courses will introduce paradigm related content, and paradigm also often appears in database exam questions. It is not clear whether you have a clearer understanding of paradigm? In this article, let's learn about database paradigm.
1. Introduction to the database paradigm
In order to create less redundant and well-structured databases, certain rules have to be followed while designing the database. In relational databases such rules are called paradigms. A paradigm is a summary that meets a certain design requirement. To design a well-structured relational database, certain paradigms must be satisfied.
The English name of the paradigm is Normal Form, or NF for short. It was summarized by the British in the 1970s after they proposed the relational database model. Paradigm is the basis of relational database theory, but also in the process of designing the database structure to follow the rules and guidelines.
There are currently six common paradigms for relational databases: the first paradigm (1NF), the second paradigm (2NF), the third paradigm (3NF), the Bass-Code paradigm (BCNF), the fourth paradigm (4NF), and the fifth paradigm (5NF, also known as the perfect paradigm). The paradigm that fulfills the minimum requirements is the first paradigm (1NF). The one that further satisfies more normative requirements on top of the first paradigm is called the second paradigm (2NF), and the rest of the paradigms follow in the same order.
2. Common Paradigms Explained
In the design of the database, will refer to the paradigm requirements to do, but does not mean that follow the paradigm level higher the better, the paradigm is too high, although with better constraints on the data relationship, but also lead to more cumbersome relationships between the tables, resulting in each operation of the table will become more, the database performance degradation. Usually, in relational database design, the highest is followed to BCNF, and generally still 3NF. That is, in general, we use the first three paradigms is already enough. Here we come to understand the first three commonly used paradigms in detail.
First Paradigm (1NF)
The first paradigm is the most basic paradigm. If all field values in a database table are indecomposable atomic values, it means that the database table satisfies the first paradigm. Simply put, the first paradigm is that each row of data is indivisible, there can be no more than one value in the same column, and if there is a duplicate attribute, you need to define a new entity.
Example: Suppose a company wants to store the names and contact information of its employees. It creates a table as follows:
Two employees (Jon & Lester) have two cell phone numbers, so the company stores them in the same table as shown above. Then the table is not 1NF compliant because the rule says that "each attribute of a table must have an atomic (single) value", and the emp_mobile value for Jon & Lester's employees violates this rule. In order for the table to be 1NF compliant, we should have the following table data:
Second Paradigm (2NF)
The second paradigm builds on the first paradigm. The second paradigm ensures that every column in a database table is related to the primary key, not just a part of the primary key (mainly for joint primary keys). This means that only one type of data can be stored in a database table, not multiple types of data in the same database table.
+----------+-------------+-------+ | employee | department | head | +----------+-------------+-------+ | Jones | Accountint | Jones | | Smith | Engineering | Smith | | Brown | Accounting | Jones | | Green | Engineering | Smith | +----------+-------------+-------+
The table above describes the relationship between the hired person, the work department and the leader. We call the data that uniquely represents a row of a table in the database the primary key of the table. The head column in the table is not related to the primary key. Therefore, the table is not second paradigm compliant and in order to make the above table second paradigm compliant, it needs to be split into two tables:
-- employee primary key +----------+-------------+ | employee | department | +----------+-------------+ | Brown | Accounting | | Green | Engineering | | Jones | Accounting | | Smith | Engineering | +----------+-------------+ -- department primary key +-------------+-------+ | department | head | +-------------+-------+ | Accounting | Jones | | Engineering | Smith | +-------------+-------+
Third Paradigm (3NF)
To satisfy 2NF, all fields outside the non-primary key must be non-dependent on each other, i.e., you need to make sure that every column of data in the data table is directly related to the primary key and not indirectly.
In short, the Third Paradigm (3NF) requires that a relationship does not contain non-primary keyword information that is already contained in other relationships. For example, there exists a department information table, in which each department has a department number (dept_id), department name, department profile, and other information. Then after the department number is listed in the employee information table, the department name, department profile, and other information related to the department cannot be added to the employee information table. If the department information table does not exist, it should be constructed according to the third paradigm (3NF), otherwise there will be a lot of data redundancy.
3. On counter-paradigms
The advantages of a paradigm are obvious; it avoids a lot of data redundancy, saves storage space, and maintains data consistency. Paradigmatic tables are usually smaller and can be better placed in memory, so performing operations will be faster. So is the database design optimal once all the tables are normalized to 3NF? Not necessarily. The higher the paradigm means that the more granular the table division, the more tables are needed in a database, and the more users have to spread the originally related data across multiple tables. A slightly more complex query statement may require at least one correlation on a database that conforms to the paradigm, and perhaps more, which is not only expensive, but may also render some indexing strategies ineffective.
So when we do database design, we don't follow the paradigm requirements exactly and sometimes we do anti-paradigm design. By adding redundant or duplicate data to improve the read performance of the database, and reduce the number of join tables in the case of correlated queries.
The above is a detailed explanation of MySQL Database Paradigm, for more information about MySQL Database Paradigm please follow my other related articles!