Database design skills[transfer]

1. The relationship between the original document and the entity
It can be a one-to-one, one-to-many, and many-to-many relationship. In general, they are a one-to-one relationship: that is, a primitive document corresponds to and only one entity. In special cases, they may be a one-to-many or many-to-one relationship, that is, one original document corresponds to multiple entities, or multiple original documents correspond to one entity. The entities here can be understood as basic tables. After clarifying this correspondence, it will be of great benefit to our design of the input interface.

〖Example 1〗: In the human resources information system, it corresponds to three basic tables: employee basic situation table, social relationship table, and work resume table. This is a typical example of "one original document corresponds to multiple entities".

2. Primary and Foreign Keys
Generally speaking, an entity cannot have neither primary key nor foreign key. In the E-R diagram, entities in the leaf site can define primary keys or not (because it has no descendants), but must have foreign keys (because it has fathers).
The design of primary and foreign keys occupies an important position in the design of global databases. After the design of the global database was completed, an American database design expert said: "Keys are everywhere, and there is nothing except keys." This is his experience in database design, and it also reflects his highly abstract idea of the core of information system (data model). Because: the primary key is a highly abstract entity, the pairing of the primary key and the foreign key represents the connection between the entities.

3. The nature of the basic table
Basic tables are different from intermediate tables and temporary tables because they have the following four characteristics:
(1) Atomicity. Fields in the base table are non-decomposed.
(2) Primitiveness. The records in the basic table are records of the original data (basic data).
(3) Deductive. All output data can be derived from the data in the basic table and the code table.
(4) Stability. The structure of the basic table is relatively stable, and the records in the table must be saved for a long time.
After understanding the nature of basic tables, when designing a database, you can distinguish basic tables from intermediate tables and temporary tables.

4. Paradigm Standards
The relationship between the basic table and its fields should be met as much as possible. However, database design that meets the third paradigm is often not the best design. In order to improve the operation efficiency of the database, it is often necessary to reduce the paradigm standard: appropriately increase redundancy to achieve the goal of exchanging space for time.
〖Example 2〗: There is a basic table for storing goods, as shown in Table 1. The existence of the field "Amount" indicates that the design of the table does not meet the third normal form, because "Amount" can be obtained by multiplying the "unit price" by "quantity", indicating that "Amount" is a redundant field. However, adding the redundant field of "amount" can increase the speed of query statistics, which is the method of exchanging space for time.
In Rose 2002, there are two types of specified columns: data columns and computed columns. Columns such as "amount" are called "calculated columns", while columns such as "unit price" and "quantity" are called "data columns".
Table 1 Table structure of product table
Product Name Product Model Unit Price Quantity Amount
TV 29 inches 2,500 40 100,000

5. Understand three paradigms in a popular way
Understand three paradigms in a popular way，It's great for database design。 In database design, in order to better apply the three paradigms, it is necessary to understand the three paradigms in a popular way (common understanding is sufficient, not the most scientific and accurate understanding):
The first normal formula: 1NF is an atomic constraint on attributes, requiring attributes to be atomic and cannot be decomposed;
The second normal form: 2NF is a uniqueness constraint on records, requiring records to have a unique identity, that is, the uniqueness of entities;
The third normal form: 3NF is a constraint on the redundancy of fields, that is, any field cannot be derived from other fields, and it requires that the field has no redundancy.
No redundant database design can be done. However, a database without redundancy may not be the best database. Sometimes, in order to improve operational efficiency, the paradigm standards must be lowered and redundant data must be properly retained. The specific approach is: to abide by the third paradigm when designing conceptual data models, and to consider the work of reducing paradigm standards when designing physical data models. Reducing the paradigm is to add fields to allow redundancy.

6. Be good at identifying and correctly handling many-to-many relationships
If there is a many-to-many relationship between two entities, this relationship should be eliminated. The way to eliminate it is to add a third entity between the two. In this way, the original many-to-many relationship has now become two one-to-many relationships. The properties of the original two entities should be reasonably assigned to three entities. The third entity here is essentially a more complex relationship, which corresponds to a basic table. Generally speaking, database design tools cannot identify many-to-many relationships, but can handle many-to-many relationships.

〖Example 3〗: In the "Library Information System", "book" is an entity, and "reader" is also an entity. The relationship between these two entities is a typical many-to-many relationship: a book can be borrowed by multiple readers at different times, and a reader can borrow multiple books. To this end, a third entity is added between the two, which is named "Borrow and Return Book". Its attributes are: Borrow and Return Time, Borrow and Return Flag (0 means borrowing, 1 means returning Book). In addition, it should also have two foreign keys (the primary key of "book" and the primary key of "reader") so that it can be connected to "book" and "reader".

7. The value method of primary key PK
PK is a table connection tool for programmers to use. It can be a string of numbers with no physical meaning, and is automatically added to 1 by the program. It can also be a physically meaningful field name or a combination of field names. But the former is better than the latter. When PK is a combination of field names, it is recommended that the number of fields should not be too many. If there are too many, not only will the index take up a lot of space, but it will also take up a slow speed.

8. Correctly understand data redundancy
The recurrence of primary and foreign keys in multiple tables does not belong to data redundancy. This concept must be clear, but in fact many people are not clear about it. The recurrence of non-key fields is the data redundancy! And it is a low-level redundancy, that is, repetitive redundancy. Advanced redundancy is not a recurring occurrence of fields, but a derived occurrence of fields.

〖Example 4〗: In the three fields of "unit price, quantity, and amount" in the product, "amount" is derived from "unit price" multiplied by "quantity". It is redundancy, and it is a kind of advanced redundancy. The purpose of redundancy is to increase processing speed. Only low-level redundancy will increase data inconsistency, because the same data may be entered multiple times from different times, places, and roles. Therefore, we advocate high-level redundancy (derived redundancy) and oppose low-level redundancy (repeat redundancy).

9. There is no standard answer for E–R diagrams
There is no standard answer to the E-R diagram of information systems, because its design and drawing method are not unique, as long as it covers the business scope and functional content of the system requirements, it is feasible. On the contrary, we need to modify the E–R diagram. Although it does not have the only standard answer, it does not mean that it can be designed at will. The standards for good E-R diagrams are: clear structure, concise correlation, moderate number of entities, reasonable attribute allocation, and no low-level redundancy.

10. View technology is useful in database design
Unlike basic tables, code tables, and intermediate tables, views are virtual tables that exist based on the real tables of the data source. View is a window for programmers to use the database, a form of base table data synthesis, a method of data processing, and a means of confidentiality of user data. In order to perform complex processing, improve computing speed and save storage space, the depth of the view definition must generally not exceed three layers. If the three-layer view is still not enough, a temporary table should be defined on the view and a temporary table should be defined. By overlapping the definitions repeatedly, the depth of the view will be unlimited.

The role of view is even more important for certain information systems related to the political, economic, technological, military and security interests of the country. After the basic tables of these systems are physically designed, the first layer of view is immediately established on the basic table. The number and structure of this layer of views are exactly the same as the number and structure of the basic table. It is also stipulated that all programmers are allowed to operate on the view only. Only the database administrator, with the "security key" that multiple personnel have jointly mastered, can operate directly on the basic table. Please think about it: Why is this?

11. Intermediate tables, reports and temporary tables
An intermediate table is a table that stores statistics. It is designed for data warehouses, output reports or query results. Sometimes it does not have primary keys and foreign keys (except data warehouses). Temporary tables are designed by programmers and store temporary records for personal use. The base table and intermediate table are maintained by the DBA, and the temporary tables are automatically maintained by the programmer themselves with programs.

12. Integrity constraints are manifested in three aspects
Domain integrity: Use Check to implement constraints. In the database design tool, when defining the value range of a field, there is a Check button to define the value of the field through it. Reference integrity: implemented with PK, FK, and table-level triggers. User-defined integrity: It is some business rules implemented with stored procedures and triggers.

13. The method to prevent patching of database design is the "Three Less Principle"
(1) The fewer tables in a database, the better. Only when the number of tables is small can the system's E-R diagrams be small and refined, removing duplicate unnecessary entities, forming a high degree of abstraction of the objective world, and performing systematic data integration, preventing patched design;
(2) The fewer fields in a table that combine primary keys, the better. Because the role of the primary key is to create the primary key index, and the other is to be used as a foreign key to the subtable, so the number of fields that combine the primary key is smaller, which not only saves running time, but also saves index storage space;
(3) The fewer the number of fields in a table, the better. Only when the number of fields is small can it mean that there is no data duplication in the system and there is very little data redundancy. More importantly, it is to urge readers to learn to "column-change rows", which prevents the fields in the sub-table from being pulled into the main table, leaving many spare fields in the main table. The so-called "column-changing row" means pulling out part of the content in the main table and creating a separate subtable. This method is very simple. Some people just don’t get used to it, do not adopt it, or implement it.
The practical principle of database design is to find the right balance between data redundancy and processing speed. "Three Young Masters" is a holistic concept, a comprehensive viewpoint cannot be isolated from a certain principle. This principle is relative, not absolute. The principle of "three more" is definitely wrong. Just imagine: if the same function of the system is covered, the E-R diagram of one hundred entities (a total of one thousand attributes) is definitely much better than the E-R diagram of two hundred entities (a total of two thousand attributes).
The principle of "three young people" is to encourage readers to learn to use database design technology to conduct systematic data integration. The step of data integration is to integrate the file system into an application database, integrate the application database into a topic database, and integrate the topic database into a global comprehensive database. The higher the degree of integration, the stronger the data sharing, the less information island phenomenon, and the fewer the number of entities, the number of primary keys, and the number of attributes in the global E-R diagram of the entire enterprise information system will be.
The purpose of advocating the "three few" principle is to prevent readers from using patching technology to continuously add, delete and modify the database, making the enterprise database a "garbage pile" for randomly designing database tables, or a "major house" for database tables, and finally causing the basic tables, code tables, intermediate tables, and temporary tables in the database to be messy and numerous, resulting in the inability to maintain the information system of enterprises and institutions and paralyzed.
Anyone can do the "three more" principle, which is the fallacy theory of designing databases with the "patch method". The "Three Less" principle is a principle of less but preciseness. It requires high database design skills and art, which not everyone can do, because this principle is the theoretical basis for eliminating the use of the "patch method" to design databases.

14. Methods to improve database operation efficiency
Under given system hardware and system software conditions, the way to improve the operation efficiency of the database system is:
(1) When designing database physically, reduce the paradigm, increase redundancy, use less triggers, and use more stored procedures.
(2) When the calculation is very complex and the number of records is very huge (for example, ten million), the complex calculation must be outside the database first, and then the calculation process is completed in C++ language in the file system, and then finally it is added to the table. This is the experience in the design of telecom billing systems.
(3) If you find that there are too many records in a certain table, such as more than 10 million, you need to split the table horizontally. The horizontal segmentation method is to use a value of the primary key PK of the table as the boundary line to divide the records of the table horizontally into two tables. If you find that there are too many fields in a certain table, such as more than eighty, then split the table vertically and decompose the original table into two tables.
(4) System optimization of the database management system DBMS, that is, optimizing various system parameters, such as the number of buffers.
(5) When using data-oriented SQL language for programming, try to adopt optimization algorithms.
In short, to improve the operation efficiency of the database, we must work hard at the same time from the three levels: database system-level optimization, database design-level optimization, and program implementation-level optimization.

The above fourteen techniques are gradually summarized by many people in a large number of database analysis and design practices. Readers should not use these experiences to memorize them by rote, but should digest and understand them, seek truth from facts, and master them flexibly. And gradually achieve: develop in application and apply in development.