1. Relational database
1. Concept
Relational database: a database that uses the relational model to organize data. It is currently the most widely used type of database system. Simply put, the relational model is a two-dimensional table model, and a relational database is a collection of two-dimensional tables and the connections between them. Most mainstream databases in use today are relational, such as SQL Server, MySQL, Oracle, DB2, and Sybase.
Commonly used concepts in relational models:
Relation: It can be understood as a two-dimensional table. Each relation has a relation name, which is what is commonly called the table name.
Tuple: It can be understood as a row in a two-dimensional table, and is often called a record in a database.
Attribute: It can be understood as a column in a two-dimensional table, which is often called a field in a database.
Domain: The value range of the attribute, that is, the value limit of a column in the database.
Key: A set of attributes that uniquely identifies a tuple. In a database it is often called the primary key and consists of one or more columns.
Relation schema: The description of a relation. Its format is: relation name (attribute 1, attribute 2, ..., attribute N). In a database it is called the table structure.
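The concepts above can be mapped onto a small table. The following is a minimal sketch using Python's built-in sqlite3 module; the student table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

# Hypothetical "student" relation, used only to illustrate the terminology.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (              -- relation (table) name
        id   INTEGER PRIMARY KEY,       -- key: uniquely identifies each tuple
        name TEXT NOT NULL,             -- attribute (column / field)
        age  INTEGER CHECK (age >= 0)   -- domain: limits the allowed values
    )
    """
)
conn.executemany(
    "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
    [(1, "Alice", 20), (2, "Bob", 22)],  # each tuple is one row (record)
)

# The relation schema: student (id, name, age)
columns = [info[1] for info in conn.execute("PRAGMA table_info(student)")]
rows = conn.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()
print(columns)
print(rows)
```

Here each entry of rows is a tuple, and columns lists the attributes that make up the relation schema.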
2. Characteristics of relational databases
Relational databases are database systems that support the relational model. The relational model uses two-dimensional tables to represent entities and the connections between them. Storing data in two-dimensional tables is intuitive and easy for users to understand. The advantages of relational databases are mainly reflected in the following characteristics:
(1) Easy to operate. Users can easily work with the data through applications connected to the database, and even people without a database background can operate on the data directly through the database management system.
(2) Easy to maintain. Relational databases provide three kinds of integrity constraints: entity integrity, referential integrity, and user-defined integrity. These constraints greatly reduce redundancy in stored data and the likelihood of data inconsistency.
(3) Flexible data access. Relational databases provide objects such as views, stored procedures, triggers, and indexes, which make data access more flexible.
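The three kinds of integrity constraints mentioned in (2) can be sketched with SQLite. This is only an illustration: the dept and employee tables and the rejected helper are hypothetical, and other database systems declare and report constraints in their own ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE employee (
        id      INTEGER PRIMARY KEY,                   -- entity integrity
        dept_id INTEGER NOT NULL REFERENCES dept(id),  -- referential integrity
        salary  INTEGER CHECK (salary > 0)             -- user-defined integrity
    )
    """
)
conn.execute("INSERT INTO dept VALUES (1, 'Engineering')")
conn.execute("INSERT INTO employee VALUES (1, 1, 5000)")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "entity": rejected("INSERT INTO employee VALUES (1, 1, 6000)"),        # duplicate primary key
    "referential": rejected("INSERT INTO employee VALUES (2, 99, 6000)"),  # no such department
    "user_defined": rejected("INSERT INTO employee VALUES (3, 1, -10)"),   # negative salary
}
print(results)
```

All three statements are refused by the database itself, so inconsistent rows never reach the tables regardless of which application issued them.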
3. Bottlenecks of relational databases
(1) High-concurrency reads and writes
Web 2.0 websites need to generate dynamic pages and provide dynamic information in real time based on each user's personalized information, so they cannot rely on static page generation. As a result, the concurrent load on the database is very high, often reaching tens of thousands of read and write requests per second. At that rate, the disk I/O of the database server can no longer keep up.
(2) Efficient storage and access of massive data
On large social networking websites, users generate a massive volume of activity every day. As this data accumulates, a single table may hold hundreds of millions of records. For a relational database, running SQL queries against a table of that size can be very inefficient.
(3) High scalability and availability
In a web-based architecture, the database is the hardest layer to scale horizontally. As the number of users and visits grows day by day, a database cannot simply expand its performance and load capacity by adding more hardware and service nodes the way web servers can.
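One way to see why adding nodes is harder for a database than for a stateless web server is to look at where the data lives. The sketch below assumes a naive placement rule (record id modulo the number of nodes), which is an illustrative assumption and not how any particular database partitions data; the shard_for helper and the record ids are hypothetical.

```python
def shard_for(key, node_count):
    """Naive placement: a record lives on node (key mod node_count)."""
    return key % node_count


keys = range(1000)                            # hypothetical record ids
before = {k: shard_for(k, 4) for k in keys}   # four database nodes
after = {k: shard_for(k, 5) for k in keys}    # one node added

# Records whose node changes must be physically moved to the new location.
moved = sum(1 for k in keys if before[k] != after[k])
moved_fraction = moved / len(keys)
print(moved, moved_fraction)
```

In this toy setup 800 of the 1000 records change nodes when a fifth node is added. A web server holds no such state, so a new one can start serving requests immediately.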
4. Relational databases follow the ACID principles
A database transaction is very similar to a transaction in the real world. It has the following four characteristics:
1. A (Atomicity)
Atomicity is easy to understand: either all operations in a transaction are performed, or none are. A transaction succeeds only if every operation in it succeeds. If any operation fails, the entire transaction fails and must be rolled back. For example, transferring 100 yuan from account A to account B takes two steps: 1) withdraw 100 yuan from account A; 2) deposit 100 yuan into account B. These two steps must either both complete or both not happen. If only the first step completed and the second failed, 100 yuan would disappear for no reason.
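The transfer example can be sketched with sqlite3, whose connection object commits or rolls back when used as a context manager. The account table, the transfer function, and the fail_midway flag are hypothetical names introduced for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
conn.commit()


def transfer(src, dst, amount, fail_midway=False):
    """Move money in one transaction; any error undoes both steps."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            if fail_midway:
                raise RuntimeError("simulated failure after step 1")
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except (RuntimeError, sqlite3.IntegrityError):
        return False


def balances():
    return dict(conn.execute("SELECT name, balance FROM account"))


failed = transfer("A", "B", 100, fail_midway=True)
after_failure = balances()   # the withdrawal was rolled back
succeeded = transfer("A", "B", 100)
after_success = balances()
print(failed, after_failure, succeeded, after_success)
```

After the simulated failure, account A still holds its 100 yuan, because the withdrawal was undone together with the rest of the transaction.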
2. C (Consistency)
Consistency is also fairly easy to understand: the database must always be in a consistent state, and a transaction must not violate the database's existing consistency constraints. For example, suppose there is an integrity constraint a+b=10. If a transaction changes a, it must also change b so that a+b=10 still holds after the transaction completes; otherwise the transaction fails.
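The a+b=10 constraint can be expressed as a CHECK constraint. This is a sketch under one assumption worth stating: SQLite checks the constraint at the end of each statement rather than at commit time, so a and b have to be changed in the same UPDATE. The pair table and the try_update helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pair (a INTEGER, b INTEGER, CHECK (a + b = 10))")
conn.execute("INSERT INTO pair VALUES (3, 7)")
conn.commit()


def try_update(sql):
    """Run one statement as a transaction; report whether it was accepted."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


only_a = try_update("UPDATE pair SET a = 4")         # a + b would become 11
both = try_update("UPDATE pair SET a = 4, b = 6")    # a + b stays 10
final = conn.execute("SELECT a, b FROM pair").fetchone()
print(only_a, both, final)
```

The first update is rejected and leaves the row untouched; the second keeps the constraint satisfied and is accepted.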
3. I (Isolation)
Isolation means that concurrent transactions do not affect each other. If the data a transaction accesses is being modified by another transaction, then as long as that other transaction has not committed, the data seen by the first transaction is not affected by the uncommitted changes. For example, suppose a transaction is transferring 100 yuan from account A to account B. If B checks the account before the transaction has completed, B will not see the newly added 100 yuan.
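The same scenario can be observed with two connections to one SQLite database file: one performs the transfer, the other plays the role of B checking the balance. This is a sketch that assumes SQLite's default journal mode; the file name, the account table, and the balance_of_b helper are hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "bank.db")  # hypothetical database file

    writer = sqlite3.connect(path)
    writer.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    writer.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 0)])
    writer.commit()

    reader = sqlite3.connect(path)  # a second, independent connection

    def balance_of_b():
        rows = reader.execute("SELECT balance FROM account WHERE name = 'B'").fetchall()
        return rows[0][0]

    # The transfer is in progress: both updates are made but not yet committed.
    writer.execute("UPDATE account SET balance = balance - 100 WHERE name = 'A'")
    writer.execute("UPDATE account SET balance = balance + 100 WHERE name = 'B'")
    before_commit = balance_of_b()  # the reader does not see uncommitted changes

    writer.commit()
    after_commit = balance_of_b()   # the committed transfer is now visible

    writer.close()
    reader.close()

print(before_commit, after_commit)
```

How much concurrent transactions are allowed to see of each other depends on the database and its configured isolation level, so this shows one behavior rather than a universal rule.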
4. D (Durability)
Durability means that once a transaction is committed, its modifications are permanently saved in the database and will not be lost even if the system goes down.
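A rough way to see this is to commit some data, close the connection, and reopen the database file. Closing a connection is only a stand-in for real downtime, and the file name and log table are hypothetical.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "durable.db")  # hypothetical database file

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE log (msg TEXT)")
    conn.execute("INSERT INTO log VALUES ('committed')")
    conn.commit()
    conn.execute("INSERT INTO log VALUES ('never committed')")
    conn.close()  # closing without commit discards the open transaction

    reopened = sqlite3.connect(path)
    survived = [row[0] for row in reopened.execute("SELECT msg FROM log")]
    reopened.close()

print(survived)
```

Only the committed row is found after reopening; the uncommitted insert is gone.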
2. NoSQL database
NoSQL refers to databases that are non-relational, distributed, and generally do not guarantee the ACID principles. NoSQL is sometimes read as an abbreviation of Not Only SQL, and is a general term for database management systems that differ from traditional relational databases. NoSQL is used to store data at very large scale (for example, Google or Facebook collect trillions of bits of data about their users every day). These data stores do not require a fixed schema and can be scaled horizontally without extra operations.
Non-relational databases take a different approach, such as storing data as key-value pairs with no fixed structure. Each tuple can have different fields and can add its own key-value pairs as needed, so it is not restricted to a fixed structure, which can reduce some time and space overhead. Users can therefore add the fields they need at any time. However, because there are few constraints, non-relational databases often cannot filter on field values the way a SQL WHERE clause does, and it is also difficult to express integrity constraints in the design. They are best suited to storing relatively simple data. For data that requires more complex queries, SQL databases are more suitable.
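The key-value idea can be sketched with a plain Python dictionary. This is a toy in-memory store meant only to show the shape of the data; the put helper, the keys, and the fields are hypothetical, and real NoSQL systems differ in many details.

```python
# A toy in-memory key-value store.
store = {}


def put(key, **fields):
    """Store a record under a key; each record may carry its own set of fields."""
    store[key] = dict(fields)


put("user:1", name="Alice", email="alice@example.com")
put("user:2", name="Bob", age=30, tags=["admin"])  # different fields, no schema change

# Lookup by key is direct.
bob = store["user:2"]

# Filtering on a field value has to be written by hand in this toy store,
# because there is nothing like a WHERE clause to do it.
with_age = sorted(key for key, record in store.items() if "age" in record)
print(bob, with_age)
```

The two records carry different fields without any table definition being altered, which is the flexibility described above; the hand-written filter shows the corresponding cost on the query side.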
1. Distributed system
A distributed system consists of multiple computers and software components that communicate over a computer network (a local area network or a wide area network).
A distributed system is a software system built on top of a network. It is precisely because of these software characteristics that distributed systems are highly cohesive and transparent.
Therefore, the difference between a computer network and a distributed system lies more in the high-level software (especially the operating system) than in the hardware.
Distributed systems can be deployed on different platforms, such as PCs, workstations, LANs, and WANs.
2. Advantages of distributed computing
Reliability (fault tolerance):
An important advantage of distributed computing systems is reliability. A system crash on one server does not affect the rest of the servers.
Scalability:
In distributed computing systems, more machines can be added as needed.
Resource sharing:
Shared data is essential to applications such as banking and booking systems.
Flexibility:
Since the system is very flexible, it is easy to install, implement, and debug new services.
Faster speed:
A distributed computing system can draw on the computing power of multiple computers, giving it faster processing speeds than other systems.
Open system:
Since it is an open system, the service can be accessed locally or remotely.
Higher performance:
Compared with centralized computer network clusters, it can provide higher performance (and better cost-effectiveness).
3. Disadvantages of distributed computing
Troubleshooting:
Problems are harder to troubleshoot and diagnose.
Software:
Limited software support is a major disadvantage of distributed computing systems.
Network:
Problems with the network infrastructure, including transmission problems, high load, and information loss.
Security:
The open nature of the system exposes distributed computing systems to problems such as data security and sharing risks.
4. Comparison between relational database and non-relational database
4.1. Relational database
- Highly organized structured data
- Structured Query Language (SQL)
- Data and relationships are stored in separate tables.
- Data manipulation language, data definition language
- Strict consistency
- Basic transactions
4.2. NoSQL
- Stands for Not Only SQL
- No declarative query language
- No predefined schema
- Key-value storage, column storage, document storage, graph databases
- Eventual consistency rather than ACID properties
- Unstructured and unpredictable data
- CAP Theorem
- High performance, high availability and scalability
The biggest feature of relational databases is transactional consistency: reads and writes in a traditional relational database are transactional and have the ACID properties. This makes relational databases suitable for almost any system that requires consistency, such as a typical banking system.
However, in web applications, and especially in SNS applications, consistency is not as important. It is acceptable for user A and user B to see an update from the same user C at slightly different times; in other words, a delay of a few seconds before a friend's update becomes visible to everyone is tolerable. The biggest feature of relational databases therefore has little value here, or at least is not as important.
On the other hand, the heavy price relational databases pay for consistency is relatively poor read and write performance. SNS applications such as Weibo and Facebook place extremely high demands on concurrent reads and writes, which relational databases can no longer cope with. (On the read side, traditional applications often add a memcached layer to cache rendered pages in order to work around this weakness and improve performance, but in an SNS the data changes too quickly for memcached to help.) Therefore, a new kind of structured data storage is needed in place of relational databases.
Another feature of relational databases is the fixed table structure, which makes them hard to extend. In an SNS, system upgrades and new features often mean large changes to the data structure, which relational databases struggle to handle, so a new kind of structured data storage is required.
This is why non-relational databases emerged. Since no single structured data store can meet all of these new needs, a non-relational database is, strictly speaking, not one kind of database but a collection of structured data storage methods.
It must be emphasized that persistent storage of data, especially persistent storage of massive data, still requires the use of relational databases.
5. Advantages/disadvantages of NoSQL
Advantages:
- High scalability
- Distributed computing
- Low cost
- Architectural flexibility, semi-structured data
- No complicated relationships
Disadvantages:
- No standardization
- Limited query functionality (so far)
- Eventual consistency is not intuitive to program against
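The consistency point in the list above, and the few-second delay that SNS applications tolerate, can be illustrated with a toy model of asynchronous replication. The Primary and Replica classes are hypothetical and greatly simplified; they are not the replication protocol of any real system.

```python
from collections import deque


class Replica:
    """Toy read-only copy that receives writes some time after the primary."""

    def __init__(self):
        self.data = {}


class Primary:
    """Toy primary that replicates writes asynchronously through a queue."""

    def __init__(self, replica):
        self.data = {}
        self.replica = replica
        self.pending = deque()  # writes not yet applied to the replica

    def write(self, key, value):
        self.data[key] = value
        self.pending.append((key, value))

    def replicate(self):
        """Apply all pending writes to the replica (in reality, a little later)."""
        while self.pending:
            key, value = self.pending.popleft()
            self.replica.data[key] = value


replica = Replica()
primary = Primary(replica)
primary.write("post:C", "new status")          # user C posts an update

seen_by_a = primary.data.get("post:C")         # user A reads from the primary
seen_by_b = replica.data.get("post:C")         # user B reads a replica that lags behind
primary.replicate()
seen_by_b_later = replica.data.get("post:C")   # the replica has caught up
print(seen_by_a, seen_by_b, seen_by_b_later)
```

Between the write and the replication step, two readers get different answers to the same question. Application code has to be written with that window in mind, which is why this model is less intuitive to program against than a transactional one.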
This concludes the overview of relational and non-relational databases and their respective advantages and disadvantages. I hope it is helpful for your learning.