There are no full-time database operation and maintenance personnel in startups or small and medium-sized companies. They are generally maintained by technical leaders or operation and maintenance personnel. Medium and large companies will have some great people who specialize in databases. Their special position is called DBA. The value of the company's DBA should not be underestimated. As long as the company has a little scale, it must do a good job in the construction and maintenance of databases. This also reflects the importance of DBAs and also means that DBAs have a heavy responsibility.
If a company's database never has problems, it must be because of small traffic or no business volume. Problems are also the best opportunity to learn. How to fix problems as quickly as possible when problems arise is crucial.
1. The most common questions about databases
1.1. Database selection
For most technicians, it is basically no need to decide which database the company uses. When joining a company, most of the technical selection has been determined long ago, unless you are the technical person or director of the company just started a business, you may determine the technical selection. Especially for databases, the cost (learning, trial and error) is very high once they are selected. Therefore, unless there are disruptive technologies or technical problems that are difficult to overcome at present, few companies will make big migrations time-consuming and laborious.
In fact, whether it is technology selection or technology transformation, there is a factor that cannot be ignored is whether technicians can master this technology, or that technicians and technical leaders are familiar with this technology.
Because each company has different businesses and the application scenarios of database systems are different, each company may use multiple databases for different business scenarios, so the selection is also different, but it is certain that it is definitely the best without that system.
For example:
Payment-related businesses require strong transactions and strong consistency support: open source Mysql, PostgreSQL, and new distributed TiDB
, paid oracle, etc.
Log Analysis Business Search Related: Elasticsearch
Cache, flash sale related: Redis, Cuchbase, Memcached
Big data analysis: Hive, Cassandra, clickhouse
Document database: Mongodb
Graph database: TigerGraph, Neo4j, Amazon Neptune, JanusGraph
This is why many companies choose multiple databases to coexist and provide optimal support for related business scenarios through different technologies and architectures. Once the selection error occurs, frequent problems will occur, which will directly affect the user experience.
All technology selection and design have their application scenarios, which is why many companies choose multiple databases to coexist and provide optimal support for related business scenarios through different technologies and architectures. Once the selection error occurs, frequent problems will occur, which will directly affect the user experience.
1.2. Database-related architecture
When it comes to architecture, people usually think of:
Single instance: easy to understand, single library concept.
Vertical grouping: add replicas for single instances, such as Mysql master-slave synchronization, read and write separation, Mongodb's replica set, read and write separation.
Horizontal sharding: For example, writing a book is very thick and tens of thousands of pages. For the sake of easy reading, the authors are divided into books (10 small books), such as Tidb distributed relational data, redis cluster, Mongodb sharding, etc.
The architecture also includes the design of the cache system at the upper level of the database, the processing of database connections by the program language, the functions of the proxy layer, and the construction of data pipelines related to secondary logs, including the design of partitioning and backup of the data system.
Like in the past, many companies put all their data in one database (we were the same as we were at the beginning, all of which were placed in the oracle database, with the oracle data reaching 8TB, and the maximum number of connections in the database was 3,000). As the business continued to grow, various problems began to arise, because of the limitations of various connection pools and throughput, there was basically no way to expand. We started to design the splitting of data targets, put the relevant data together, put the less related ones in another library, and even put it in the document-based database Mongodb, gradually alleviating the problem of scalability.
With the popularity of microservices, databases must also be split. No business-related can be split into different databases as small as possible. Programs can be called through the dubbo interface, independent of each other and do not affect each other.
1.3. Database bottleneck
Friends who have configured Nginx services know that when the current web service traffic increases, you only need to add nginx instances to modify the configuration, but when there is a bottleneck in the database, you may not be able to solve it by directly adding an instance.
As long as it is a database, there are throughput restrictions. The bottleneck of database access is caused by natural traffic growth or sudden traffic increase. As long as the business grows, one day the database access will reach an online visit. Before this problem comes, you need to perform various vertical splits or horizontal expansions to increase the entire upper limit (a lot of work to do), or you can diversion through caches such as (CDN, nginx, redis) or other mechanisms.
For example, traffic suddenly increased: it may be an attack or a promotion activity, such as limited-time purchases and flash sale.
To attack, we need to do some stress tests in advance or access caches to reduce the pressure on the database.
For promotion activities, we need to communicate with business product operations in advance.
Index: Don't over-rely rely on indexes, invalid indexes may be worse
Transactions: associated transactions, distributed transactions, and try to break them into small transactions to execute
1.4. Human errors encountered
No matter how good the system is, it will be in vain if it is not the right way. What's more, not all engineers are database experts, and all human errors are the most common problems.
1.4.1. Human errors can be divided into two types:
Mistakes made when operating the database
Mistakes made by executing programs or scripts
The probability of making mistakes when operating the database is relatively low, but the impact is great. Almost all companies will have similar legendary events, which can be summarized in these categories;
1.4.1.1. Intentionally or unintentionally delete the core data of the database:
The example at the beginning of the article is also written. Deliberately deleting company data will cause immeasurable economic losses to the company, and will also ruin your career as an engineer and face jail.
If you make mistakes when operating the database, you must restore the data as soon as possible and recover the losses as much as possible. A long time ago, I operated the database at around 12 a.m. (It is common for overtime in the IT industry to delete the test database). When I deleted the database, I couldn't connect to the online database and performed the deletion operation. After the execution was completed, I received the alarm information and my mind was clear in an instant. Fortunately, the recovery was timely and the loss was not too great. The operation was changed to 5 a.m., and abnormalities could be discovered in time (you may go to bed after the operation is completed at night).
1.4.1.2. Misoperation in modifying the table structure:
The engineer launched a new version, and added a modified table structure operation to the code, resulting in the database being locked for up to several hours, and the service abnormality occurred in this business line for several hours.
The engineer modify the table structure without adding where conditions and other operations, resulting in the data in the table being overwritten.
1.4.1.3. Misoperation of indexing problems:
In a Java underlying framework, when initializing, there is an operation to add indexes by default, and it is pre-release again. During online testing, the online database is directly locked for several hours (the importance of environmental isolation).
Querying data without index in a program or script results in a scan of the entire table. In addition, some interfaces call to access each other, all connections to the database are often occupied, resulting in the query being unable to be executed.
1.4.1.4. Physical misoperation:
When the computer room is powered off or the network is out of power. Remember when you first work, you put the server on the shelves in the company's computer room and unplugged the power plug in the wrong way, resulting in the unavailable test environment service (the office room has the test development environment, and the formal environment is hosted in the operator's computer room). Although it does not affect the online business, this matter is always remembered.
1.5. Data security
Data security includes three aspects: CIA (Confidentiality, Integrity, Availability)
(1) Confidentiality: Unauthorized users are not allowed to access information;
(2) Integrity: Only authorized users are allowed to modify data;
(3) Availability: Authorized users can access data.
Physical security: Ensure the security of storage physical equipment and do not be struck or stolen by lightning
Firewall: restricts database access, such as the port that allows that route to access the database
Operation and maintenance safety: Carry out process management of operation and maintenance behavior, providing functions such as pre-approval, in-process control, post-audit audit, regular reporting, etc., effectively combining approval, control and accountability to avoid malicious and misoperable operations by internal operation and maintenance personnel.
Database audit: record database activities on the network in real time and conduct fine-grained audits on database operations.
Data encryption: Encrypt data from a technical perspective. Data encryption can effectively prevent internal data leakage caused by plain text storage and data theft by high-authority users, and prevent sensitive data leakage from the root
Vulnerability scanning: Vulnerability scanning through database and code can effectively evaluate security vulnerabilities and threats in database systems and provide repair suggestions.
Data backup: When an abnormality occurs in the database, it can be restored in time, deleted incorrectly, and the computer room is powered off, resulting in data corruption
2. Which database should novices learn first when entering the industry?
It is recommended to start with open source MySQL (80% to 90% of Internet companies are using it, and the salary is higher than that of OracleDBA (capital concentration), and it is relatively easy to get started). After getting familiar with mysql, any relational database system is the same. After mastering MySQL, other relational databases are no problem. Non-relational databases are relatively simple, and some concepts are also the same.
3. Continuous learning
In fact, it is not just DBA, but the entire operation and maintenance system is changing. The traditional gameplay is low in efficiency and repetitive work, which is no longer applicable to the high-speed and fast-paced development of Internet companies. Therefore, the content of DBA work needs to be changed, and the ideas also need to be changed. We should develop in terms of how to quickly deliver, improve the overall stability of the database, efficient operation and maintenance, create more database product solutions suitable for business needs, and even develop output data architecture solutions.
4. Frontier Database Technology
For example, DB automation management platform
Host management: host addition, deletion and modification, host resources (cpu, disk, men)
Instance management: instance addition, deletion, and instance master and slave switching
Audit management: sql audit, sql rollback
Log management: binlog, slow query
Task Management: Timing Tasks
Daily maintenance: batch SQL execution, configuration modification, automatic capacity expansion
This is the article about the work summary of the database operation and maintenance personnel DBA. For more relevant database operation and maintenance work summary content, please search for my previous articles or continue browsing the related articles below. I hope everyone will support me in the future!