Problem phenomenon
The Java service has been producing a large number of error logs (see the appendix for details). The errors fall mainly into the following categories:
- Caused by: Read timed out
- Caused by: No operations allowed after connection closed.
- The last packet successfully received from the server was 10,003 milliseconds ago. The last packet sent successfully to the server was 10,003 milliseconds ago.
- Cause: Deadlock found when trying to get lock; try restarting transaction
- Cause: Lock wait timeout exceeded; try restarting transaction
Speculation 1 - Client retry/connection configuration problem
From the checks over the first three days, most of the errors were Read timed out and No operations allowed after connection closed. The bigger problem was that after an error occurred, the service entered a hung ("fake death") state and could not be used again unless it was restarted.
The initial suspicion was therefore a problem with the client connection pool or with the configuration used to obtain MySQL connections.
The connection pool was adjusted along the following lines, adding timeout and reconnect parameters:
@Override public DataSource getDataSource() { DruidDataSource dataSource = new DruidDataSource(); (("url")); (("k1")); (("k2")); (30); (5); (60); (3); (60000); (false); (-1); ("select 'x'"); (30000); (false); (false); (true); (true); (300000); (60000); (true); (10); (1000); Properties connectProp = new Properties(); ("", "true"); ("", "5000"); (connectProp); try { (); } catch (SQLException e) { ((), e); } return dataSource; }
The MySQL JDBC URL was also adjusted, adding allowPublicKeyRetrieval:
...&serverTimezone=Asia/Shanghai&allowPublicKeyRetrieval=true";
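For reference, a complete JDBC URL carrying the timeout and reconnect parameters discussed above might look like the following sketch; the host, port and database name are placeholders, not the actual values of this environment:
String url = "jdbc:mysql://127.0.0.1:3306/mydb"           // placeholder host/port/database
        + "?useUnicode=true&characterEncoding=utf8"
        + "&useSSL=false"
        + "&connectTimeout=5000&socketTimeout=60000"       // driver-side connect/read timeouts (ms)
        + "&autoReconnect=true"                            // reconnect after the server drops the connection
        + "&serverTimezone=Asia/Shanghai"
        + "&allowPublicKeyRetrieval=true";                 // required for caching_sha2_password without SSL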
Result: the error was still reported; not resolved. However, the "fake death" problem was solved: after an error occurs, the service stays usable as long as concurrency is kept low, for example 2 concurrent requests.
Speculation 2 - Client timeout is too short
The next suspicion was a MySQL performance problem: the SQL simply takes a long time to run, so try increasing the client-side timeouts and test again.
@Override public DataSource getDataSource() { DruidDataSource dataSource = new DruidDataSource(); (("url")); (("k1")); (("k2")); (60); (5); (105); (30); (60000); (false); (-1); ("select 'x'"); (60000); (false); (false); (true); (true); (300000); (60000); (true); (10); (1000); (150000); (150000); (150000); Properties connectProp = new Properties(); ("", "true"); ("", "5000"); (connectProp); try { (); } catch (SQLException e) { ((), e); } return dataSource; }
MyBatis configuration:
<!-- Configuration -->
<settings>
    <setting name="mapUnderscoreToCamelCase" value="true"/>
    <setting name="useGeneratedKeys" value="true"/>
    <setting name="defaultStatementTimeout" value="60"/>
</settings>
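defaultStatementTimeout is expressed in seconds and applies to every statement by default. If needed, a single statement can also override it; a sketch using a hypothetical mapper interface (the interface name, method and SQL are illustrative, not the actual service code):
import org.apache.ibatis.annotations.Delete;
import org.apache.ibatis.annotations.Options;
import org.apache.ibatis.annotations.Param;

public interface EntityMapper {
    // Per-statement timeout in seconds, overriding defaultStatementTimeout.
    @Delete("DELETE FROM tb_entity WHERE directory LIKE concat(#{prefix}, '%')")
    @Options(timeout = 60)
    int deleteByDirectoryPrefix(@Param("prefix") String prefix);
}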
Result: the error was still reported; not resolved. But with the longer timeouts, the number of successful operations increased.
Speculation 3 - MySQL version issue
The online environment runs MySQL 8, so pull the same version locally for testing:
docker pull /vwf-base/mysql:8.0.31
docker run -it \
  -p 33305:3306 \
  -e MYSQL_ROOT_PASSWORD='123456' \
  --volume /tmp/docker-cps/wf-base/tmp/mysqldata:/var/lib/mysql \
  /vwf-base/mysql:8.0.31
Result: the error was still reported; not resolved. MySQL 5.7 produces the same error messages as 8.0.
Speculation 4 - Client connection pool concurrency is too low
Increase Druid's maximum number of connections to 105; MySQL's default maximum number of connections is 151 (show variables like '%max_connection%';).
While the concurrent load is running, keep executing show status like 'Threads%'; in MySQL to monitor the number of connected threads (Threads_connected).
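A small monitoring sketch that polls these counters while the test runs; the JDBC URL and credentials are placeholders, not the real environment values:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ThreadsMonitor {
    public static void main(String[] args) throws Exception {
        // Placeholder URL/credentials; point this at the instance under test.
        String url = "jdbc:mysql://127.0.0.1:33305/benchmark?serverTimezone=Asia/Shanghai";
        try (Connection conn = DriverManager.getConnection(url, "root", "123456");
             Statement stmt = conn.createStatement()) {
            for (int i = 0; i < 60; i++) {                       // sample once per second for a minute
                try (ResultSet rs = stmt.executeQuery("SHOW STATUS LIKE 'Threads%'")) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1) + " = " + rs.getString(2));
                    }
                }
                Thread.sleep(1000);
            }
        }
    }
}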
**Result: the error was still reported; not resolved.** The observed Threads_connected never exceeded 60, which means the maximum concurrency configured in Druid fully covers the load; the connection pool is not the bottleneck.
Speculation 5 - MySQL service performance is low
Run a load test against MySQL.
First install sysbench via Homebrew (brew install sysbench).
Create a database named benchmark in MySQL, then generate the test data with sysbench prepare:
sysbench \
  --test='/usr/local/Cellar/sysbench/1.0.20_3/share/sysbench/tests/include/oltp_legacy/' \
  --oltp-tables-count=1 \
  --report-interval=10 \
  --oltp-table-size=1000000 \
  --mysql-user=root \
  --mysql-password=123456 \
  --mysql-table-engine=innodb \
  --rand-init=on \
  --mysql-host=127.0.0.1 \
  --mysql-port=13336 \
  --mysql-db=benchmark \
  --time=300 \
  --max-requests=0 \
  --oltp_skip_trx=on \
  --oltp_auto_inc=off \
  --oltp_secondary=on \
  --threads=50 \
  prepare
When it finishes, change prepare to run, i.e. start the load test from the local machine and collect MySQL performance figures.
The results for the 243 server are as follows:
- transactions: 26872 (tps 89.45 per sec.)
- queries: 483696 (qps 1610.09 per sec.)
The local MySQL process (non-Docker) is as follows:
- transactions: 185025 (tps 616.36 per sec.)
- queries: 3330450 (qps 11094.40 per sec.)
MySQL running in local Docker is as follows:
- transactions: 18324 (tps 60.97 per sec.)
- queries: 329832 (qps 1097.50 per sec.)
The non-Docker local MySQL far outperforms both the 243 server and the Docker instance, by more than 6x. Because its performance is so good, the bug does not reproduce against it, while the Docker-hosted MySQL, whose performance is mediocre, does reproduce the bug.
Finally, the operations team provided a 48C + 128G machine running a standalone MySQL service.
48C + 128G MySQL service performance:
- transactions: 86301 (tps 287.42 per sec.)
- queries: 1553418 (qps 5173.49 per sec.)
This server performs almost 3x better than the 243 server. In the R&D environment, the Java service's database was migrated to this MySQL instance and the test repeated.
Result: the error was still reported; not resolved. At this point it was concluded that MySQL performance is not the problem.
Speculation 6 - Client code not closing resources
The client, i.e. the Java service, currently uses MyBatis and runs its transactions through SqlSession. The official documentation makes two relevant points: SqlSession is not thread-safe, and every SqlSession must be closed; a SqlSession that is never closed keeps its connection hung, which can have serious consequences.
Two changes were made to address the points highlighted in the official documentation (see the sketch after this list):
- Change SqlSession from a class field into a local variable of each method
- Open every SqlSession in a try-with-resources block
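A minimal sketch of the change, assuming the hypothetical EntityMapper shown earlier and an injected SqlSessionFactory (class and method names are illustrative, not the actual service code):
import org.apache.ibatis.session.SqlSession;
import org.apache.ibatis.session.SqlSessionFactory;

public class EntityService {
    private final SqlSessionFactory sqlSessionFactory;   // no shared SqlSession field

    public EntityService(SqlSessionFactory sqlSessionFactory) {
        this.sqlSessionFactory = sqlSessionFactory;
    }

    public int deleteByDirectoryPrefix(String prefix) {
        // SqlSession is method-local and is always closed by try-with-resources,
        // even when the statement throws.
        try (SqlSession session = sqlSessionFactory.openSession()) {
            EntityMapper mapper = session.getMapper(EntityMapper.class);
            int deleted = mapper.deleteByDirectoryPrefix(prefix);
            session.commit();
            return deleted;
        }
    }
}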
Finally, rerun the test.
Result: the error was still reported; not resolved.
Speculation 7 - Client code leads to long transactions
Judging from the Java service's error log, the recurring errors are DELETE statements timing out.
Also, after each error, MySQL's transaction information shows the DELETE statement stuck in LOCK WAIT. The following queries help monitor this:
-- 1. Query lock waits
select * from `sys`.`innodb_lock_waits`;
-- 2. Query lock information
select * from `performance_schema`.data_locks;
-- 3. Query lock waiting information
select * from `performance_schema`.`data_lock_waits`;
-- 4. Query transaction information
select * from `information_schema`.innodb_trx;
-- 5. Query statement history
select * from `performance_schema`.`events_statements_history`;
-- 6. View the statements currently holding locks
SELECT * FROM performance_schema.events_statements_history
WHERE thread_id IN (
    SELECT b.`THREAD_ID`
    FROM sys.`innodb_lock_waits` AS a, performance_schema.threads AS b
    WHERE a.`blocking_pid` = b.`PROCESSLIST_ID`)
ORDER BY timer_start ASC;
-- 7. View the statements currently waiting for locks
SELECT * FROM performance_schema.events_statements_history
WHERE thread_id IN (
    SELECT b.`THREAD_ID`
    FROM sys.`innodb_lock_waits` AS a, performance_schema.threads AS b
    WHERE a.waiting_pid = b.`PROCESSLIST_ID`)
ORDER BY timer_start ASC;
-- 8. View the most recent deadlock log (in the value of the Status field)
SHOW ENGINE INNODB STATUS;
In particular, the last one, SHOW ENGINE INNODB STATUS;, exposes the most recent deadlock log.
Analyzing that deadlock log:
TRANSACTION 1883862 running statement:
    DELETE FROM `tb_entity` WHERE `directory` LIKE concat('cloud_data/133/task_k35846zp', '%')
TRANSACTION 1883862 holds record lock:
    [space id 2 page no 309 n bits 152; heap no 42 PHYSICAL RECORD]
TRANSACTION 1883862 waits for record lock:
    [space id 2 page no 303 n bits 152; heap no 72 PHYSICAL RECORD]
===
TRANSACTION 1883861 running statement:
    DELETE FROM `tb_entity` WHERE `directory` LIKE concat('local_data/134/sampling_output_uqlv65t2', '%')
TRANSACTION 1883861 holds record lock:
    [space id 2 page no 303 n bits 152; heap no 72 PHYSICAL RECORD (many more held here, omitted...)]
TRANSACTION 1883861 waits for record lock:
    [space id 2 page no 309 n bits 152; heap no 42 PHYSICAL RECORD]
The deadlock log shows two transactions each waiting for a record lock held by the other, i.e. a deadlock.
In other words, even though these are two different SQL statements whose LIKE patterns match disjoint sets of rows, they can still deadlock. For write operations that filter with LIKE (DELETE, UPDATE), the transaction takes record locks on every row it scans while evaluating the condition, not only on the rows it actually modifies; under concurrent transactions this easily produces deadlocks.
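A minimal reproduction sketch of this situation, assuming a local test instance containing the tb_entity table from the deadlock log; the connection details are placeholders:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DeadlockRepro {
    // Placeholder connection details for a local test instance.
    private static final String URL  = "jdbc:mysql://127.0.0.1:33305/benchmark?serverTimezone=Asia/Shanghai";
    private static final String USER = "root";
    private static final String PASS = "123456";

    public static void main(String[] args) {
        // Two prefixes that match disjoint sets of rows, as in the deadlock log.
        deleteInThread("cloud_data/133/task_k35846zp");
        deleteInThread("local_data/134/sampling_output_uqlv65t2");
    }

    private static void deleteInThread(String prefix) {
        new Thread(() -> {
            try (Connection conn = DriverManager.getConnection(URL, USER, PASS)) {
                conn.setAutoCommit(false);   // hold the record locks until commit
                try (PreparedStatement ps = conn.prepareStatement(
                        "DELETE FROM tb_entity WHERE directory LIKE concat(?, '%')")) {
                    ps.setString(1, prefix);
                    int rows = ps.executeUpdate();   // may fail with "Deadlock found" under concurrency
                    System.out.println(prefix + " -> deleted " + rows);
                }
                conn.commit();
            } catch (Exception e) {
                e.printStackTrace();         // expect occasional deadlock / lock wait timeout errors
            }
        }).start();
    }
}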
Solution
Here the root cause is finally clear: under high concurrency, the Java service's DELETE statements acquire locks on all scanned records because of the LIKE predicate; the concurrent transactions then compete for those locks and deadlock.
- Solution 1: Rework the DELETE on the Java side: first query the ids that need to be deleted, then delete them in batches with IN (ids). It is better to SELECT id ... LIMIT n and delete IN (ids) repeatedly rather than fetch all ids at once, because there may be very many of them. This prevents concurrent DELETE transactions from acquiring all record locks at once and deadlocking. This solution fixes the root cause (a sketch appears after this list).
- Solution 2: Serialize the conflicting statements at the business level, for example make this DELETE run serially. That only fixes DELETE transactions in this one scenario; any other UPDATE or DELETE transactions with the same problem would need the same treatment. It works, but it is cumbersome, requires business-side code changes, and lowers MySQL transaction throughput.
- Solution 3: Improve the MySQL server's performance so transactions finish faster and the window for lock contention shrinks. In practice the hardware may not be available, and it does not fix the root cause.
- Solution 4: Set MySQL's transaction isolation level to SERIALIZABLE (the default is REPEATABLE READ), which removes the concurrent-transaction problem. It works, but MySQL transaction performance drops sharply.
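A minimal sketch of Solution 1 in plain JDBC, assuming tb_entity has an id primary key; the batch size and helper names are illustrative, not the actual service code:
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

public class BatchedDelete {
    private static final int BATCH_SIZE = 500;   // illustrative batch size

    /** Deletes rows whose directory starts with the given prefix, in small id batches. */
    public static int deleteByPrefix(Connection conn, String prefix) throws Exception {
        int total = 0;
        while (true) {
            // 1) Fetch a limited batch of ids; only these rows are touched per transaction.
            List<Long> ids = new ArrayList<>();
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT id FROM tb_entity WHERE directory LIKE concat(?, '%') LIMIT " + BATCH_SIZE)) {
                ps.setString(1, prefix);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        ids.add(rs.getLong(1));
                    }
                }
            }
            if (ids.isEmpty()) {
                return total;                     // nothing left to delete
            }
            // 2) Delete by primary key; this locks only the listed rows
            //    instead of every record scanned by the LIKE predicate.
            String placeholders = ids.stream().map(id -> "?").collect(Collectors.joining(","));
            try (PreparedStatement ps = conn.prepareStatement(
                    "DELETE FROM tb_entity WHERE id IN (" + placeholders + ")")) {
                for (int i = 0; i < ids.size(); i++) {
                    ps.setLong(i + 1, ids.get(i));
                }
                total += ps.executeUpdate();
            }
        }
    }
}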
Summary
Do not use LIKE in DELETE and UPDATE
First SELECT id WHERE ... LIKE ... LIMIT N, then DELETE / UPDATE ... WHERE id IN (ids).
Check the deadlock log when operations time out
View the most recent deadlock log (the value of the Status field) with SHOW ENGINE INNODB STATUS; and check whether the SQL shown in the deadlock log points to a problem in the business code.
For concurrency issues, weigh data volume and service performance
If the data volume is small, or the service performance is excellent, the probability of hitting concurrency problems is low.
According to the current load tests (sysbench run command below):
sysbench \
  --test='/usr/local/Cellar/sysbench/1.0.20_3/share/sysbench/tests/include/oltp_legacy/' \
  --oltp-tables-count=1 \
  --report-interval=10 \
  --oltp-table-size=1000000 \
  --mysql-user=root \
  --mysql-password=123456 \
  --mysql-table-engine=innodb \
  --rand-init=on \
  --mysql-host=127.0.0.1 \
  --mysql-port=13336 \
  --mysql-db=benchmark \
  --time=300 \
  --max-requests=0 \
  --oltp_skip_trx=on \
  --oltp_auto_inc=off \
  --oltp_secondary=on \
  --threads=50 \
  run
With a table of 30k rows, each DELETE transaction hitting 1,000 rows, and 20 transactions issued concurrently:
On MySQL with the following performance, a deadlock is triggered with roughly 70% probability:
- transactions: 18324 (tps 60.97 per sec.)
- queries: 329832 (qps 1097.50 per sec.)
On MySQL with the following performance, the deadlock does not reproduce:
- transactions: 185025 (tps 616.36 per sec.)
- queries: 3330450 (qps 11094.40 per sec.)
But in reality, MySQL may not have hardware of that class available.