SoFunction
Updated on 2025-03-10

Record and troubleshoot PHP script execution stuck

Discover problems

Recently, I suddenly found from the monitoring that the load of one machine we serve is higher than that of other machines in the same computer room, and there is no difference in inflow and outflow. Further checking found that each machine room has the same phenomenon. After sorting out, I found that these machines in problems ran more PHP scripts than normal machines, so I guessed that there was a problem with the execution script.

Solve the problem

After logging in to the machine, I found that there was a PHP process with a high CPU occupancy. Then I executed the following command and found that there was a PHP script that was started by crontab that had been executed for a long time:

ps aux | grep 'php' | grep -v 'php-fpm'

Since I had encountered similar situations where PHP script execution was stuck before, I suspected that the Mysql query across the computer room caused the Mysql connection to be stuck during network jitter, so I naturally killed all the stuck processes, and then the machine immediately returned to normal from the load, so I ran to do something else with satisfaction.

After a while, I brushed the monitoring and found that the problem occurred again. After commenting out the crontab and killing the process, I manually executed the problem script and could actually reproduce the problem stably! It seems that I have thought the problem too simply. Try to use the strace command to see what the stuck process is doing:

[tabalt@localhost ~] sudo strace -p 13793
Process 13793 attached - interrupt to quit

No output! Then use netstat to see if this process opens any port:

[tabalt@localhost ~] sudo netstat -tunpa | grep 13793
tcp  0  0 192.168.1.100:38019  192.168.1.101:3306  ESTABLISHED 13793/php
tcp  0  0 192.168.1.100:47107  192.168.1.102:6379  CLOSE_WAIT 13793/php 

You can see that the process has opened two ports, which have established a connection with Mysql and Redis, and is in a state where the connection is established (ESTABLISHED) and the other party actively closes the connection (CLOSE_WAIT); at first glance, it seems that the connection with the database is stuck, but because of the loss and being too fooled, we use tcpdump to catch packets to see the interaction between the process and the database:

tcpdump -i eth0 host 192.168.1.101 and port 3306 -w ~/

After catching it for a while, there was no output in the ~/ file. Is there no interaction between the process and Mysql? Then why is the connection establishment not closed? It seems that I can only track the execution of the script from scratch:

First of all, in order to have time to strace the process, output the process's pid at the beginning of the PHP script and sleep 10s:

 echo getmypid(); sleep(10); 

Then start the interaction process between tcpdump and Mysql when it is ready to capture packets.

Finally, execute the PHP script, copy the output pid and execute the strace command in a new window.

Now strace and tcpdump have content! Judging from the strace result, there is no poll after recvfrom, but I don't see anything wrong:

//...
poll([{fd=4, events=POLLIN|POLLERR|POLLHUP}], 1, 1471228928) = 1 ([{fd=4, revents=POLLIN}])
recvfrom(4, ":///\0\0\23jiadia"..., 271, MSG_DONTWAIT, NULL, NULL) = 271
poll([{fd=4, events=POLLIN|POLLERR|POLLHUP}], 1, 1471228928) = 1 ([{fd=4, revents=POLLIN}])
recvfrom(4, "_b?ie=UTF8&node=658390051\0\0008www."..., 271, MSG_DONTWAIT, NULL, NULL) = 206

Judging from the packet capture results, after executing two SQL query statements, the process did not send the query request packet again. From the SQL statement log, it was also found that only two were executed:

select * from sites where type = 1 limit 50;
select * from sites where type = 2 limit 50;

But from these phenomena, there is still no clue, so I have to use the ultimate method: output debugging! I roughly looked at the code and added output statements in key places, so the code looks like this:

echo("start foreach\n");
foreach($types as $type)
{
 echo("foreach $type\n");
 $result[$type] = $this->getSites($type);
}
echo("end foreach\n"); 

After execution, the output is as follows. When querying the URL with type 2, it is stuck:

start foreach
foreach 1
foreach 2

I began to suspect that there was a problem with the call getSites() method, the code is as follows:

$sites = array(); // Omit the code for querying from the database$siteNum = 8;  // Omit the code read from the configuration$urlKeys = $result = array();
for($i = 0; $i < $siteNum; $i++)
{
 do {
  $site = array_shift($sites);
  $urlKey = md5($site['url']);
 } while(array_key_exists($urlKey, $urlKeys));

 $urlKeys[$urlKey] = 1;
 $result[] = $site;
}
return $result;

It turns out that in order to achieve the realization, I wrote 2 loops here with 8 non-repetitive URLs. If there are only 7 non-repetitive URLs in the result, one will be empty, and less than 7 will have a vicious loop! So I checked the number of URLs with type 2, and it turned out that there were only 6!

Summarize

It took about 1 day from discovery to resolution. Although it was finally proved that it was caused by a low-level code bug, the entire investigation process was quite rewarding. The initial take on it was very superficial. The results of tcpdump and strace in the process were already very telling. You should be more proficient in the application of each tool, and the results of the tool should also be analyzed in depth. The above is the entire content of this article. I hope the content of this article will be of some help to your study or work. If you have any questions, you can leave a message to communicate.