Recently I found myself with a website (a Prestashop e-commerce site on a CentOS machine running PHP-FPM / Apache / MySQL) that was down and not responding to web requests.

After investigation, the issue turned out to be an API call made with php-curl to an endpoint that was temporarily offline, inside an application PHP file included on every page of the website.

The cURL call had wrongly been made without a CURLOPT_TIMEOUT_MS setting, so users visiting my website rapidly exhausted the maximum number of PHP connections, blocking the php-fpm workers and preventing my server from accepting further incoming connections.
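For reference (and as the comments below confirm), the application-side fix is simply to make the call fail fast. A minimal sketch, assuming a plain php-curl call; the endpoint URL and the timeout values are placeholders, not taken from the actual incident:

    <?php
    // Hypothetical endpoint URL, for illustration only.
    $ch = curl_init('https://api.example.com/status');

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Fail fast instead of hanging the php-fpm worker indefinitely:
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT_MS, 2000); // max 2 s to connect
    curl_setopt($ch, CURLOPT_TIMEOUT_MS, 5000);        // max 5 s overall

    $response = curl_exec($ch);
    if ($response === false) {
        // Degrade gracefully instead of blocking the whole page.
        error_log('API call failed: ' . curl_error($ch));
    }
    curl_close($ch);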

I wonder whether one can quickly and effectively prevent or identify such a problem "in production" from the terminal if it happens again (in particular, how to quickly work out which endpoint is blocked, or which file contains the script that blocked the server). In my case I had to investigate at the "application level" rather than at the server level, because:

  • Launching "top", the server shows the list of blocked php-fpm processes without any additional information useful for understanding the problem (the load average was also about 0.00, since there was almost no activity due to the stuck connections).
  • Launching "netstat -nputw" shows me a lot of internal connections in TIME_WAIT state, but again no information about the culprit of the outage (could I see the endpoint called by php-curl with netstat or a similar network command? See the sketch after this list).
  • Launching "strace" on the php-fpm processes shows a lot of files involved, but this is not very helpful, since the site, under average traffic, opens dozens and dozens of files.
  • The web server logs only told me about connection timeouts to web resources, but not about the script containing the problematic cURL call.
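For what it's worth, the direction I am currently exploring is sketched below. It assumes a CentOS box with ss and lsof installed and a standard php-fpm pool layout; the pool config path, log paths, and timeout values are placeholders, not something verified on the incident itself:

    # 1) Find out which remote endpoints the php-fpm workers are stuck on.
    #    Long-lived ESTABLISHED connections owned by php-fpm point straight
    #    at the hanging outbound call (TIME_WAIT entries are already closed).
    ss -tnp state established | grep php-fpm
    # Same idea with lsof, one line per worker connection:
    lsof -nP -a -c php-fpm -iTCP -sTCP:ESTABLISHED

    # 2) Find the offending script without strace: enable the php-fpm slow
    #    log in the pool config (e.g. /etc/php-fpm.d/www.conf):
    #      request_slowlog_timeout = 10s
    #      slowlog = /var/log/php-fpm/www-slow.log
    #    Any request stuck longer than that gets a PHP backtrace (script
    #    file plus the blocking curl_exec() call) written to the slow log:
    tail -f /var/log/php-fpm/www-slow.log

    # As prevention, request_terminate_timeout in the same pool config kills
    # workers stuck longer than the given time, instead of letting them pile
    # up until the pool is exhausted.

If that works as documented, the slow log would answer both questions at once: the backtrace names the file and line, and the ESTABLISHED connections name the endpoint.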

Thanks for your help.

gennaris
  • Add the timeout and measure how long it hangs. – Rick James Nov 26 '21 at 22:08
  • The question is not what to do after having understood the root of the problem, but how to properly detect it in case it happens again (e.g. on another website). – gennaris Nov 27 '21 at 08:46
  • If the results of the curl are required for building the page, then you must wait until the curl fails or times out. What aspect of this statement can you relax? – Rick James Nov 27 '21 at 15:59
  • Maybe I have not been clear enough in formulating the question. What I need to know, from the "sysadmin" point of view, is how to find the root cause from the terminal, in the quickest possible time, if a situation like this were to happen again (for example on another server), without being aware of how the application is built and without analyzing the application. – gennaris Nov 28 '21 at 16:56
  • And my suggestion was one step toward that. I may have further clues after you answer my questions. (When I can't answer a question, I at least try to help with the debugging.) – Rick James Nov 28 '21 at 17:00
  • I will try to explain myself better: once I found the problem in the application, I knew perfectly well that in order to solve it a timeout had to be set on the unresponsive curl call (or the curl call had to be disabled altogether), but fixing the poorly written application was not part of my job... The question was asked because my need is as a sysadmin: to identify the root cause of the problem, without knowing anything about the underlying application, in the quickest possible time, with a shell in front of me. – gennaris Nov 29 '21 at 08:00
  • An anecdote: Many years ago, I had a program that was doing a lot of curls. Once in a while, it would hang. After researching it quite a lot, and asking experts, I came to the conclusion that something very low in the OS was causing the problem. I could repeatedly show that the hang was exactly 80.0 seconds. This, of course, was unacceptable. But I could not find a workaround within the thread. (Possibly using multiple threads would have let me continue processing, but I did not want to go there.) – Rick James Nov 29 '21 at 17:28

0 Answers