2

Instana trace of the problem: Instana trace showing the problem

The problem: randomly, some requests take an extra 20 to 60s to finish compared to normal.

Manually running CURL shows the following behaviour:

  • curl example.com/api
  • 2 seconds - no output nothing happens (I expect this to be PHP processing time)
  • 0.5-1 seconds - lots of JSON output and then it suddenly stops randomly, like in the middle div class="aa
  • 20+ seconds - no output
  • the last second - the remaining 1-2 lines of the request output are displayed
  • request finishes

I expect to not have the 20+ seconds delay where nothing happens.

Captured tcpdump of the communication

Infrastructure

tcpdump of the nginx <-> php-from communication

To note that the lat FCGI_STDOUT packet is delayed by 50+ seconds, and it contains a few bytes(in this case the json closing tags "]}}" ):

FastCGI (FCGI_STDOUT)
Version: 1
Type: FCGI_STDOUT (6)
Request ID: 1
Content Length: 6
Padding Length: 2
Content Data: 5c6e225d5d7d
Padding Data: 0000

FastCGI (FCGI_STDOUT)
Version: 1
Type: FCGI_STDOUT (6)
Request ID: 1
Content Length: 6
Padding Length: 2
Content Data: 5c6e225d5d7d
Padding Data: 0000

My php-fpm settings:

# php-fpm -v
PHP 7.4.28 (fpm-fcgi) (built: Mar 29 2022 03:26:54)
Copyright (c) The PHP Group
Zend Engine v3.4.0, Copyright (c) Zend Technologies
with Zend OPcache v7.4.28, Copyright (c), by Zend Technologies


[www]
user = www-data
group = www-data
listen.owner = www-data
listen.group = www-data
listen = 0.0.0.0:9000
listen.backlog = 1024
process.priority = 0
pm = dynamic
pm.min_spare_servers = 5
pm.max_spare_servers = 10
pm.process_idle_timeout = 55
pm.max_children = 100
pm.start_servers = 10
pm.max_requests = 10000
slowlog = /var/log/php-fpm/slowlog.log
request_slowlog_timeout = 10
request_terminate_timeout = 600
user = www-data
catch_workers_output = yes
security.limit_extensions = .php .phar
pm.status_path = /status
php_admin_value[memory_limit] = 512M
php_admin_value[post_max_size] = 256M
php_admin_value[upload_max_filesize] = 256M
php_admin_value[max_execution_time] = 60
php_admin_value[opcache.revalidate_freq] = 0
php_admin_value[opcache.validate_timestamps] = 0
php_admin_value[opcache.max_accelerated_files] = 32531
php_admin_value[opcache.memory_consumption] = 512M
php_admin_value[opcache.interned_strings_buffer] = 128
php_admin_value[opcache.huge_code_pages] = 0
php_admin_value[zend.assertions] = 0
php_admin_value[max_input_vars] = 2000

My nginx settings:

# nginx -v
nginx version: nginx/1.20.1


upstream php {
server 127.0.0.1:9000;
}

server_tokens off;
server_names_hash_bucket_size 256;
client_max_body_size 256m;

server {
listen 80;
server_name xxx;

root /usr/share/xxx/public_api;
index index.php;
try_files $uri $uri/ /index.php?$args;

access_log /var/log/nginx/access.log;
error_log /var/log/nginx/error.log;

client_header_timeout 120;
client_body_timeout 120;
send_timeout 120;
keepalive_timeout 60;

gzip on;
gzip_comp_level 4;
gzip_proxied any;
gzip_types application/atom+xml application/javascript application/json application/rss+xml
application/vnd.ms-fontobject application/x-font-ttf application/x-web-app-manifest+json
application/xhtml+xml application/xml font/opentype image/svg+xml image/x-icon text/css
text/plain text/x-component;

location ~ \.php$ {
fastcgi_connect_timeout 60s;
fastcgi_read_timeout 120s;
fastcgi_send_timeout 120s;

fastcgi_pass php;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
include fastcgi_params;

fastcgi_param APPLICATION_ENV live;
fastcgi_param HTTPS off;

opentracing_fastcgi_propagate_context;
}
}

Update: I managed to track down the issue to resource contention on the Instana monitoring side. Metrics were sent using TCP and hanging php-fpm under certain conditions.

  • Do you think it's `php-fpm` or `nginx` related problem, not your backend or php script? –  Apr 12 '22 at 10:16
  • I have enabled PHP slowlog and requests where PHP scripts are slow get logged (Instana confirms the log as well and the durations match). Requests where `php-fpm` is hanging (like in the tcpdump) don't. – Alexandru Keszeg Apr 12 '22 at 10:26
  • Check `php-fpm` and `nginx` log first, then go deeper into your application if nothing found. –  Apr 12 '22 at 10:33
  • Could this solve it? See the checked answer: https://serverfault.com/questions/493410/what-might-be-causing-this-intermittent-php-stalling-event – Tschallacka Apr 12 '22 at 12:07

1 Answers1

0

Without seeing your application, but looking at the symptons, I'd hazard a guess that it might be output buffer flushing automatically.

When the buffer is "full" it will output automatically to the client. Then with the now empty buffer it will gather new inputs, until the buffer is full again, which could explain the spaced network responses you get.

Try putting all your output in an array, and then at the end return a implode(PHP_EOL, $response_data);

Or use the ob_get_contents() to make your own buffer.

ob_start();
echo "Hello World!";
$contents = ob_get_contents();
ob_end_clean();

echo "The contents of the buffer are: ";
echo $contents;

This will prevent premature outputting, and might actually speed up your response as echo is criminally slow with small strings.

Tschallacka
  • 27,901
  • 14
  • 88
  • 133
  • The site is based on Zend Framework and I have verified that output buffering is on. The entire response is rendered at once. – Alexandru Keszeg Apr 12 '22 at 12:02
  • @AlexandruKeszeg, okay then, you need to profile your request (system level would be better than application one). I would recommend php extension `xdebug`. –  Apr 12 '22 at 15:47