
We're using Ubuntu 16.04 on AWS (4.4.0-1066-aws x86_64) to send push notifications to Android and iOS clients from a PHP application using curl. This is the code we use to send a push to Firebase:

// One easy handle per push request, attached to a shared multi handle.
$ch[$i] = curl_init();
curl_setopt($ch[$i], CURLOPT_URL, LocalConfig::ANDROID_URL_REQUEST);
curl_setopt($ch[$i], CURLOPT_POST, true);
curl_setopt($ch[$i], CURLOPT_HTTPHEADER, $headers);
curl_setopt($ch[$i], CURLOPT_RETURNTRANSFER, true); // return the response body instead of printing it
curl_setopt($ch[$i], CURLOPT_POSTFIELDS, json_encode($fields)); // JSON payload
curl_setopt($ch[$i], CURLOPT_CONNECTTIMEOUT, LocalConfig::CURLOPT_CONNECTTIMEOUT); // limit for the connection phase
curl_setopt($ch[$i], CURLOPT_TIMEOUT, LocalConfig::CURLOPT_TIMEOUT); // limit for the whole transfer
curl_multi_add_handle($multiCurl, $ch[$i]);
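Note that "error 0" is ambiguous in PHP: curl_errno() and the per-transfer result code are 0 (CURLE_OK) when there was no transport error, whereas curl_getinfo($h, CURLINFO_HTTP_CODE) returns 0 when no HTTP response was received at all. A minimal sketch for recording both per handle when the multi handle is run, assuming $multiCurl and the $ch handles are set up as above; the function names collectResults() and describeResult() are hypothetical, and the code avoids PHP 8-only syntax so it should also run on older PHP 7 releases.

```php
<?php
// Hypothetical helper: run every transfer attached to $multiCurl and return,
// for each handle in $handles (same keys), the curl result code, its message
// and the HTTP status.
function collectResults($multiCurl, array $handles): array
{
    // Drive all transfers until none is running.
    do {
        $status = curl_multi_exec($multiCurl, $stillRunning);
        if ($stillRunning) {
            curl_multi_select($multiCurl, 1.0); // wait for socket activity, max 1 s
        }
    } while ($stillRunning && $status === CURLM_OK);

    // Per-transfer result codes arrive as completion messages.
    $resultByKey = [];
    while (($info = curl_multi_info_read($multiCurl)) !== false) {
        $key = array_search($info['handle'], $handles, true);
        if ($key !== false) {
            $resultByKey[$key] = $info['result'];
        }
    }

    $results = [];
    foreach ($handles as $key => $h) {
        // -1 marks a handle for which no completion message was seen.
        $code = isset($resultByKey[$key]) ? $resultByKey[$key] : -1;
        $results[$key] = [
            'errno'     => $code,
            'error'     => $code >= 0 ? curl_strerror($code) : 'no completion message',
            'http_code' => (int) curl_getinfo($h, CURLINFO_HTTP_CODE),
        ];
    }
    return $results;
}

// Hypothetical classifier for logging: separates transport failures from
// transfers that completed without an HTTP status or with a non-2xx one.
function describeResult(int $errno, int $httpCode): string
{
    if ($errno !== 0) {
        return 'transport error'; // e.g. DNS, connect, TLS or timeout failure, or -1 above
    }
    if ($httpCode === 0) {
        return 'no HTTP response';
    }
    return ($httpCode >= 200 && $httpCode < 300) ? 'ok' : 'http error';
}
```

Logging describeResult($r['errno'], $r['http_code']) together with $r['error'] for each entry would show whether the failing requests are DNS, connect, TLS or timeout errors, which narrows down what the networking restart is actually fixing.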

It works, but my colleagues tell me that in the past curl sometimes started returning error 0 for most connections, while some connections still succeeded.

The solution they came up with is to shut down the push application, restart the networking service, and start the push application again.

./cli/bash/stopAll.sh
sudo systemctl restart networking
./cli/bash/runAll.sh

Restarting only the application did not clear the issue, which means the problem lies in the networking.

Of course, this procedure causes some downtime for clients and slower responses. Automating it would shorten that, but ideally we want never to see the error in the first place.
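If the procedure is automated, it needs a trigger. A minimal sketch of the decision part only, assuming the push loop can report success or failure for each request; the FailureWindow class, its window size and its 50% threshold are hypothetical, and what to run when it trips (the stop/restart/start sequence above, or just an alert) is left to the caller.

```php
<?php
// Hypothetical watchdog helper (not part of the original application):
// remembers the outcome of the last N push requests and reports when the
// share of failures reaches a threshold.
class FailureWindow
{
    private $size;
    private $threshold;
    private $outcomes = []; // true = success, false = failure

    public function __construct(int $size = 200, float $threshold = 0.5)
    {
        $this->size = $size;           // hypothetical window size
        $this->threshold = $threshold; // hypothetical failure ratio
    }

    public function record(bool $success)
    {
        $this->outcomes[] = $success;
        if (count($this->outcomes) > $this->size) {
            array_shift($this->outcomes); // drop the oldest outcome
        }
    }

    public function failureRatio(): float
    {
        if (count($this->outcomes) === 0) {
            return 0.0;
        }
        $failures = count(array_filter($this->outcomes, function ($ok) {
            return !$ok;
        }));
        return $failures / count($this->outcomes);
    }

    // Only trip once the window is full, so a few failures right after
    // start-up do not trigger a restart.
    public function shouldRestart(): bool
    {
        return count($this->outcomes) >= $this->size
            && $this->failureRatio() >= $this->threshold;
    }
}
```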

More info:

  • curl accesses these URLs: "https://api.push.apple.com/3/device/" and "https://fcm.googleapis.com/fcm/send";
  • the peak number of connections is 75 and the peak number of sockets is around 10k (mostly in TIME_WAIT; orphans are closed within a second), which is below the networking limits. Bandwidth is barely used;
  • load average is below 0.25;
  • 5 GB of disk and 2 GB of RAM are free. That may be on the small side, but the server is used for this purpose only;
  • uptime is in the hundreds of days, and the issue has occurred roughly once every 2-3 months. We can't reproduce it at will;
  • the server has netdata and atop installed now, but no monitoring was configured before the last event, not even atop.
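Since there was no monitoring before the last event, it may help to log the kernel's own socket counters periodically, so the next occurrence can be compared with normal operation. On Linux, /proc/net/sockstat reports counts such as TCP inuse, orphan and tw (TIME_WAIT), and /proc/sys/net/ipv4/ip_local_port_range holds the ephemeral port range. A minimal sketch with hypothetical function names; comparing tw with the port range is only a rough check, because a local port can be reused for connections to different remote addresses.

```php
<?php
// Sketch: parse Linux socket counters so they can be logged over time.

// Parses /proc/net/sockstat text, e.g. "TCP: inuse 27 orphan 0 tw 5 ...",
// into ['TCP' => ['inuse' => 27, 'orphan' => 0, 'tw' => 5, ...], ...].
function parseSockstat(string $text): array
{
    $stats = [];
    foreach (preg_split('/\R/', trim($text)) as $line) {
        $parts = explode(':', $line, 2);
        if (count($parts) !== 2) {
            continue; // skip blank or malformed lines
        }
        $fields = preg_split('/\s+/', trim($parts[1]));
        $values = [];
        // Fields come in "name value" pairs.
        for ($i = 0; $i + 1 < count($fields); $i += 2) {
            $values[$fields[$i]] = (int) $fields[$i + 1];
        }
        $stats[trim($parts[0])] = $values;
    }
    return $stats;
}

// Returns the number of ports in an ip_local_port_range value such as "32768\t60999".
function parsePortRange(string $text): int
{
    $bounds = preg_split('/\s+/', trim($text));
    return count($bounds) === 2 ? (int) $bounds[1] - (int) $bounds[0] + 1 : 0;
}

// Reads both files; returns empty values if a file is unavailable.
function readSocketSnapshot(): array
{
    $sockstat = @file_get_contents('/proc/net/sockstat');
    $range = @file_get_contents('/proc/sys/net/ipv4/ip_local_port_range');
    return [
        'sockstat'       => $sockstat === false ? [] : parseSockstat($sockstat),
        'ephemeralPorts' => $range === false ? 0 : parsePortRange($range),
    ];
}
```

Writing json_encode(readSocketSnapshot()) to a log every minute, for example from cron or the push loop itself, would give a baseline for the tw and orphan counts.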

What could be causing this, what else should I look into, and how can I fix it?

roundowl