
I have a Rails app running on Nginx with Puma and, like clockwork, every couple of days the app goes down with a 502 Bad Gateway error.

My nginx log contains lots of errors like this:

2015/07/23 14:43:49 [error] 14044#0: *7036 connect() to unix:///var/www/myapp/myapp_app.sock failed (111: Connection refused) while connecting to upstream, client: 12.123.12.12, server: myapp.com, request: "GET /arrangements HTTP/1.1", upstream: "http://unix:///var/www/myapp/myapp_app.sock:/arrangements", host: "myapp.com", referrer: "http://myapp.com/arrangements"

I have to restart Puma and everything works again... for a couple of days.

Any ideas on how I can troubleshoot this? I'm fairly new to Nginx and Puma.

/etc/nginx/sites-enabled/myapp.com

upstream myapp {
    server unix:///var/www/myapp/myapp_app.sock;
}

server {
    listen 80;
    server_name myapp.com;
    root /var/www/myapp/current/public;
    client_max_body_size 20M;

    location ~ \.php$ {
        try_files $uri =404;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        include fastcgi_params;
        allow all;
        satisfy any;
    }

    location / {
        proxy_pass http://myapp; # match the name of upstream directive which is defined above
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }

    location ~* ^/assets/ {
        # Per RFC2616 - 1 year maximum expiry
        expires 1y;
        add_header Cache-Control public;

        # Some browsers still send conditional-GET requests if there's a
        # Last-Modified header or an ETag header even if they haven't
        # reached the expiry date sent in the Expires header.
        add_header Last-Modified "";
        add_header ETag "";
        break;
    }
}
Catfish
  • Is your app running on Digital Ocean? – Elvn Aug 28 '15 at 13:33
  • Have you been receiving the e-mails from DO? If not, I'll post one. – Elvn Aug 28 '15 at 14:06
  • I've gotten a few about upgrades to some of the NYC servers. That's all I recall. – Catfish Aug 28 '15 at 14:08
  • It's worse than that. I'll post the text in the answer, because the formatting is completely garbled when I tried to post the text to the comments here. – Elvn Aug 28 '15 at 14:09
  • As connect() returned ECONNREFUSED, it suggests that the socket in question isn't open for some reason, and that's why nginx returns 502. That is, something happened with your backend, Puma. You should look into your backend logs to find out what happens with it. – Maxim Dounin Sep 03 '15 at 11:38
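One way to check the point in the last comment without going through nginx is to connect to the socket directly. The sketch below is only a diagnostic aid: `probe_unix_socket` is a hypothetical helper name, and the path is the one from the upstream block in the question. It separates "the socket file is missing" from "the file exists but nothing is accepting on it", which is the ECONNREFUSED case in the log.

```ruby
require 'socket'

# Try to connect to a Unix domain socket and report what happened.
# Returns :ok, :refused, or :missing.
def probe_unix_socket(path)
  UNIXSocket.new(path).close
  :ok
rescue Errno::ECONNREFUSED
  :refused # socket file exists, but no process is accepting on it
rescue Errno::ENOENT
  :missing # socket file is not there at all
end

# Example call (path copied from the nginx upstream in the question):
#   puts probe_unix_socket('/var/www/myapp/myapp_app.sock')
```

Running something like this from cron and logging the result would at least show when Puma stops listening, which can then be lined up with Puma's own logs.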

2 Answers


The DigitalOcean network team has identified an issue with firmware running on a number of network switches within NYC3. This issue is causing intermittent loss of connectivity to customer droplets.

While the issue has been confirmed only in a subset of racks, we will be upgrading all switches running the affected firmware in NYC3. This maintenance will result in approximately ten minutes of downtime per rack at some point within the maintenance window as individual switches are upgraded.

Maintenance window: 2015-08-27 22:00 EDT to 2015-08-28 02:00 EDT (2015-08-28 02:00 UTC to 2015-08-28 06:00 UTC)

We apologize for the inconvenience and appreciate your patience as we work to improve the reliability of our network.


I would give it a day or two and see if the problem you're having recurs, or simply disappears on its own.

Added/Edited

P.S. I just noticed a detail in the email:

Affected Droplets: railsbox00

If you're getting the e-mails, then your droplet is affected by the firmware problem. Check your emails and see if they list your VPS; it's at the bottom of the email.

Elvn
  • Added PS; email lists affected droplets. – Elvn Aug 28 '15 at 14:47
  • It still happens. It's actually been happening for about a year and I always have to restart Puma. I'm ready to get off Puma because of this issue. – Catfish Sep 08 '15 at 16:30
  • That's too bad. There was a chance the firmware bug was the culprit; sorry to hear it's a real bug. I run Unicorn, not Puma, on my DO Rails server, so I don't have any Puma-specific insight. I haven't seen any weirdness with Unicorn, but I know there are some reasons people choose not to use it. It's easy enough to install and set up Unicorn and just give it a try. – Elvn Sep 08 '15 at 21:14
  • Reading up about Puma I found that it is now the "recommended webserver" for Ruby on Rails. Hmmmm. – Elvn Sep 09 '15 at 16:10
  • I know, but I've been struggling with this issue for a year. Can't seem to figure it out. – Catfish Sep 09 '15 at 21:55
  • Is there any sort of a consistent pattern to the failure, timewise? 3 days, 4 days? Memory leaks can act like this, or maybe a cache is growing and finally explodes. I find this interesting: "http://myapp.com/arrangements" -- Is this always the referrer logged when the app throws the 502 error? – Elvn Sep 09 '15 at 22:23
  • That's because that was the page I was trying to access. The only thing I've noticed over the last year is that when an exception is thrown, db connections seem not to be released. If I increase the db pool size, it just takes longer for all the connections to get locked up. My only hunch about the cause is that I'm using Rails' exceptions_app for custom error pages. – Catfish Sep 10 '15 at 16:33
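To illustrate the mechanism described in the last comment (connections not being returned after an exception, so a bigger pool only delays the lock-up), here is a toy sketch. `ToyPool` is a made-up class, not ActiveRecord's pool, and nothing here claims to be what actually happened in this app. It only shows why a checkout that isn't wrapped in `ensure` loses one connection per exception.

```ruby
require 'thread'

# Toy pool: a fixed number of strings stands in for database connections.
class ToyPool
  def initialize(size)
    @free = Queue.new
    size.times { |i| @free << "conn-#{i}" }
  end

  # Number of connections currently available for checkout.
  def available
    @free.size
  end

  # Leaky: if the block raises, the connection is never put back.
  def leaky_checkout
    conn = @free.pop(true) # non-blocking; raises ThreadError when empty
    result = yield conn
    @free << conn
    result
  end

  # Safe: ensure puts the connection back even when the block raises.
  def safe_checkout
    conn = @free.pop(true)
    yield conn
  ensure
    @free << conn if conn
  end
end

pool = ToyPool.new(2)
2.times do
  begin
    pool.leaky_checkout { raise 'boom' }
  rescue RuntimeError
    # The exception is handled, but the connection is already lost.
  end
end
puts pool.available # prints 0: both connections leaked
```

With `safe_checkout` the same loop leaves the pool full. In a real Rails app, the block form `ActiveRecord::Base.connection_pool.with_connection` plays the role of the safe variant.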

I don't know if this question is still relevant, but what helped me greatly with this exact problem was to move the actual location of the puma.sock file to another directory. I picked the /tmp directory.

The socket used to be on a drive that was NFS mounted to another server, and I believe that that was the problem - some hiccups in the network here and there. I'm not sure what it was exactly but since I moved the puma.sock to /tmp all problems disappeared. For me.
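If you want to check whether your own socket directory sits on a network mount before moving it, something like the following may help. It is a rough sketch that assumes Linux and the usual /proc/mounts format (device, mount point, filesystem type, ...); `fstype_for` and `nfs?` are hypothetical helper names, and mount points containing spaces or symlinked paths are not handled.

```ruby
# Return the filesystem type (e.g. "ext4", "nfs4") of the mount that holds
# +path+, given text in /proc/mounts format. Returns nil if nothing matches.
def fstype_for(path, mounts_text)
  best = nil
  mounts_text.each_line do |line|
    _device, mount_point, fstype = line.split(' ')
    next unless mount_point && fstype
    prefix = mount_point == '/' ? '/' : "#{mount_point}/"
    next unless path == mount_point || path.start_with?(prefix)
    # Prefer the longest matching mount point; later entries win ties.
    best = [mount_point, fstype] if best.nil? || mount_point.length >= best[0].length
  end
  best && best[1]
end

# True when the path lives on an NFS mount (types "nfs", "nfs4", ...).
def nfs?(path, mounts_text)
  fstype_for(path, mounts_text).to_s.start_with?('nfs')
end

if File.exist?('/proc/mounts')
  dir = '/var/www/myapp' # directory from the question; adjust to yours
  puts "#{dir} on NFS? #{nfs?(dir, File.read('/proc/mounts'))}"
end
```

If it does turn out to be NFS, moving the socket means changing the path in two places: the `bind` line in Puma's config and the `server unix://...` line in the nginx upstream block.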

Jan Paul
  • For me it was Puma itself. I worked with the Puma guys and ultimately upgrading to puma 2.15.x or greater fixed the problem. – Catfish Apr 11 '16 at 14:44