
I rebooted my Ubuntu server this morning because I was getting what appeared to be a low-memory error (it happens occasionally, but it hasn't been enough of a problem to try to fix). But now my site, which was previously working fine, is no longer accessible from the browser.

The setup: I'm running a NuxtJS site using pm2 to daemonize it, and nginx as a reverse proxy. I have a post-receive git hook so that I can push to my remote git repo, which then rebuilds the app and restarts the pm2 instance.

I can only access my site from inside the server, in a terminal session. Lynx, wget, and cURL all work there, and they even follow the 301 redirect to HTTPS. They also work when I request the domain itself, not just the localhost:3000 that nginx reverse-proxies. That is, curl https://my-domain.org works. If I try curl, Lynx, etc. from a terminal on any other machine, the request just hangs until it times out. The browser does the same thing.

Here are the things I've tried/looked at:

  • I'm using UFW, so I checked whether the firewall was the problem, but 80, 443, and 8080 are all set to ALLOW.
  • I checked whether nginx was actually listening, using sudo lsof -i -P -n | grep LISTEN. Here's the output:
nginx     2896     root    6u  IPv4 668673557      0t0  TCP *:443 (LISTEN)
nginx     2896     root    7u  IPv4 668673558      0t0  TCP *:80 (LISTEN)
nginx     2897 www-data    6u  IPv4 668673557      0t0  TCP *:443 (LISTEN)
nginx     2897 www-data    7u  IPv4 668673558      0t0  TCP *:80 (LISTEN)
nginx     2898 www-data    6u  IPv4 668673557      0t0  TCP *:443 (LISTEN)
nginx     2898 www-data    7u  IPv4 668673558      0t0  TCP *:80 (LISTEN)
  • I checked nginx's access.log. All my curl/wget/Lynx requests show up as normal, but none of the browser requests appear. I also looked at error.log and found this:
2021/07/31 11:51:52 [emerg] 885#885: bind() to 0.0.0.0:443 failed (98: Address already in use)
2021/07/31 11:51:52 [emerg] 885#885: bind() to 0.0.0.0:80 failed (98: Address already in use)
2021/07/31 11:51:52 [emerg] 885#885: bind() to 0.0.0.0:443 failed (98: Address already in use)
2021/07/31 11:51:52 [emerg] 885#885: bind() to 0.0.0.0:80 failed (98: Address already in use)
2021/07/31 11:51:52 [emerg] 885#885: still could not bind()
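The lsof check above can be done mechanically. In that output, nginx is bound to *:80 and *:443, meaning all interfaces rather than only 127.0.0.1. The function below is a hypothetical helper and is not part of lsof. It assumes lsof's default column layout, where the last two fields are ADDRESS:PORT and "(LISTEN)".

```shell
# Hypothetical helper: summarise which address scope each listening socket
# uses, reading `lsof -i -P -n` output on stdin. Assumes lsof's default
# columns, where the last two fields are ADDRESS:PORT and "(LISTEN)".
listen_scope() {
    awk '
        $NF == "(LISTEN)" {
            name = $(NF - 1)                    # e.g. "*:443" or "127.0.0.1:3000"
            n = split(name, parts, ":")
            port = parts[n]                     # text after the last colon
            addr = substr(name, 1, length(name) - length(port) - 1)
            if (addr == "*" || addr == "0.0.0.0" || addr == "[::]")
                scope = "all-interfaces"
            else if (addr ~ /^127\./ || addr == "[::1]")
                scope = "loopback-only"
            else
                scope = "specific-address"
            key = $1 " " port " " scope         # command, port, scope
            if (!(key in seen)) { seen[key] = 1; print key }   # print each combination once
        }
    '
}
```

Usage: sudo lsof -i -P -n | listen_scope. An all-interfaces result for ports 80 and 443 rules out nginx being bound only to loopback. Combined with the browser requests never reaching access.log, that points at something between the client and nginx, such as a firewall. A backend bound loopback-only, like the app behind the reverse proxy, is expected in this kind of setup.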

So far, I haven't found a solution. I'm baffled, because whatever changed, it changed with a reboot. Any ideas are much appreciated.

EDIT to add some output:

sudo systemctl status nginx:

● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2021-07-31 15:05:53 EDT; 27min ago
  Process: 6834 ExecStop=/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid (code=exited, status
  Process: 6840 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
  Process: 6837 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
 Main PID: 6841 (nginx)
   CGroup: /system.slice/nginx.service
           ├─6841 nginx: master process /usr/sbin/nginx -g daemon on; master_process on
           ├─6842 nginx: worker process                           
           └─6843 nginx: worker process                           

Jul 31 15:05:53 parrot systemd[1]: Starting A high performance web server and a reverse proxy server...
Jul 31 15:05:53 parrot systemd[1]: Started A high performance web server and a reverse proxy server.

The output of sudo nginx -T is long, so I put it in a gist.

Tinstar
  • @MichaelHampton Alright – I made it a gist. The link is above, but [here it is for reference](https://gist.github.com/thely/32ae2f5d6c284277874204500ec54026). – Tinstar Jul 31 '21 at 20:00
  • Hm, the nginx config looks OK (except you obfuscated a bunch of stuff unnecessarily, and also obfuscated the domain name, which is probably not helpful). I'm concerned the PIDs from systemd starting nginx do not match the PIDs you see from `ps`. Did you do something else in between these events? – Michael Hampton Jul 31 '21 at 20:34
  • @MichaelHampton The only thing I removed was the ssl_ciphers (for potential security reasons, though I'm new at this). Currently, main-site.org is the only one I have up and running on pm2, because it's the only one of the four sites I care about, but I left the rest there for completeness. I checked `ps -e` and found that nginx was using 6841, 6842, and 6843, just like in the systemctl status. – Tinstar Jul 31 '21 at 20:48
  • Eh? Your ssl ciphers are public information; every https connection to your web site is sent them. Anyway, something is clearly out of sync; I'd just restart nginx again. – Michael Hampton Jul 31 '21 at 20:52
  • Well, my bad, then. I only removed them from the gist I made, not from the actual .conf files. – Tinstar Jul 31 '21 at 20:55
  • I've tried restarting nginx several times, and just now decided to try rebooting the server again to see if that would fix it. Still the same situation. :/ – Tinstar Jul 31 '21 at 20:59
  • Check the nginx error log again to see if anything else has shown up. – Michael Hampton Jul 31 '21 at 21:04
  • How about a `killall -9 nginx` and then starting it again from systemctl, for testing? – djdomi Jul 31 '21 at 21:07
  • It was `ufw` – see below. I don't know why it was `ufw`, but I guess it was. – Tinstar Jul 31 '21 at 21:19
  • Have you tried disabling `ufw` to see what happens? Can you access your website with the firewall disabled? – MasEDI Jul 31 '21 at 23:03
  • @MasEDI that's actually exactly what I did! iptables-persistent blocked all ports on reboot, and disabling ufw was what finally led me to the right answer. I left a link to a more specific SO answer in my answer. – Tinstar Aug 03 '21 at 04:36

1 Answer


This is so stupid, and I still don't know why it was a problem, so any thoughts on this are appreciated. My ufw settings were (and still are) as follows:

Status: active

To                         Action      From
--                         ------      ----
22                         ALLOW       Anywhere                  
80/tcp                     ALLOW       Anywhere                  
443/tcp                    ALLOW       Anywhere                  
80                         ALLOW       Anywhere                  
8080                       ALLOW       Anywhere                  
22 (v6)                    ALLOW       Anywhere (v6)             
80/tcp (v6)                ALLOW       Anywhere (v6)             
443/tcp (v6)               ALLOW       Anywhere (v6)             
80 (v6)                    ALLOW       Anywhere (v6)             
8080 (v6)                  ALLOW       Anywhere (v6) 

There are some redundant port 80 rules in there; I was adding extra rules to see if it helped.

Someone recommended I try disabling ufw, just to make sure it wasn't the problem. Apparently, it was. I disabled it and the site immediately started working. When I re-enabled it, expecting the site to break again, it... still worked. So something about ufw apparently needed to be re-triggered after the reboot.

EDIT: This may be caused by iptables-persistent, which I guess is preinstalled on most servers? It looks like the same issue as this SO answer.
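One way to test the iptables-persistent theory after a reboot is to look at the order of rules in the INPUT chain. iptables checks rules top to bottom. Suppose the rules restored at boot (iptables-persistent keeps them in /etc/iptables/rules.v4) end with an unconditional REJECT or DROP that sits above ufw's jumps. In that case, ufw's ALLOW rules are never reached. A loopback rule such as -A INPUT -i lo -j ACCEPT placed ahead of it could also explain why requests made from the server itself still worked.

The function below is a hypothetical heuristic sketch that assumes that pattern. It was not part of the original troubleshooting.

```shell
# Hypothetical heuristic (not an official tool): read `iptables -S INPUT`
# output on stdin and report whether an unconditional REJECT or DROP rule
# appears before the first jump into a ufw-* chain. iptables evaluates the
# INPUT chain top to bottom, so such a rule would stop packets before any
# ufw ALLOW rules are consulted.
catchall_before_ufw() {
    awk '
        # An unconditional terminal rule looks like "-A INPUT -j REJECT ..."
        # or "-A INPUT -j DROP" (no match options between INPUT and -j).
        $1 == "-A" && $2 == "INPUT" && $3 == "-j" && ($4 == "REJECT" || $4 == "DROP") {
            if (!seen_ufw) { print "blocked: " $0; found = 1; exit }
        }
        # ufw hooks into INPUT with rules like "-A INPUT -j ufw-before-input".
        $1 == "-A" && $2 == "INPUT" && $3 == "-j" && $4 ~ /^ufw-/ { seen_ufw = 1 }
        # Nothing blocking was found ahead of the ufw chains.
        END { if (!found) print "ok" }
    '
}
```

Usage: sudo iptables -S INPUT | catchall_before_ufw. If it prints a blocked: line, one possible remedy is to stop the saved rules from being restored at boot, for example by removing the iptables-persistent package. That leaves ufw alone in charge of the firewall.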

Tinstar