Our nginx server is stopping/crashing on its own

Question

Our nginx server crashes on its own, and it has done so a couple of times at random. I can't figure out why it's happening. This is what I see when I check the nginx status:

● nginx.service - A high performance web server and a reverse proxy server
   Loaded: loaded (/lib/systemd/system/nginx.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Thu 2018-11-01 00:48:16 IST; 9h ago
  Process: 16654 ExecStop=/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid (code=exited, status=0/SUCCE
  Process: 16702 ExecStart=/usr/sbin/nginx -g daemon on; master_process on; (code=exited, status=1/FAILURE)
  Process: 16699 ExecStartPre=/usr/sbin/nginx -t -q -g daemon on; master_process on; (code=exited, status=0/SUCCESS)
 Main PID: 1353 (code=exited, status=0/SUCCESS)

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

and nginx error.log

2018/11/01 00:48:13 [emerg] 16702#16702: bind() to 0.0.0.0:80 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to [::]:80 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to [::]:443 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to 0.0.0.0:443 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to 0.0.0.0:80 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to [::]:80 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to [::]:443 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to 0.0.0.0:443 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to 0.0.0.0:80 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to [::]:80 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to [::]:443 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to 0.0.0.0:443 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to 0.0.0.0:80 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to [::]:80 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to [::]:443 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to 0.0.0.0:443 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to 0.0.0.0:80 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to [::]:80 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to [::]:443 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: bind() to 0.0.0.0:443 failed (98: Address already in use)
2018/11/01 00:48:13 [emerg] 16702#16702: still could not bind()
2018/11/01 00:48:16 [alert] 16665#16665: unlink() "/run/nginx.pid" failed (2: No such file or directory)

EDIT

sudo netstat -nlp
=====================================================================================================
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:27017         0.0.0.0:*               LISTEN      1036/mongod     
tcp        0      0 127.0.0.1:3306          0.0.0.0:*               LISTEN      30941/mysqld    
tcp        0      0 127.0.0.1:587           0.0.0.0:*               LISTEN      1382/sendmail: MTA:
tcp        0      0 127.0.0.1:6379          0.0.0.0:*               LISTEN      1147/redis-server 1
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      29550/nginx -g daem
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      1153/sshd       
tcp        0      0 127.0.0.1:3000          0.0.0.0:*               LISTEN      6988/node       
tcp        0      0 127.0.0.1:8088          0.0.0.0:*               LISTEN      1042/influxd    
tcp        0      0 127.0.0.1:25            0.0.0.0:*               LISTEN      1382/sendmail: MTA:
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      29550/nginx -g daem
tcp6       0      0 127.0.0.1:7983          :::*                    LISTEN      1691/java       
tcp6       0      0 :::80                   :::*                    LISTEN      29550/nginx -g daem
tcp6       0      0 :::8086                 :::*                    LISTEN      1042/influxd    
tcp6       0      0 :::22                   :::*                    LISTEN      1153/sshd       
tcp6       0      0 :::8983                 :::*                    LISTEN      1691/java       
tcp6       0      0 :::8888                 :::*                    LISTEN      1117/chronograf 
tcp6       0      0 :::443                  :::*                    LISTEN      29550/nginx -g daem
udp        0      0 0.0.0.0:68              0.0.0.0:*                           825/dhclient    
Active UNIX domain sockets (only servers)
Proto RefCnt Flags       Type       State         I-Node   PID/Program name    Path
unix  2      [ ACC ]     STREAM     LISTENING     8775     1/init              /run/systemd/private
unix  2      [ ACC ]     STREAM     LISTENING     16995534 25536/systemd       /run/user/1000/systemd/private
unix  2      [ ACC ]     STREAM     LISTENING     17834    1326/systemd        /run/user/120/systemd/private
unix  2      [ ACC ]     SEQPACKET  LISTENING     8779     1/init              /run/udev/control
unix  2      [ ACC ]     STREAM     LISTENING     8790     1/init              /run/systemd/journal/stdout
unix  2      [ ACC ]     STREAM     LISTENING     8793     1/init              /run/lvm/lvmpolld.socket
unix  2      [ ACC ]     STREAM     LISTENING     8794     1/init              /run/lvm/lvmetad.socket
unix  2      [ ACC ]     STREAM     LISTENING     13115    1/init              /var/lib/lxd/unix.socket
unix  2      [ ACC ]     STREAM     LISTENING     17902    1036/mongod         /tmp/mongodb-27017.sock
unix  2      [ ACC ]     STREAM     LISTENING     24330    1053/node           /home/ubuntu/.pm2/pub.sock
unix  2      [ ACC ]     STREAM     LISTENING     17115993 6389/git-credential /home/ubuntu/.git-credential-cache/socket
unix  2      [ ACC ]     STREAM     LISTENING     24408    1053/node           /home/ubuntu/.pm2/rpc.sock
unix  2      [ ACC ]     STREAM     LISTENING     13112    1/init              /run/snapd.socket
unix  2      [ ACC ]     STREAM     LISTENING     13113    1/init              /run/snapd-snap.socket
unix  2      [ ACC ]     STREAM     LISTENING     13114    1/init              /run/acpid.socket
unix  2      [ ACC ]     STREAM     LISTENING     13118    1/init              /var/run/dbus/system_bus_socket
unix  2      [ ACC ]     STREAM     LISTENING     23857    1114/python         /var/run/supervisor.sock.1114
unix  2      [ ACC ]     STREAM     LISTENING     17939    1382/sendmail: MTA: /var/run/sendmail/mta/smcontrol
unix  2      [ ACC ]     STREAM     LISTENING     13111    1/init              /run/uuidd/request
unix  2      [ ACC ]     STREAM     LISTENING     13271    1034/iscsid         @ISCSIADM_ABSTRACT_NAMESPACE
unix  2      [ ACC ]     STREAM     LISTENING     18633    1386/php-fpm.conf)  /run/php/php7.0-fpm.sock
unix  2      [ ACC ]     STREAM     LISTENING     9990652  30941/mysqld        /var/run/mysqld/mysqld.sock

EDIT 2

# Stop dance for nginx
# =======================
#
# ExecStop sends SIGSTOP (graceful stop) to the nginx process.
# If, after 5s (--retry QUIT/5) nginx is still running, systemd takes control
# and sends SIGTERM (fast shutdown) to the main process.
# After another 5s (TimeoutStopSec=5), and if nginx is alive, systemd sends
# SIGKILL to all the remaining processes in the process group (KillMode=mixed).
#
# nginx signals reference doc:
# http://nginx.org/en/docs/control.html
#
[Unit]
Description=A high performance web server and a reverse proxy server
After=network.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t -q -g 'daemon on; master_process on;'
ExecStart=/usr/sbin/nginx -g 'daemon on; master_process on;'
ExecReload=/usr/sbin/nginx -g 'daemon on; master_process on;' -s reload
ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid
TimeoutStopSec=5
KillMode=mixed

[Install]
WantedBy=multi-user.target

And is there any way to restart nginx automatically if anything happens like this?

It is restarted automatically by systemd. The problem is that it can't start because a different httpd is already running. Find out why. – Gerald Schneider, Nov 01 '18 at 06:59
That's a feature of systemd. If a service exits unexpectedly, it is restarted a couple of times until systemd determines that it is faulty and gives up. You have two problems: a) your nginx crashes (and the part of the log you provided doesn't show why), and b) nginx can't start because ports 80 and 443 are already in use (that's what your log shows). Find out what is blocking the ports (`netstat -nlp`) and clean that up. Check your logs further to see why nginx was stopped. – Gerald Schneider, Nov 01 '18 at 07:11
Actually, that's all there is in the error log, but I missed the first two lines: `2018/11/01 00:48:11 [notice] 16663#16663: signal process started` and `2018/11/01 00:48:11 [error] 16663#16663: open() "/run/nginx.pid" failed (2: No such file or directory)` – Sayantan Das, Nov 01 '18 at 07:17
`open() "/run/nginx.pid" failed` tells us to check the pid directive in nginx.conf. – minish, Nov 01 '18 at 08:07
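
The port check suggested in the comments above can be sketched as a small filter over `netstat -nlp` output. This is only an illustration: the function name `port_listeners` is hypothetical, and it assumes the column layout shown in the question's netstat output.

```shell
# port_listeners PORT
# Reads `netstat -nlp` output on stdin and prints "local-address pid/program"
# for every TCP or TCP6 socket listening on PORT.
# (Hypothetical helper; assumes the column layout shown in the question.)
port_listeners() {
    awk -v port="$1" '
        $1 ~ /^tcp/ && $6 == "LISTEN" {
            # The port is whatever follows the last colon of the local address,
            # which also handles IPv6 forms such as ":::80".
            n = split($4, parts, ":")
            if (parts[n] == port) print $4, $7
        }'
}
```

It could be used as `sudo netstat -nlp | port_listeners 80`. If the process it reports is an nginx that systemd does not know about, that would match the "Address already in use" errors in the log.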

Michael Hampton · Answer 1 · 2018-11-02T13:47:16.447

2

Your netstat output shows that nginx is already running at the time you are trying to start it. Was it started manually outside of systemd? That's usually what happens in this case. Try manually killing the nginx process, then restarting it within systemd.

killall nginx
# and wait until netstat no longer shows it, or use kill -9
systemctl start nginx

It could also be that systemd lost track of a running nginx process because whoever wrote that systemd unit doesn't know what they're doing. It actually tries to use the ancient and now obsolete start-stop-daemon to send signals to nginx, when systemd is perfectly capable of doing this itself! This is guaranteed to cause misery eventually. Try updating to a current version of nginx and/or Ubuntu, where that service might be fixed.

Or just remove the erroneous ExecStop= line and replace it with KillSignal=QUIT, which is what the Red Hat nginx systemd unit does and is the correct way to do it in systemd.
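
One way to apply that suggestion without editing the packaged unit file is a systemd drop-in. The following is a sketch rather than a tested fix for this exact system: the function name `nginx_stop_override` is hypothetical, and the drop-in path mentioned afterwards is the conventional location, not something taken from the question.

```shell
# nginx_stop_override
# Prints a systemd drop-in that clears the packaged ExecStop= command and
# lets systemd deliver SIGQUIT to nginx itself. (Hypothetical helper name.)
nginx_stop_override() {
    cat <<'EOF'
[Service]
# An empty assignment resets the ExecStop= list inherited from the main unit.
ExecStop=
# Ask nginx for a graceful shutdown; systemd escalates to SIGKILL after TimeoutStopSec.
KillSignal=QUIT
EOF
}
```

Assuming the usual drop-in directory, the output could be written with `sudo mkdir -p /etc/systemd/system/nginx.service.d` and `nginx_stop_override | sudo tee /etc/systemd/system/nginx.service.d/override.conf`, followed by `sudo systemctl daemon-reload`. Pasting the same lines into `sudo systemctl edit nginx` should have the same effect.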

edited Nov 02 '18 at 13:47

answered Nov 01 '18 at 15:44

Michael Hampton

I installed nginx by following a DigitalOcean community tutorial. I have updated the question with the systemd unit content. Can you please review it and tell me what I should do? We are using Ubuntu 16.04 and nginx 1.10. – Sayantan Das Nov 02 '18 at 13:01
@SayantanDas That big comment in the systemd unit makes it clear that whoever wrote it didn't even read the documentation they linked to. Nowhere in there is a STOP signal documented. It's certainly not a signal one would expect to send to kill a process. It would just stop it, much like Ctrl-Z for a process running in a terminal. If you can't upgrade Ubuntu, then follow the suggestion I gave in the answer. – Michael Hampton Nov 02 '18 at 13:46
The thing about SIGSTOP is that a process is still using all its resources (RAM, network ports, etc.) but is not executing on the CPU. Which means it can't do anything until it starts up again with a SIGCONT. That includes responding to other signals like SIGQUIT. If it's stopped then those signals just wait around until the process is started again. This means, among other things, that using start-stop-daemon makes absolutely no sense at all, and neither does that comment. – Michael Hampton Nov 02 '18 at 14:51
I think the unit file came by default when installing nginx, because I did not write it. By the way, I found this unit file on the nginx website. Is this okay? https://www.nginx.com/resources/wiki/start/topics/examples/systemd/ – Sayantan Das Nov 05 '18 at 07:41
I know this is old, but I can corroborate that this is the default nginx unit file that ships with Ubuntu, at least as of 18.04 LTS; I am having the same problem and my unit file is just as borked as OP's. Thanks @MichaelHampton for the suggestion, I will give this a shot. Clearly the solution here is don't use Ubuntu Server.... /s – ccellist Dec 17 '21 at 13:33
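
The SIGSTOP behaviour described in the comments above can be observed with a small experiment. This is a sketch under stated assumptions: `demo_sigstop` and the worker that exits with status 42 are made up for illustration, and SIGUSR1 stands in for SIGQUIT because background jobs in a non-interactive shell ignore SIGQUIT.

```shell
# demo_sigstop
# Shows that a signal sent to a stopped process stays pending until SIGCONT.
# (Hypothetical demo; SIGUSR1 stands in for SIGQUIT.)
demo_sigstop() {
    # Worker: exits with status 42 as soon as it can handle SIGUSR1.
    sh -c 'trap "exit 42" USR1; while :; do sleep 1; done' &
    pid=$!
    sleep 1                 # give the worker time to install its trap
    kill -STOP "$pid"       # freeze it; it keeps its memory, files and sockets
    kill -USR1 "$pid"       # queued, but cannot be handled while stopped
    sleep 2
    if kill -0 "$pid" 2>/dev/null; then
        echo "still present after USR1 while stopped"
    fi
    kill -CONT "$pid"       # resume; the pending SIGUSR1 is delivered now
    status=0
    wait "$pid" || status=$?
    echo "exit status after CONT: $status"
}
```

If the worker is still present while stopped and only exits once SIGCONT arrives, the queued signal was held back in the way the comment describes.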

score 0 · Answer 2 · edited Nov 01 '18 at 07:36

[::]:80 is an IPv6 address. This error can occur if your nginx configuration listens on both port 80 and [::]:80.

I had the following in my default sites-available file:

listen 80;
listen [::]:80 default_server;

You can fix this by adding ipv6only=on to the [::]:80 like this:

listen 80;
listen [::]:80 ipv6only=on default_server;

edited Nov 01 '18 at 07:36

Gerald Schneider

answered Nov 01 '18 at 07:35

Udhayakumar R

Alternatively, just skip the IPv4 listen. The `[::]:80` parameter listens on both IPv4 and IPv6, so the `listen 80` is unnecessary. – Gerald Schneider Nov 01 '18 at 07:37
This only applies to very old versions of nginx prior to about 1.4. In current versions it is not necessary. See [here](https://serverfault.com/a/638370/126632) and [here](https://serverfault.com/a/512057/126632) for a detailed discussion of the issue. – Michael Hampton Nov 02 '18 at 13:39
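
Before changing any `listen` lines, it may help to see which ones the configuration actually contains. The filter below is a minimal sketch: the name `listen_directives` is hypothetical, and it assumes each `listen` directive sits on its own line.

```shell
# listen_directives
# Reads nginx configuration text on stdin and prints the arguments of each
# distinct `listen` directive once, in order of first appearance.
# (Hypothetical helper; assumes one directive per line.)
listen_directives() {
    awk '
        $1 == "listen" {
            line = $0
            sub(/^[ \t]*listen[ \t]+/, "", line)   # drop the directive name
            sub(/[ \t]*;.*$/, "", line)            # drop the terminator and anything after it
            if (!(line in seen)) { seen[line] = 1; print line }
        }'
}
```

On nginx versions that support the `-T` flag, `sudo nginx -T | listen_directives` would feed it the full effective configuration.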
