0

As the title says, the postfix service won't start after rebooting the system. We are using our own custom Linux OS based on openSUSE 12.1, and our verification department recently found that the /var directory keeps growing because of unsent mail in Postfix's maildrop queue.

Who sends those mails? We have some applications that share logs between systems.

I searched various forums for an answer without success.

After reboot:

$ systemctl status postfix.service
postfix.service - Postfix Mail Transport Agent
          Loaded: loaded (/etc/systemd/system/postfix.service; enabled)
          Active: inactive (dead)
          CGroup: name=systemd:/system/postfix.service

I found in "Postfix doesn't start on reboot" that the problem may be a conflict with sendmail. To make sure that the sendmail program is properly linked against Postfix:

$ ldd /usr/sbin/sendmail
        linux-vdso.so.1 =>  (0x00007fffa25ff000)
        libpostfix-global.so.1 => /usr/lib64/libpostfix-global.so.1 (0x00007fdb174d2000)
        libpostfix-util.so.1 => /usr/lib64/libpostfix-util.so.1 (0x00007fdb1729a000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fdb16f0a000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007fdb16d06000)
        libdb-4.8.so => /usr/lib64/libdb-4.8.so (0x00007fdb1698a000)
        libnsl.so.1 => /lib64/libnsl.so.1 (0x00007fdb16772000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fdb1770b000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fdb16555000)

I also checked /var/log/messages, and there is no information about a sendmail or Postfix failure.

Let's start the service :

$ systemctl start postfix.service
$ systemctl status postfix.service
postfix.service - Postfix Mail Transport Agent
          Loaded: loaded (/etc/systemd/system/postfix.service; enabled)
          Active: active (running) since Sat, 28 Jan 2017 21:49:37 +0300; 41s ago
         Process: 3450 ExecStartPost=/etc/postfix/system/cond_slp register (code=exited, status=0/SUCCESS)
         Process: 3444 ExecStartPost=/etc/postfix/system/wait_qmgr 60 (code=exited, status=0/SUCCESS)
         Process: 3368 ExecStart=/usr/sbin/postfix start (code=exited, status=0/SUCCESS)
         Process: 3366 ExecStartPre=/etc/postfix/system/update_chroot (code=exited, status=0/SUCCESS)
         Process: 3363 ExecStartPre=/bin/echo Starting mail service (Postfix) (code=exited, status=0/SUCCESS)
        Main PID: 3443 (master)
          CGroup: name=systemd:/system/postfix.service
                  ├ 3443 /usr/lib/postfix/master
                  ├ 3445 pickup -l -t fifo -u
                  ├ 3446 qmgr -l -t fifo -u
                  ├ 3447 cleanup -z -t unix -u
                  ├ 3466 trivial-rewrite -n rewrite -t unix -u
                  ├ 3467 local -t unix
                  ├ 3468 local -t unix
                  └ 3469 local -t unix

Good. After rebooting the system, the service goes back to inactive (dead). Returning to /var/log, I found this file, mail.warn:

$ cat mail.info
Jan 28 03:13:55 msx postfix/postfix-script[2527]: warning: not owned by group maildrop: /usr/sbin/postqueue
Jan 28 03:13:55 msx postfix/postfix-script[2528]: warning: not owned by group maildrop: /usr/sbin/postdrop
Jan 28 03:13:55 msx postfix/postfix-script[2530]: warning: not set-gid or not owner+group+world executable: /usr/sbin/postqueue
Jan 28 03:13:55 msx postfix/postfix-script[2531]: warning: not set-gid or not owner+group+world executable: /usr/sbin/postdrop
Jan 28 21:49:37 msx postfix/postfix-script[3430]: warning: not owned by group maildrop: /usr/sbin/postqueue
Jan 28 21:49:37 msx postfix/postfix-script[3431]: warning: not owned by group maildrop: /usr/sbin/postdrop
Jan 28 21:49:37 msx postfix/postfix-script[3434]: warning: not set-gid or not owner+group+world executable: /usr/sbin/postqueue
Jan 28 21:49:37 msx postfix/postfix-script[3435]: warning: not set-gid or not owner+group+world executable: /usr/sbin/postdrop

I don't know if this can help to resolve my problem.

Extra information

Inside the postfix.service file :

$ cat /etc/systemd/system/postfix.service
[Unit]
Description=Postfix Mail Transport Agent
Requires=var-run.mount nss-lookup.target network.target remote-fs.target syslog.target time-sync.target
After=var-run.mount nss-lookup.target network.target remote-fs.target syslog.target time-sync.target
After=amavis.service mysql.service cyrus.service ldap.service openslp.service ypbind.service
Before=mail-transfer-agent.target
Conflicts=sendmail.service exim.service

[Service]
Type=forking
PIDFile=/var/spool/postfix/pid/master.pid
ExecStartPre=-/bin/echo 'Starting mail service (Postfix)'
EnvironmentFile=-/etc/sysconfig/postfix
ExecStartPre=/etc/postfix/system/update_chroot
ExecStart=/usr/sbin/postfix start
ExecStartPost=/etc/postfix/system/wait_qmgr 60
ExecStartPost=/etc/postfix/system/cond_slp register
ExecReload=/usr/sbin/postfix reload
ExecReload=/usr/sbin/postfix flush
ExecStop=/usr/sbin/postfix stop
ExecStopPost=/etc/postfix/system/cond_slp deregister

[Install]
WantedBy=multi-user.target

List of all services after rebooting, without starting the postfix service:

$ systemctl list-unit-files --type=service
...
klog.service              disabled
klogd.service             masked
ldconfig.service          masked
loadmodules.service       masked
local.service             static
localfs.service           static
openhpid.service          enabled
postfix.service           enabled
postgresql.service        static
poweroff.service          static
proc.service              masked
...

UPDATE

After setting LogLevel=debug in /etc/systemd/system.conf, I was able to get more data related to this issue. In /var/log/messages I found this:

Jan 31 19:17:00 msx kernel: [   10.111126] systemd[1]: -.mount changed dead -> mounted
Jan 31 19:17:00 msx kernel: [   10.111147] systemd[1]: Activating default unit: default.target
Jan 31 19:17:00 msx kernel: [   10.111153] systemd[1]: Trying to enqueue job multi-user.target/start/replace
Jan 31 19:17:00 msx kernel: [   10.111204] systemd[1]: Cannot add dependency job for unit hpiwdt.service, ignoring: Unit hpiwdt.service failed to load: No such file or directory. See system logs and 'systemctl status hpiwdt.service' for details.
Jan 31 19:17:00 msx kernel: [   10.111276] systemd[1]: Found ordering cycle on lwresd.service/start
Jan 31 19:17:00 msx kernel: [   10.111279] systemd[1]: Walked on cycle path to nss-lookup.target/start
Jan 31 19:17:00 msx kernel: [   10.111281] systemd[1]: Walked on cycle path to lwresd.service/start
Jan 31 19:17:00 msx kernel: [   10.111284] systemd[1]: Breaking ordering cycle by deleting job nss-lookup.target/start
Jan 31 19:17:00 msx kernel: [   10.111286] systemd[1]: Deleting job postfix.service/start as dependency of job nss-lookup.target/start
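
To pull only the cycle-related lines out of a debug-level log, something like the following may help. It is a minimal sketch: the function name find_ordering_cycles is made up for illustration, and it assumes the systemd debug output ends up in a plain syslog file such as /var/log/messages, as it does here.

```shell
#!/bin/sh
# Print the systemd messages that describe an ordering cycle and the jobs
# that were dropped to break it.
# Assumption: debug output is written to a syslog file; the default path
# below is the one used in the question.
find_ordering_cycles() {
    log_file="${1:-/var/log/messages}"
    grep -E 'Found ordering cycle|Walked on cycle path|Breaking ordering cycle|Deleting job' "$log_file"
}
```

Calling `find_ordering_cycles` with no argument reads /var/log/messages; a "Deleting job postfix.service/start" line in the output means the service was dropped from the boot transaction rather than started and failed.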

I don't even know what nss-lookup is or does. If anyone has any idea, thank you.

SAN ALexis
  • Can you post the output of `journalctl -u postfix` ? Probably there you will have some messages that point you to the root cause. – Pablo Martinez Mar 10 '17 at 19:03
  • Hi @Pablo Martinez, since we are running a custom OS based on openSUSE 12.1, `journalctl` had not yet been introduced in `systemd`. In `/var/log/messages` there's nothing related to `sendmail` or even `postfix`. I will add `LogLevel=debug` to the `/etc/systemd/system.conf` configuration file and update this post if I get something from that. Thanks for your time. – SAN ALexis Mar 10 '17 at 19:54
  • cross post: http://unix.stackexchange.com/q/350633/88119 – sebix Mar 12 '17 at 14:48

6 Answers

1

All else aside, openSUSE 12.1 is quite old (released in November 2011) and no longer receives updates, so I would consider moving to a more recent OS.

Your log shows some permission issues (postfix has a few accounts/groups it uses for unprivileged actions, so it can get a bit messy):

Jan 28 03:13:55 msx postfix/postfix-script[2527]: warning: not owned by group maildrop: /usr/sbin/postqueue
Jan 28 03:13:55 msx postfix/postfix-script[2528]: warning: not owned by group maildrop: /usr/sbin/postdrop
Jan 28 03:13:55 msx postfix/postfix-script[2530]: warning: not set-gid or not owner+group+world executable: /usr/sbin/postqueue

On a more recent SUSE system, the permissions for postfix are given as:

> sudo cat /etc/permissions.d/postfix
/usr/sbin/sendmail              root:root       0755
/etc/postfix/sasl_passwd        root:root       0600
/etc/postfix/sasl_passwd.db     root:root       0600
/usr/sbin/postqueue             root:maildrop   2755
/usr/sbin/postdrop              root:maildrop   2755

Applying those permissions should resolve the two basic issues seen in your logs, postdrop and postqueue having bad ownership and mode.

You could run sudo chkstat --warn --system to check which permissions might need fixing and, if all looks good, let it apply them by running sudo chkstat --system --set.
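
To see the current state of the two helpers before changing anything, a small check along these lines could be used. This is only a sketch: show_perms is a hypothetical helper name, and it assumes GNU stat (the -c format option). Note that the warnings in the log mention both the group and the mode, so both are worth checking.

```shell
#!/bin/sh
# Print owner:group, octal mode and path for each file given.
# For /usr/sbin/postqueue and /usr/sbin/postdrop, the permissions file
# quoted above expects root:maildrop and mode 2755.
show_perms() {
    for f in "$@"; do
        # GNU stat format: %U owner, %G group, %a octal mode, %n file name
        stat -c '%U:%G %a %n' "$f"
    done
}
```

For example, `show_perms /usr/sbin/postqueue /usr/sbin/postdrop`. If the group is not maildrop, running `chgrp maildrop` on both files and then `chmod 2755` as root would bring them in line with the listing above; the group change should come first, because changing the group of an executable normally clears the set-gid bit.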

iwaseatenbyagrue
  • Thank you @iwaseatenbyagrue for your answer. Since we have a custom OS, it sometimes takes longer to release, because for each new OS version we need to port our changes and test them. We are thinking of moving to CentOS for its long "end of life" support. The postqueue and postdrop permissions were set to 0755; I changed them to 2755. I still have the same problem. – SAN ALexis Mar 10 '17 at 17:14
0

Experienced the same issue after upgrading to CentOS 7.9.

The issue was due to sendmail:

systemctl list-unit-files --type=service |grep -i  -e postfix -e sendmail

postfix.service                               enabled

sendmail.service                              enabled

Issued the following command to disable sendmail:

systemctl disable sendmail.service

systemctl list-unit-files --type=service |grep -i  -e postfix -e sendmail

postfix.service                               enabled

sendmail.service                              disabled

Now Postfix starts without any issue after each reboot.

Michael Hampton
0

The way I would approach this problem is to first check what is listening on port 25, if it is open at all. Naturally, only one of sendmail and Postfix can use port 25 at a time. Can you try ss -4tln -o state listening '( sport = :25 )'

oogway
0

"I don't even know what is or does nss-lookup. If anyone have any idea. Thank you."

nss-lookup.target is the systemd synchronization point for host and network name lookups, which is fairly critical to postfix for routing and for any DNS-based activity. I would look at your network startup units, as there is something weird there. According to your log, systemd breaks an ordering cycle between lwresd (the lightweight resolver daemon) and nss-lookup.target by dropping the nss-lookup.target job, and that should not happen.

refer: https://linux.die.net/man/8/lwresd, https://en.wikipedia.org/wiki/Name_Service_Switch and even better a probable cause and solution here: https://forums.opensuse.org/showthread.php/524577-systemd-nss-lookup-target-dependency-errors-on-all-machines-boot

Name resolution is critical for turning names into IP addresses, so postfix basically won't work without it, nor will any other network service that relies on name resolution.

Name resolution is obviously loading and running, as removing the dependency on it not only allows postfix to start but, I assume, also to run.

It could just be a cycle/timing issue, i.e. lwresd is trying to start or timing out before name resolution is up, while also being flagged as depending on it. I would look at that chain and resolve it rather than use the hack of removing the dependency from postfix, as any subsequent postfix package upgrade may overwrite the service descriptor and you will be back where you started.

0

First of all, adding LogLevel=debug to /etc/systemd/system.conf provides useful logs for understanding what really happens to services at startup. As mentioned in my updated question, there was an ordering conflict with nss-lookup.target. After removing nss-lookup.target from both the Requires= and After= directives directly in /etc/systemd/system/postfix.service, the system was able to start postfix during boot.
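
The edit described above can be sketched with sed. This is an illustration only: strip_nss_lookup is a hypothetical helper name, it prints the modified unit to stdout instead of editing in place, and it assumes nss-lookup.target appears as one space-separated entry on the Requires= and After= lines, as in the unit file shown in the question.

```shell
#!/bin/sh
# Print a copy of a unit file with nss-lookup.target removed from its
# Requires= and After= lines. Other lines are passed through unchanged.
# The original file is not modified.
strip_nss_lookup() {
    sed -E '/^(Requires|After)=/ s/ ?nss-lookup\.target//' "$1"
}
```

For example, `strip_nss_lookup /etc/systemd/system/postfix.service > /tmp/postfix.service` lets you review the result before copying it back and running `systemctl daemon-reload`. This works around the ordering cycle rather than fixing it, so the cycle between lwresd.service and nss-lookup.target is still worth investigating.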

Hope this can help.

Thank you

SAN ALexis
0

You might have a conflict between Sendmail and Postfix. Make sure Sendmail is not running after reboot if you plan to use Postfix as your primary mail application. The postfix.service unit shown in the question declares Conflicts=sendmail.service exim.service, so starting one of these services stops the other, and if both are enabled at boot only one of them can end up running.