DanTup
Posts: 28
Joined: Sun Mar 27, 2016 7:29 pm

DHCP race causing postfix failures?

Sun Mar 27, 2016 7:45 pm

I've spent much of the past few days trying to figure out why Postfix doesn't work correctly on my Pi on a clean install of Raspbian. The problem is that when Postfix starts up and creates its chroot, it copies a bad (empty) /etc/resolv.conf.

After much debugging, I've discovered that there are 7 seconds between /etc/resolv.conf being cleared and re-written at boot. Postfix is starting up in between!

Here's what seems to be happening at boot:
  1. Contents of /etc/resolv.conf are correct from previous boot
  2. "/sbin/resolvconf -d eth0 -f" gets executed, clearing out /etc/resolv.conf
  3. Postfix starts, and copies /etc/resolv.conf into its chroot
  4. "/sbin/resolvconf2 -a eth0" gets executed, adding the nameserver to /etc/resolv.conf
Now, I don't know why there are seven seconds between (2) and (4) (my DHCP is handled by a Sky Hub router and the Pi is plugged in with a cable! Probably the router just sucks! :)).
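One way to confirm the sequence above on a given boot is to check both copies of the file once the Pi is up. This is a minimal sketch, assuming the chroot lives at /var/spool/postfix (the path that appears later in this thread); the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: does a resolv.conf-style file contain an active nameserver line?
# Lines beginning with "#" or ";" (resolv.conf comments) never match.
resolv_has_nameserver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$1" 2>/dev/null
}

# Compare the system copy with Postfix's chroot copy (assumed default path).
check_postfix_resolv() {
    for f in /etc/resolv.conf /var/spool/postfix/etc/resolv.conf; do
        if resolv_has_nameserver "$f"; then
            echo "OK:    $f has a nameserver"
        else
            echo "EMPTY: $f has no nameserver (or is missing)"
        fi
    done
}
```

If `check_postfix_resolv` reports OK for /etc/resolv.conf but EMPTY for the chroot copy, the chroot was most likely populated during the window between steps (2) and (4).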

Postfix's init.d script has $network and $named in its dependencies; I don't know whether this takes DHCP into account or not.

# Required-Start: $local_fs $remote_fs $syslog $named $network $time
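To see exactly which facilities an init script asks for, the header line can be extracted with a few lines of shell. This is a small sketch; `lsb_required_start` is a hypothetical helper, and it only reads the header: it does not tell you what `$network` or `$named` are mapped to at boot.

```shell
#!/bin/sh
# Sketch: print the facilities listed on an init script's
# "# Required-Start:" header line, one per line.
lsb_required_start() {
    awk '
        /^#[ \t]*Required-Start:/ {
            # Drop the "# Required-Start:" prefix; awk re-splits $0 into fields.
            sub(/^#[ \t]*Required-Start:/, "")
            for (i = 1; i <= NF; i++) print $i
        }
    ' "$1"
}
```

For example, `lsb_required_start /etc/init.d/postfix`. Since Jessie boots with systemd, which wraps init scripts in generated units, `systemctl show -p After -p Wants postfix.service` shows the ordering systemd actually derived from that header. Which targets those facilities map to can vary between systemd versions, so checking seems safer than assuming.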

So, I have two questions, really:
  1. What is not behaving correctly here? I can't imagine that this sort of race condition is normal, so I presume something isn't working correctly and is allowing services to start before they should, or the if-up.d scripts are running too early (before DHCP has finished)?
  2. What is the best way to fix it? I'm trying to make as few changes to the system as possible (and I'm scripting it all, so I can rebuild the Pi easily). I guess putting in a fixed IP would fix it, though I'd rather stick with the router's DHCP reservations if possible.

DanTup
Posts: 28
Joined: Sun Mar 27, 2016 7:29 pm

Re: DHCP race causing postfix failures?

Mon Mar 28, 2016 7:56 am

OK, after many wasted hours I discovered the raspi-config setting that switches between "Slow boot - wait for network" and "Fast boot - don't wait for network". I've fixed it by copying the code that the "Slow boot" setting executes:


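# Run as root. Creates a systemd drop-in that overrides dhcpcd.service.
mkdir -p /etc/systemd/system/dhcpcd.service.d/
# The empty ExecStart= clears the packaged command; the second line
# replaces it with dhcpcd -w, which waits for an address before backgrounding.
cat > /etc/systemd/system/dhcpcd.service.d/wait.conf << EOF
[Service]
ExecStart=
ExecStart=/sbin/dhcpcd -q -w
EOF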
I don't think this is a very good default, so I posted my feedback on GitHub. I don't expect it to change, but I think it's worth having somewhere for others to add comments if it bites them too.
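If a drop-in like this seems to have no effect, a quick sanity check is to confirm the file really contains both ExecStart= lines. A minimal sketch, assuming the path used above; `dropin_waits` is a hypothetical helper.

```shell
#!/bin/sh
# Sketch: check that a dhcpcd drop-in resets ExecStart and then
# sets a new command that runs dhcpcd with -w (wait for an address).
dropin_waits() {
    f=$1
    # An empty "ExecStart=" clears the command inherited from the main unit.
    grep -q '^ExecStart=$' "$f" || return 1
    # The replacement must run dhcpcd and pass -w as a separate flag.
    grep -Eq '^ExecStart=.*dhcpcd.*[[:space:]]-w([[:space:]]|$)' "$f"
}
```

For example, `dropin_waits /etc/systemd/system/dhcpcd.service.d/wait.conf && echo ok`. After editing a drop-in, `sudo systemctl daemon-reload` makes systemd re-read it without a reboot, and `systemctl cat dhcpcd.service` prints the unit together with its drop-ins, which shows whether the override was picked up.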

DanTup
Posts: 28
Joined: Sun Mar 27, 2016 7:29 pm

Re: DHCP race causing postfix failures?

Thu Apr 28, 2016 7:23 pm

It turns out this hasn't fixed the problem at all; my logs are full of Postfix failures today, with '/var/spool/postfix/etc/resolv.conf' not containing the DNS server again.

Not sure if this is isolated to Raspbian, but it's pretty frustrating :(


danny@raspberrypi:~$ cat /etc/systemd/system/dhcpcd.service.d/wait.conf
[Service]
ExecStart=
ExecStart=/sbin/dhcpcd -q -w

danny@raspberrypi:~$ cat /var/spool/postfix/etc/resolv.conf 
# Generated by resolvconf

danny@raspberrypi:~$ cat /var/log/mail.info
Apr 28 18:17:20 raspberrypi postfix/smtp[5597]: A7BDE209B5: to=<danny+pi@xxxxxxxx>, orig_to=<root>, relay=none, delay=21537, delays=21537/0.05/0/0, dsn=4.4.3, status=deferred (Host or domain name not found. Name service error for name=aspmx.l.google.com type=MX: Host not found, try again)
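The chroot copy above contains only the "# Generated by resolvconf" comment. As a stopgap (not a fix for the boot ordering itself), one could refresh that copy once the real file has a nameserver. This is a hypothetical sketch: `sync_postfix_resolv` is not part of Postfix or Raspbian, and its default paths are simply the ones from this post.

```shell
#!/bin/sh
# Hypothetical workaround: copy /etc/resolv.conf into Postfix's chroot,
# but only once it actually contains a nameserver line.
sync_postfix_resolv() {
    src=${1:-/etc/resolv.conf}
    dst=${2:-/var/spool/postfix/etc/resolv.conf}
    # Refuse to copy an empty file: that is exactly the bad state.
    grep -Eq '^[[:space:]]*nameserver[[:space:]]' "$src" 2>/dev/null || return 1
    # Nothing to do if the chroot copy is already identical.
    if cmp -s "$src" "$dst"; then
        return 0
    fi
    cp "$src" "$dst"
}
```

Run it as root once the network is up, then reload Postfix (for example `postfix reload`) so that already-running processes are replaced and pick up the new file.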

mutley
Posts: 61
Joined: Sat Jan 02, 2016 8:06 pm

Re: DHCP race causing postfix failures?

Fri Apr 29, 2016 1:48 pm

The Debian Jessie init script load order is all messed up on the Pi with a vanilla install. It comes from half the scripts being old-style (rc) scripts and half being the newer style; you usually see some form of dependency-order error in the system logs. Some scripts you get with apt-get that have dependencies will be ordered incorrectly (rpcbind is one that many complain about). I would suggest you look at reordering them yourself, or simply put a wait/pause in the postfix script.
This explains the issue for NFS:
viewtopic.php?f=28&t=125483
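The "wait/pause" idea could look something like this: poll /etc/resolv.conf until it has a nameserver line, with a timeout so boot cannot hang forever. A sketch only; `wait_for_nameserver` and its 30-second default are assumptions for illustration, not something Debian or Postfix provides.

```shell
#!/bin/sh
# Hypothetical helper: block until a resolv.conf-style file has a
# nameserver line, or give up after a timeout (in seconds).
wait_for_nameserver() {
    file=${1:-/etc/resolv.conf}
    timeout=${2:-30}
    waited=0
    while ! grep -Eq '^[[:space:]]*nameserver[[:space:]]' "$file" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1   # still no nameserver; let the caller decide what to do
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

For example, a late-running custom script could call `wait_for_nameserver /etc/resolv.conf 30 && /etc/init.d/postfix restart`, so that Postfix copies resolv.conf into its chroot only after DHCP has filled it in.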

The way I did it was to leave all the system scripts as they are and simply write my own init script, force it to be the last one loaded, and then start/restart any services I know would fail on boot.

Change the references below from rpcbind to postfix. You probably also want "/etc/init.d/postfix restart" rather than "/etc/init.d/rpcbind start", because rpcbind fails to start at all, whereas postfix does start, just incorrectly.


#!/bin/sh
### BEGIN INIT INFO
# Provides:          fix_rpcbind
# Required-Start:    $all
# Required-Stop:     $all
# Default-Start:     S
# Default-Stop:      0 1 6
# Short-Description: fix rpcbind daemon on raspberry pi jessie
# Description:       fix rpcbind daemon on raspberry pi jessie
### END INIT INFO

test -x /etc/init.d/rpcbind || exit 5

# Using the lsb functions to perform the operations.
. /lib/lsb/init-functions

case "$1" in
 start)
  /etc/init.d/rpcbind start
  ;;
 stop)
  # Nothing to stop; the real daemon has its own init script.
  ;;
 *)
  # For invalid arguments, print the usage message.
  echo "Usage: $0 {start|stop}"
  exit 2
  ;;
esac

DanTup
Posts: 28
Joined: Sun Mar 27, 2016 7:29 pm

Re: DHCP race causing postfix failures?

Fri Apr 29, 2016 3:56 pm

Thanks for the info. I was trying to avoid a "custom" solution, but it looks like that might be the only way :(

Is this a Raspbian issue, or does it affect Debian too? Is it fixed in Stretch, or is it likely to be an ongoing issue?
