fvisagie
Posts: 6
Joined: Wed Apr 19, 2017 11:27 am
Location: Man

Improved (=reliable) Wait for Network implementation

Thu Jun 29, 2017 11:15 pm

The Raspbian Jessie Wait for Network setting does not reliably detect network availability, which complicates tasks such as mounting remote drives during boot. Current remedies either add similarly unreliable or wasteful delays to the boot process, or do nothing useful. By using Jessie's systemd init system, this solution extends Wait for Network (i.e. it is still controlled by Wait for Network) to implement simple and reliable network availability detection.

Although this approach works well as it is, ideally it should be included in the standard distribution with raspi-config modified to control it directly through ‘systemctl en/disable’ instead of writing and deleting the current Wait for Network wait.conf.
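
As a rough illustration of that idea (not how raspi-config works today), a toggle could look something like the sketch below. The function name set_wait_for_network and the SYSTEMCTL override are made up for this example; only 'systemctl enable/disable' and the unit name come from this post.

```shell
#!/bin/sh
# Hypothetical sketch: how a tool such as raspi-config might toggle the unit.
# set_wait_for_network and the SYSTEMCTL override are illustrative names only.

UNIT=network-wait-online.service

set_wait_for_network() {
    # SYSTEMCTL can be overridden (e.g. SYSTEMCTL=echo) for a dry run.
    case "$1" in
        on)  "${SYSTEMCTL:-systemctl}" enable "$UNIT" ;;
        off) "${SYSTEMCTL:-systemctl}" disable "$UNIT" ;;
        *)   echo "usage: set_wait_for_network on|off" >&2; return 2 ;;
    esac
}
```

With this in place the unit itself would presumably no longer need to test for wait.conf, since being enabled or disabled would carry that information.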

Problem

When configured to Wait for Network, the Jessie systemd init process detects dhcpcd obtaining interface configurations before proceeding to network.target and network-online.target. At that exact point, however, interface address/es are not yet configured, and name resolution comes later still.

When not configured to Wait for Network (or when dhcpcd reaches its timeout while waiting), dhcpcd forks to a background daemon and the init process proceeds immediately. This means network.target and network-online.target are reached even without the availability of interface configurations.

Compounding the problem is the fact that, as standard, network.target and network-online.target call for no further checking. network-online.target is therefore often reached before interface configurations have been applied, and in some cases even before they have been obtained.

Solution

1. Change to super-user
2. In /lib/systemd/system, create network-wait-online.service
(no special permissions needed)

Code: Select all

root@fv-rpi3b:/home/pi# nano /lib/systemd/system/network-wait-online.service
Paste in the following:

Code: Select all

#
# Uses 'hostname --all-fqdns' to confirm that both: IP address[es] assigned, and DNS operational
#

[Unit]
Description=Wait for Network to be Online
Documentation=man:systemd.service(5) man:systemd.special(7)
Conflicts=shutdown.target
After=network.target
Before=network-online.target

[Service]
Type=oneshot
ExecStart= \
    /bin/bash -c ' \
    if [ -e /etc/systemd/system/dhcpcd.service.d/wait.conf ]; \
    then \
        echo Wait for Network: enabled; \
        while [ -z "$(hostname --all-fqdns)" ]; \
        do \
            sleep 1; \
        done; \
    else \
        echo Wait for Network: disabled; \
        exit 0; \
    fi'
TimeoutStartSec=1min 30s

[Install]
WantedBy=network-online.target

Press Ctrl-O and then Ctrl-X to write the file and exit.

3. Enable network-wait-online.service

Code: Select all

root@fv-rpi3b:/home/pi# systemctl enable network-wait-online.service
Created symlink from /etc/systemd/system/network-online.target.wants/network-wait-online.service to /lib/systemd/system/network-wait-online.service.
4. Reboot

Code: Select all

root@fv-rpi3b:/home/pi# reboot now
5. Use Wait for Network as before to control network detection
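
For trying the wait logic outside systemd, here is a minimal sketch of the same polling idea with an explicit retry limit instead of relying on TimeoutStartSec. wait_for_output and its arguments are hypothetical names and are not part of the unit above.

```shell
#!/bin/sh
# Sketch only: poll a probe command until it prints something, or give up.
# wait_for_output and its arguments are hypothetical, not part of the unit.

# Usage: wait_for_output MAX_TRIES INTERVAL COMMAND [ARGS...]
# Returns 0 as soon as COMMAND prints non-blank output,
# or 1 after MAX_TRIES unsuccessful attempts.
wait_for_output() {
    max_tries=$1
    interval=$2
    shift 2
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        # Strip whitespace so that a bare newline counts as "no output".
        out=$("$@" 2>/dev/null | tr -d '[:space:]')
        # Quoted, so one name or several names are both a single test operand.
        if [ -n "$out" ]; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$interval"
    done
    return 1
}
```

For example, 'wait_for_output 90 1 hostname --all-fqdns' would poll once a second for up to roughly 90 seconds.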

How to reproduce the problem

1. Create a network auto-mount
On a standard system (i.e. without network-wait-online.service installed), add a line similar to the following to /etc/fstab. Dummy names can be used throughout, with the exception of the host, which must be a reachable hostname:

Code: Select all

//fvmedia/share /mnt/point      cifs    guest   0       0
When systemd reinitialises, this will create the following key relationships:

Code: Select all

root@fv-rpi3b:/home/pi# systemctl list-dependencies --before dhcpcd
dhcpcd.service
…
● ├─network.target
…
● │ └─network-online.target
● │   ├─mnt-point.mount
…
This means that for the mount to be attempted, both the network-online and network targets must have been reached (i.e. dhcpcd must have initialised successfully). The two targets have no predecessors other than dhcpcd that can indicate network availability. In the standard installation, network availability detection therefore relies solely on dhcpcd.

2. Induce a delay so that DHCP and interface initialisation do not complete before the mount is attempted
a. Configure the Pi for Wi-Fi networking
b. Ensure its dhcpcd is configured to obtain its configuration from the DHCP server (the default)
c. Configure the Wi-Fi access point to hide its SSID (“hidden network”)
d. In /etc/wpa_supplicant/wpa_supplicant.conf, disable scanning (disable or remove ‘ap_scan’ and ‘scan_ssid’ entries)
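
Step d can be done by hand in an editor. Purely as a sketch, the same edit could be previewed with sed; disable_scan_options is a hypothetical helper name, and it only prints the modified configuration rather than changing the file.

```shell
#!/bin/sh
# Sketch only: preview a wpa_supplicant.conf with ap_scan / scan_ssid
# entries commented out. disable_scan_options is a hypothetical helper;
# it writes the result to stdout and leaves the input file untouched.

disable_scan_options() {
    # Keep any leading indentation and put '#' in front of the option name.
    sed -E 's/^([[:space:]]*)(ap_scan|scan_ssid)=/\1#\2=/' "$1"
}
```

For example, 'disable_scan_options /etc/wpa_supplicant/wpa_supplicant.conf' shows what the file would look like, so the result can be checked before anything is overwritten.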

3. Standard installation with Wait for Network disabled
Ensure Wait for Network is disabled and reboot

Code: Select all

root@fv-rpi3b:/home/pi# systemctl status dhcpcd
● dhcpcd.service - dhcpcd on all interfaces
   Loaded: loaded (/lib/systemd/system/dhcpcd.service; enabled)
   Active: active (running) since Thu 2017-06-29 13:21:36 BST; 1min 38s ago
  Process: 373 ExecStart=/sbin/dhcpcd -q -b (code=exited, status=0/SUCCESS)
 Main PID: 385 (dhcpcd)
   CGroup: /system.slice/dhcpcd.service
           └─385 /sbin/dhcpcd -q -b
In this configuration dhcpcd immediately forks to the background.

Despite dhcpcd "initialising" and the Network and Network is Online targets being reached, the mount is attempted before interface initialisation and fails:

Code: Select all

root@fv-rpi3b:/home/pi# journalctl
-- Logs begin at Thu 2017-06-29 13:21:32 BST, end at Thu 2017-06-29 13:28:57 BST. --
...
Jun 29 13:21:35 fv-rpi3b dhcpcd[373]: version 6.7.1 starting
...
Jun 29 13:21:35 fv-rpi3b dhcpcd[373]: forked to background, child pid 385
...
Jun 29 13:21:36 fv-rpi3b systemd[1]: Started dhcpcd on all interfaces.
Jun 29 13:21:36 fv-rpi3b systemd[1]: Starting Network.
...
Jun 29 13:21:36 fv-rpi3b systemd[1]: Reached target Network.
...
Jun 29 13:21:36 fv-rpi3b systemd[1]: Starting Network is Online.
Jun 29 13:21:36 fv-rpi3b systemd[1]: Reached target Network is Online.
Jun 29 13:21:36 fv-rpi3b systemd[1]: Mounting /mnt/point...
...
Jun 29 13:21:36 fv-rpi3b mount[449]: mount error: could not resolve address for fvmedia: Unknown error
...
Jun 29 13:22:18 fv-rpi3b dhcpcd[385]: wlan0: carrier acquired
...
Jun 29 13:22:24 fv-rpi3b dhcpcd[385]: wlan0: leased 192.168.1.204 for 86400 seconds
...
4. Standard installation with Wait for Network enabled
Enable Wait for Network and reboot

Code: Select all

root@fv-rpi3b:/home/pi# systemctl status dhcpcd
● dhcpcd.service - dhcpcd on all interfaces
   Loaded: loaded (/lib/systemd/system/dhcpcd.service; enabled)
  Drop-In: /etc/systemd/system/dhcpcd.service.d
           └─wait.conf
   Active: active (running) since Thu 2017-06-29 13:36:41 BST; 5min ago
  Process: 374 ExecStart=/sbin/dhcpcd -q -w (code=exited, status=0/SUCCESS)
 Main PID: 552 (dhcpcd)
   CGroup: /system.slice/dhcpcd.service
           └─552 /sbin/dhcpcd -q -w
In this configuration dhcpcd forks to the background when it acquires interface information or after 30s, whichever comes earlier.

The mount is again attempted before interface initialisation and fails. Wait for Network only delays dhcpcd forking to the background but does not improve network availability detection.

Code: Select all

-- Logs begin at Thu 2017-06-29 13:36:07 BST, end at Thu 2017-06-29 13:37:33 BST. --
...
Jun 29 13:36:10 fv-rpi3b dhcpcd[374]: version 6.7.1 starting
...
Jun 29 13:36:41 fv-rpi3b dhcpcd[374]: timed out
Jun 29 13:36:41 fv-rpi3b dhcpcd[374]: forked to background, child pid 552
Jun 29 13:36:41 fv-rpi3b systemd[1]: Started dhcpcd on all interfaces.
Jun 29 13:36:41 fv-rpi3b systemd[1]: Starting Network.
Jun 29 13:36:41 fv-rpi3b systemd[1]: Reached target Network.
...
Jun 29 13:36:41 fv-rpi3b systemd[1]: Starting Network is Online.
Jun 29 13:36:41 fv-rpi3b systemd[1]: Reached target Network is Online.
Jun 29 13:36:41 fv-rpi3b systemd[1]: Mounting /mnt/point...
...
Jun 29 13:36:41 fv-rpi3b mount[556]: mount error: could not resolve address for fvmedia: Unknown error
...
Jun 29 13:36:53 fv-rpi3b dhcpcd[552]: wlan0: carrier acquired
...
Jun 29 13:36:59 fv-rpi3b dhcpcd[552]: wlan0: leased 192.168.1.204 for 86400 seconds
...
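
To spot this ordering problem in a long journal without reading it line by line, something like the following sketch could help. mount_vs_lease is a hypothetical helper, and the patterns it matches are simply taken from the log lines shown in this post, so they may need adjusting for other systems.

```shell
#!/bin/sh
# Sketch only: report whether the first mount attempt or the first DHCP
# lease appears earlier in journal text read from stdin.
# mount_vs_lease is a hypothetical helper name.

mount_vs_lease() {
    awk '
        # Remember the line number of the first mount attempt.
        /systemd\[[0-9]+\]: Mounting / && !mount_line { mount_line = NR }
        # Remember the line number of the first lease reported by dhcpcd.
        /dhcpcd\[[0-9]+\]: .*: leased / && !lease_line { lease_line = NR }
        END {
            if (!mount_line || !lease_line) print "incomplete"
            else if (mount_line < lease_line) print "mount-before-lease"
            else print "lease-before-mount"
        }'
}
```

It would typically be fed with 'journalctl -b | mount_vs_lease'.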
5. Trace IP address availability with Wait for Network disabled
a. Configure network-wait-online.service as described in Solution above
b. Replace the ExecStart= option in network-wait-online.service with

Code: Select all

ExecStart= \
    /bin/bash -c ' \
    IPs=$(hostname --all-ip-addresses); \
    while [ -z "$IPs" ]; \
    do \
        echo network-wait-online: addresses=$IPs; \
        sleep 0.1; \
        IPs=$(hostname --all-ip-addresses); \
    done'
c. Disable Wait for Network and reboot

When systemd reinitialises, network-wait-online will have the following key relationships:

Code: Select all

root@fv-rpi3b:/home/pi# systemctl list-dependencies --before dhcpcd
dhcpcd.service
...
● ├─network.target
...
● │ ├─network-wait-online.service
...
● │ └─network-online.target
● │   ├─mnt-point.mount
...
The modified network-wait-online service will start tracing IP address availability as soon as dhcpcd initialises (=network.target), and will satisfy network-online.target by completing successfully once an IP address has been configured.

This time the mount is only attempted after an IP address is obtained, but name resolution still fails:

Code: Select all

-- Logs begin at Thu 2017-06-29 14:20:53 BST, end at Thu 2017-06-29 14:21:51 BST. --
...
Jun 29 14:20:56 fv-rpi3b systemd[1]: Starting dhcpcd on all interfaces...
...
Jun 29 14:20:56 fv-rpi3b dhcpcd[378]: version 6.7.1 starting
...
Jun 29 14:20:56 fv-rpi3b dhcpcd[378]: forked to background, child pid 386
...
Jun 29 14:20:57 fv-rpi3b systemd[1]: Started dhcpcd on all interfaces.
Jun 29 14:20:57 fv-rpi3b systemd[1]: Starting Network.
Jun 29 14:20:57 fv-rpi3b systemd[1]: Reached target Network.
...
Jun 29 14:20:57 fv-rpi3b systemd[1]: Starting Wait for Network to be Online...
...
Jun 29 14:20:57 fv-rpi3b bash[445]: network-wait-online: addresses=
...
Jun 29 14:21:39 fv-rpi3b dhcpcd[386]: wlan0: carrier acquired
...
Jun 29 14:21:45 fv-rpi3b bash[445]: network-wait-online: addresses=
Jun 29 14:21:45 fv-rpi3b dhcpcd[386]: wlan0: leased 192.168.1.204 for 86400 seconds
...
Jun 29 14:21:45 fv-rpi3b systemd[1]: Started Wait for Network to be Online.
Jun 29 14:21:45 fv-rpi3b systemd[1]: Starting Network is Online.
Jun 29 14:21:45 fv-rpi3b systemd[1]: Reached target Network is Online.
Jun 29 14:21:45 fv-rpi3b systemd[1]: Mounting /mnt/point...
Jun 29 14:21:45 fv-rpi3b mount[1883]: mount error: could not resolve address for fvmedia: Unknown error
...
6. Trace IP address availability with Wait for Network enabled
Enable Wait for Network and reboot

As expected by now, Wait for Network makes no difference other than delaying dhcpcd forking to the background. The mount is only attempted after an IP address is obtained, but name resolution still fails:

Code: Select all

-- Logs begin at Thu 2017-06-29 14:35:08 BST, end at Thu 2017-06-29 14:36:05 BST. --
...
Jun 29 14:35:11 fv-rpi3b systemd[1]: Starting dhcpcd on all interfaces...
...
Jun 29 14:35:11 fv-rpi3b dhcpcd[368]: version 6.7.1 starting
...
Jun 29 14:35:42 fv-rpi3b dhcpcd[368]: timed out
Jun 29 14:35:42 fv-rpi3b dhcpcd[368]: forked to background, child pid 548
Jun 29 14:35:42 fv-rpi3b systemd[1]: Started dhcpcd on all interfaces.
Jun 29 14:35:42 fv-rpi3b systemd[1]: Starting Network.
Jun 29 14:35:42 fv-rpi3b systemd[1]: Reached target Network.
...
Jun 29 14:35:42 fv-rpi3b systemd[1]: Starting Wait for Network to be Online...
...
Jun 29 14:35:42 fv-rpi3b bash[551]: network-wait-online: addresses=
...
Jun 29 14:35:55 fv-rpi3b dhcpcd[548]: wlan0: carrier acquired
...
Jun 29 14:35:59 fv-rpi3b bash[551]: network-wait-online: addresses=
Jun 29 14:35:59 fv-rpi3b dhcpcd[548]: wlan0: leased 192.168.1.204 for 86400 seconds
...
Jun 29 14:35:59 fv-rpi3b systemd[1]: Started Wait for Network to be Online.
Jun 29 14:35:59 fv-rpi3b systemd[1]: Starting Network is Online.
Jun 29 14:35:59 fv-rpi3b systemd[1]: Reached target Network is Online.
Jun 29 14:35:59 fv-rpi3b systemd[1]: Mounting /mnt/point...
Jun 29 14:35:59 fv-rpi3b mount[1033]: mount error: could not resolve address for fvmedia: Unknown error
...
7. Trace IP and DNS availability with Wait for Network disabled
a. Replace the ExecStart= option in network-wait-online.service with

Code: Select all

ExecStart= \
    /bin/bash -c ' \
    HNs=$(hostname --all-fqdns); \
    IPs=$(hostname --all-ip-addresses); \
    while [ -z "$HNs" -o -z "$IPs" ]; \
    do \
        echo network-wait-online: addresses=$IPs, hostnames=$HNs; \
        sleep 0.1; \
        HNs=$(hostname --all-fqdns); \
        IPs=$(hostname --all-ip-addresses); \
    done'
b. Disable Wait for Network and reboot

The modified network-wait-online service will start tracing IP address and DNS resolver availability as soon as dhcpcd initialises (=network.target), and will satisfy network-online.target by completing successfully once an IP address has been configured and DNS resolution operates.

This log shows that by the time the interface IP address has been configured, availability of DNS resolution and general network access cannot be assumed (network-wait-online: addresses=192.168.1.204 , hostnames=). However, because network-wait-online now assures DNS resolver availability by the time it completes, network availability is also assured (because the resolver was reached and responded). Name resolution of the remote server for the mount command therefore succeeds (although the mount still fails due to the dummy share name):

Code: Select all

-- Logs begin at Thu 2017-06-29 14:43:58 BST, end at Thu 2017-06-29 14:44:56 BST. --
...
Jun 29 14:44:01 fv-rpi3b systemd[1]: Starting dhcpcd on all interfaces...
...
Jun 29 14:44:01 fv-rpi3b dhcpcd[374]: version 6.7.1 starting
...
Jun 29 14:44:01 fv-rpi3b dhcpcd[374]: forked to background, child pid 385
...
Jun 29 14:44:02 fv-rpi3b systemd[1]: Started dhcpcd on all interfaces.
Jun 29 14:44:02 fv-rpi3b systemd[1]: Starting Network.
Jun 29 14:44:02 fv-rpi3b systemd[1]: Reached target Network.
...
Jun 29 14:44:02 fv-rpi3b systemd[1]: Starting Wait for Network to be Online...
...
Jun 29 14:44:44 fv-rpi3b bash[444]: network-wait-online: addresses=, hostnames=
...
Jun 29 14:44:44 fv-rpi3b dhcpcd[385]: wlan0: carrier acquired
...
Jun 29 14:44:50 fv-rpi3b bash[444]: network-wait-online: addresses=, hostnames=
Jun 29 14:44:50 fv-rpi3b dhcpcd[385]: wlan0: leased 192.168.1.204 for 86400 seconds
...
Jun 29 14:44:50 fv-rpi3b bash[444]: network-wait-online: addresses=192.168.1.204 , hostnames=
Jun 29 14:44:50 fv-rpi3b systemd[1]: Started Wait for Network to be Online.
Jun 29 14:44:50 fv-rpi3b systemd[1]: Starting Network is Online.
Jun 29 14:44:50 fv-rpi3b systemd[1]: Reached target Network is Online.
Jun 29 14:44:50 fv-rpi3b systemd[1]: Mounting /mnt/point...
...
Jun 29 14:44:51 fv-rpi3b mount[2610]: Retrying with upper case share name
Jun 29 14:44:51 fv-rpi3b kernel: CIFS VFS: cifs_mount failed w/return code = -6
Jun 29 14:44:51 fv-rpi3b mount[2610]: mount error(6): No such device or address
...
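
The three stages visible in this trace (no address, address but no names, both) can be summarised in a small helper. This is only a sketch: network_state and its labels are hypothetical names, and it classifies strings passed to it rather than probing the network itself.

```shell
#!/bin/sh
# Sketch only: classify the two 'hostname' outputs traced above.
# network_state and its three labels are hypothetical names.

# Usage: network_state "IP_LIST" "FQDN_LIST"
network_state() {
    # Strip whitespace so a trailing space or newline does not count as content.
    ips=$(printf '%s' "$1" | tr -d '[:space:]')
    fqdns=$(printf '%s' "$2" | tr -d '[:space:]')
    if [ -z "$ips" ]; then
        echo "no-address"
    elif [ -z "$fqdns" ]; then
        echo "address-only"
    else
        echo "online"
    fi
}
```

On a live system it could be called as 'network_state "$(hostname --all-ip-addresses)" "$(hostname --all-fqdns)"'.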
8. Trace IP and DNS availability with Wait for Network enabled
Enable Wait for Network and reboot

Results are similar to the previous case, as expected.

Advantages of this approach

It provides reliable detection

This approach adds the necessary checking to network-online.target, testing for both IP configuration and DNS operation.

It is deterministic

Delay-based approaches may waste boot time or not wait long enough. This approach waits as long as necessary and not any longer to ensure reliable network availability.

It is future-proof?

By integrating into the Jessie systemd init process in a standard way, the configuration for this solution will hopefully survive OS upgrades ;-).

Best of all, it's still controlled by Wait for Network!

Other approaches

Boot or user login delays:
Fixed delays usually either waste time by waiting too long, or are unreliable because they do not necessarily wait long enough.

fstab noauto, x-systemd.automount options:
This delays mounting until first access. As with fixed delays, it is not guaranteed to introduce a long enough delay (and in practice usually doesn't).

fstab _netdev or x-systemd.requires=network-online.target options:
These are redundant with a standard Jessie systemd installation:

Code: Select all

pi@fv-rpi3b:~ $ man systemd.special
…
network-online.target
… 
All mount units for remote network file systems automatically pull in this unit, and order themselves after it.
…

Code: Select all

pi@fv-rpi3b:~ $ systemctl list-dependencies mnt-point.mount
mnt-point.mount
● ├─-.mount
● ├─system.slice
● └─network-online.target
As mentioned above, reaching network-online.target on a standard configuration does not guarantee anything. Once network-wait-online.service has been enabled:

Code: Select all

mnt-point.mount
● ├─-.mount
● ├─system.slice
● └─network-online.target
●   └─network-wait-online.service
Last edited by fvisagie on Tue Oct 10, 2017 11:39 am, edited 2 times in total.
rPi 3B with Raspbian Stretch 2017-09-07 (updated) and Kodi 18.2

allfox
Posts: 451
Joined: Sat Jun 22, 2013 1:36 pm
Location: Guang Dong, China

Re: Improved (=reliable) Wait for Network implementation

Fri Jun 30, 2017 12:20 pm

This one sounds good!

I use pppd to connect to the Internet via fiber, and I wait 60 s before starting anything network-related.

Your solution is more elegant.

rasp14
Posts: 57
Joined: Sun Jun 22, 2014 2:49 pm

Re: Improved (=reliable) Wait for Network implementation

Thu Dec 21, 2017 10:48 am

Thanks for this code. It works properly on the Raspbian Jessie 20170705 image with the desktop environment. However, when I try it on the Raspbian Jessie Lite 20170705 image, the script always enters a failed state. See the systemctl status messages below:

Raspbian Jessie Lite

Code: Select all

network-wait-online.service - Wait for Network to be Online
   Loaded: loaded (/lib/systemd/system/network-wait-online.service; enabled)
   Active: failed (Result: timeout) since Thu 2017-12-21 18:27:17 +08; 2min 39s ago
     Docs: man:systemd.service(5)
           man:systemd.special(7)
  Process: 764 ExecStart=/bin/bash -c       if [ -e /etc/systemd/system/dhcpcd.service.d/wait.conf ];      then          echo Wait for Network: enabled;          while [ -z $(hostname --all-fqdns) ];          do              sleep 1;          done;      else          echo Wait for Network: disabled;          exit 0;      fi (code=killed, signal=TERM)
 Main PID: 764 (code=killed, signal=TERM)

Dec 21 18:25:47 raspberry bash[764]: Wait for Network: enabled
Dec 21 18:27:17 raspberry systemd[1]: network-wait-online.service start operation timed out. Terminating.
Dec 21 18:27:17 raspberry systemd[1]: Failed to start Wait for Network to be Online.
Dec 21 18:27:17 raspberry systemd[1]: Unit network-wait-online.service entered failed state.

Raspbian Jessie

Code: Select all

network-wait-online.service - Wait for Network to be Online
   Loaded: loaded (/lib/systemd/system/network-wait-online.service; enabled)
   Active: inactive (dead) since Thu 2017-12-21 18:25:38 +08; 9min ago
     Docs: man:systemd.service(5)
           man:systemd.special(7)
  Process: 765 ExecStart=/bin/bash -c       if [ -e /etc/systemd/system/dhcpcd.service.d/wait.conf ];      then          echo Wait for Network: enabled;          while [ -z $(hostname --all-fqdns) ];          do              sleep 1;          done;      else          echo Wait for Network: disabled;          exit 0;      fi (code=exited, status=0/SUCCESS)
 Main PID: 765 (code=exited, status=0/SUCCESS)

Dec 21 18:25:38 raspberry bash[765]: Wait for Network: enabled
Dec 21 18:25:38 raspberry systemd[1]: Started Wait for Network to be Online.

I did some troubleshooting and ran the command

Code: Select all

hostname --all-fqdns
manually on both images. I got the output below:

Raspbian Jessie

Code: Select all

pi@raspberry:~ $ hostname --all-fqdns
ennbme045
pi@raspberry:~ $

Raspbian Jessie Lite (notice the empty line)

Code: Select all

pi@raspberry:~ $ hostname --all-fqdns

pi@raspberry:~ $

By the look of it, Raspbian Jessie Lite seems to be missing something, which leaves it unable to resolve the hostname. What could be the cause, and how can I fix it?

I tested this using the default Raspbian images available on the Raspberry Pi web site, on two Raspberry Pi 3 boards. Both Pis are on the same network.

fvisagie
Posts: 6
Joined: Wed Apr 19, 2017 11:27 am
Location: Man

Re: Improved (=reliable) Wait for Network implementation

Thu Dec 21, 2017 4:10 pm

If empty 'hostname' output were the only issue, the ExecStart script should still start up successfully and remain in the 'while [ -z $(hostname --all-fqdns) ]' loop (while the output remains empty). The start operation timing out indicates something going wrong before that point.

I'm not in a position to investigate Jessie Lite issues, but you seem headed in the right direction.

Look in the 'journalctl' output around the script's "Wait for Network: enabled" or "Wait for Network: disabled" message for any further useful information.

Run the ExecStart script manually and compare results.

Lengthen the start timeout to say 10 minutes, compare what happens and work backwards from there.

Compare everything else: does Jessie Lite have dhcpcd, is it enabled, does it behave the same, does the Jessie Lite systemd process support the required targets and keywords, does raspi-config create the 'wait.conf' file, etc.
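
For the longer start timeout, one option is a systemd drop-in rather than editing the unit file itself. The following is only a sketch: write_timeout_override and the file name timeout.conf are hypothetical, and the directory is passed in so it can be tried in a scratch location first.

```shell
#!/bin/sh
# Sketch only: write a drop-in that overrides TimeoutStartSec for a unit.
# write_timeout_override and the file name timeout.conf are hypothetical.

# Usage: write_timeout_override DROPIN_DIR [TIMEOUT]
write_timeout_override() {
    dropin_dir=$1
    timeout=${2:-10min}
    # Create the drop-in directory if needed, then write the override.
    mkdir -p "$dropin_dir" &&
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout" > "$dropin_dir/timeout.conf"
}
```

On the Pi this would be something like 'write_timeout_override /etc/systemd/system/network-wait-online.service.d 10min' as root, followed by 'systemctl daemon-reload' and a reboot.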

Coming back to the empty 'hostname' output, according to https://manpages.debian.org/stretch/hos ... .1.en.html that means no network interface is configured?

Code: Select all

-A, --all-fqdns
Displays all FQDNs of the machine. This option enumerates all configured network addresses on all configured
network interfaces, and translates them to DNS domain names. Addresses that cannot be translated (i.e. because
they do not have an appropriate reverse IP entry) are skipped. Note that different addresses may resolve to the 
same name, therefore the output may contain duplicate entries. Do not make any assumptions about the order 
of the output.
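
Going by that description, empty output could also mean that addresses are configured but none of them can be translated back to a name. A rough way to tell the two cases apart is sketched below; reverse_lookup_report and the LOOKUP override are hypothetical names, and 'getent hosts' is used here only as an approximation of the lookup that 'hostname --all-fqdns' performs.

```shell
#!/bin/sh
# Sketch only: for each address given, report whether a reverse lookup
# returns anything. reverse_lookup_report and LOOKUP are hypothetical names.

reverse_lookup_report() {
    for addr in "$@"; do
        # LOOKUP may be overridden for testing; the default queries the
        # system's hosts database (files, DNS, ...) for the address.
        if [ -n "$(${LOOKUP:-getent hosts} "$addr" 2>/dev/null)" ]; then
            echo "$addr resolved"
        else
            echo "$addr unresolved"
        fi
    done
}
```

For example, 'reverse_lookup_report $(hostname --all-ip-addresses)' prints nothing at all if no address is configured, and marks each configured address as resolved or unresolved otherwise.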