[SOLVED] using network bond0 fails to bring up wlan0


by whatdoido » Tue Jan 31, 2017 11:37 pm
UPDATE: solution/fix here: https://www.raspberrypi.org/forums/viewtopic.php?p=1107810#p1107810


--- original post ---
https://www.raspberrypi.org/forums/viewtopic.php?f=28&t=166355&p=1075365&hilit=bond+wlan0#p1075365

I followed the discussion in the thread above to set up my RPi B for bonding. I have eth0 and a USB wifi dongle (wlan0) that I want to combine into a bond0 interface on a static IP, using an active-backup policy with eth0 as primary.

bond0 is created (the kernel module is loaded) with the desired setup, showing eth0 as primary:
Code: Select all
$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup) (fail_over_mac active)
...

With the ethernet cable plugged in, bond0 is the interface shown as the active route and I can access the internet etc. However, when the ethernet cable is pulled, the wlan0 interface never comes up successfully. I can see in /var/log/syslog that the system tries to bring up wlan0, but it fails with the following errors:
Code: Select all
Jan 30 19:47:16 rpi-one0 kernel: [   55.922949] R8188EU: ERROR indicate disassoc
Jan 30 19:47:16 rpi-one0 kernel: [   55.929657] bond0: link status definitely down for interface wlan0, disabling it
Jan 30 19:47:16 rpi-one0 kernel: [   55.929747] bond0: now running without any active interface!
Jan 30 19:47:16 rpi-one0 wpa_supplicant[417]: wlan0: CTRL-EVENT-DISCONNECTED bssid=00:62:2c:09:0a:9d reason=3 locally_generated=1
Jan 30 19:47:16 rpi-one0 wpa_supplicant[417]: wlan0: WPA: 4-Way Handshake failed - pre-shared key may be incorrect
Jan 30 19:47:16 rpi-one0 wpa_supplicant[417]: wlan0: CTRL-EVENT-SSID-TEMP-DISABLED id=0 ssid="goaway" auth_failures=2 duration=26 reason=WRONG_KEY
Jan 30 19:47:16 rpi-one0 wpa_supplicant[417]: wlan0: CTRL-EVENT-SSID-TEMP-DISABLED id=0 ssid="goaway" auth_failures=3 duration=36 reason=CONN_FAILED
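
For anyone reproducing this, the bonding and wpa_supplicant lines can be pulled out of a busy syslog with a small filter. This is only a sketch: the function name wlan_bond_events is made up for illustration, and it assumes the log lines look like the ones above and live in /var/log/syslog unless another file is given.

```shell
# Hypothetical helper: print only the bonding driver and wpa_supplicant
# events that mention one interface, from a syslog-style file.
# Usage: wlan_bond_events <iface> [logfile]
wlan_bond_events() {
    iface="$1"
    logfile="${2:-/var/log/syslog}"
    # First alternative: kernel bonding messages such as
    #   "bond0: link status definitely down for interface wlan0, ..."
    # Second alternative: wpa_supplicant messages such as
    #   "wpa_supplicant[417]: wlan0: CTRL-EVENT-DISCONNECTED ..."
    grep -E "(bond[0-9]+: .*${iface}|wpa_supplicant\[[0-9]+\]: ${iface}: )" "$logfile"
}
```

Running "wlan_bond_events wlan0" right after pulling the ethernet cable should show the failover attempt and the authentication result next to each other.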

It complains about "wrong key" but I know the password is correct. Adding/removing "denyinterfaces eth0 wlan0 bond0" from /etc/dhcpcd.conf makes no difference. Running
Code: Select all
wpa_supplicant -dd -b bond0 -i wlan0 -c /etc/wpa_supplicant/wpa_supplicant.conf

also does not succeed, generating the same errors in the syslog as before. No association with the AP is made.

The wlan0 setup and wpa_supplicant file (passwords etc.) are valid: I can connect when the system is brought up without the bond and /etc/network/interfaces contains only the wlan0 device.

Does anyone have any idea why this may be failing?

Below is my /etc/network/interfaces file used for bonding the interfaces:
Code: Select all
auto lo
iface lo inet loopback

# confirm status at
# $ cat /proc/net/bonding/bond0

auto bond0
iface bond0 inet static
    bond-slaves none
    bond-primary eth0
    bond-mode active-backup
    bond-miimon 200
    bond-fail_over_mac active
    bond-primary_reselect always
    bond-updelay 200
    bond-downdelay 0
    address 192.168.0.156
    netmask 255.255.255.0
    gateway 192.168.0.1
    dns-nameservers 8.8.8.8

auto eth0
#allow-hotplug eth0
iface eth0 inet manual
    bond-master bond0
    bond-primary bond0
    bond-mode active-backup

auto wlan0
#allow-hotplug wlan0
iface wlan0 inet manual
    bond-master bond0
    bond-primary bond0
    bond-mode active-backup
    bond-give-a-chance 500
    wpa-bridge bond0
    #address 192.168.0.156
    #netmask 255.255.255.0
    #gateway 192.168.0.1
    #dns-nameservers 8.8.8.8
    wireless-power off
    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf


This is running on a Raspberry Pi B with Linux rpi-one0 4.4.38+ kernel on jessie.

Thanks.
Last edited by whatdoido on Thu Feb 02, 2017 11:33 am, edited 4 times in total.
Posts: 3
Joined: Tue Jan 31, 2017 11:11 pm
by epoch1970 » Wed Feb 01, 2017 10:57 am
The only thing I will say is that your network interfaces file is not the same as the one(s) in the thread. (bond-primary bond0 is wrong, BTW)
Bonding with wlan is finicky, so a different config with different hardware probably has its own challenges.

From your traces, the wifi card doesn't seem to be able to re-auth. It could be that the bond did not flip its MAC.
"S'il n'y a pas de solution, c'est qu'il n'y a pas de problème." Les Shadoks, J. Rouxel
Posts: 959
Joined: Thu May 05, 2016 9:33 am
by whatdoido » Wed Feb 01, 2017 11:24 am
Thanks for responding. I originally started with the configuration as shown in the post referenced (where you provided some great input). However that configuration did not work - same errors in the syslog.

epoch1970 wrote:From your traces, the wifi card doesn't seem to be able to re-auth. It could be that the bond did not flip its MAC.

From what I could gather from the syslog, wlan0 never authenticates or associates with the AP (maybe that's the same thing) at any point when the system is set up with bond0.

With bond0 having no active interfaces, killing all wpa_supplicants and then running wpa_supplicant manually (to force an association) still fails:
Code: Select all
  wpa_supplicant -dd -b bond0 -i wlan0 -c /etc/wpa_supplicant/wpa_supplicant.conf

I also tried the above without -b bond0, with no success. The wpa_supplicant.conf has been tried with the password in plaintext and also as an encoded string, but neither works. To confirm, the wpa_supplicant setup is valid for wlan0-only networking without bond0.


One thing I did not try was making wlan0 primary (so that the MAC of bond0 is initially the same as wlan0's), although the kernel seems to suggest (via /proc/) that "fail_over_mac active" is set.

I'll snoop/wireshark the AP to see if I can capture the packets from the RPi's wlan0 and confirm whether the right MAC is being sent. I'll also update "bond-primary" to say eth0 and retry, although the system already brings up eth0 when it's available and only tries wlan0 when eth0 is not there.

Will update with findings.

FWIW, the wifi dongle is a RealTek 8188EU (USB id 0bda:8179) using the stock kernel driver.

thanks again
by epoch1970 » Wed Feb 01, 2017 12:18 pm
whatdoido wrote:I'll snoop/wireshark the AP to see if I can capture the packets from the RPi wlan0 to confirm if the right MAC is being sent and update the "bond-primary" to say eth0 and retry although the system does bring up eth0 when its available and only tries wlan0 when eth0 is not there..

The card will send its own MAC, no need to snoop that. The problem could be the bond is not using the MAC of the wifi card. Just cat /proc/net/bonding/bond0 and ifconfig bond0 to check bond is in the right mode and does flip its MAC to the wifi card's MAC when it tries to re-associate.
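
The check described above can be scripted roughly as follows. This is a sketch under assumptions: the function name bond_mac_matches_active is invented here, it relies on the "Currently Active Slave:" line in /proc/net/bonding/<bond> and on the address files under /sys/class/net, and the two optional directory arguments exist only so it can be tried against copies of those files.

```shell
# Hypothetical helper: with fail_over_mac=active the bond should carry the
# MAC of whichever slave is currently active. Report whether it does.
# Usage: bond_mac_matches_active <bond> [procdir] [sysdir]
bond_mac_matches_active() {
    bond="$1"
    procdir="${2:-/proc/net/bonding}"
    sysdir="${3:-/sys/class/net}"
    # "Currently Active Slave: wlan0" -> "wlan0" (the driver prints "None"
    # when no slave is active)
    active=$(sed -n 's/^Currently Active Slave: //p' "$procdir/$bond")
    if [ -z "$active" ] || [ "$active" = "None" ]; then
        echo "no active slave"
        return 1
    fi
    bond_mac=$(cat "$sysdir/$bond/address")
    slave_mac=$(cat "$sysdir/$active/address")
    if [ "$bond_mac" = "$slave_mac" ]; then
        echo "match: $bond and $active both use $bond_mac"
    else
        echo "mismatch: $bond=$bond_mac $active=$slave_mac"
        return 1
    fi
}
```

Run "bond_mac_matches_active bond0" once with the cable in and once after pulling it. A "mismatch" while wlan0 is the active slave would point at the MAC not being flipped.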

By default the bonding interface always returns to using the primary, so eth0 in your case. You insist on it with "bond-primary_reselect always". See bonding.txt in the linux kernel documentation for other options. That won't change anything about your authentication issue.

Making wlan0 the primary might work once but if you still have a MAC flip-flop issue it won't work long. AFAIK you can't spoof the MAC of eth0 on a Pi. Any other policy than "bond-fail_over_mac active" won't work with the built-in ethernet card or wifi cards.

You have updelay 200ms. Obviously the wifi card takes much more than that to re-associate. If bond0 does flip its mac correctly, check that the wlan slave is not set inactive (while up and ready to work) because it was activated too soon and failed to work.
"bond-give-a-chance" is not a module option, it's something baroque and possibly Debian-specific. I think it serves as a timeout mechanism within ifup/ifdown designed to overcome an improper updelay setting. I would try to do without it if possible.

wpa_supplicant should not care about bond0. Just use the regular config as if wlan0 was working solo.
Repeat: you have multiple instances of the stanza "bond-primary bond0" that make no sense. eth0 (or wlan0 if you like) is your primary slave.
You seem to want to bridge the wifi interface "wpa-bridge blah". You can't bridge a wifi card in client mode. You can't bridge without a bridge.
(Can you enslave a bridge? Can you make a bond link a bridge member? In theory I think you can do both. In practice I know bridging a bond link works.)

Have fun.
"S'il n'y a pas de solution, c'est qu'il n'y a pas de problème." Les Shadoks, J. Rouxel
by whatdoido » Wed Feb 01, 2017 9:46 pm
I've managed to 'solve' the problem, but I don't understand why the fix works.

The 'fix' that works for me is to add "bonding" kernel module parameters in /etc/modprobe.d/, even though the "bond-*" parameters should already have matched, as per the /proc interface.

With the kernel module params in place, and after rebooting, the system behaves as expected: a continuous ping to another machine is NOT interrupted even when the ethernet cable is pulled (wlan0 is shown as active under /proc/net/bonding/bond0), and when the ethernet cable is plugged back in, the system goes back to using that interface. Booting with or without the eth0 cable, but with the wifi dongle attached, also behaves as expected (we have access to the network). There is only one route, as expected; previous solutions that define the same IP on both eth0 and wlan0 create multiple routes, which are chosen by priority (eth0 wins by default).
Code: Select all
$ route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         192.168.0.1     0.0.0.0         UG    0      0        0 bond0
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 bond0


The telltale sign of success was seeing the wifi associated with the AP via "iwconfig".
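
That iwconfig check could be wrapped in a small test like the one below. It is only a sketch: the function name wlan_is_associated is made up, and it assumes iwconfig prints the usual "Access Point: <MAC>" or "Access Point: Not-Associated" field, which some vendor drivers may format differently.

```shell
# Hypothetical helper: succeed if `iwconfig <iface>` output (read from stdin)
# shows an associated access point.
# Usage: iwconfig wlan0 | wlan_is_associated && echo "associated"
wlan_is_associated() {
    # Extract the token after "Access Point:"; it is either the AP's MAC
    # address or the literal "Not-Associated".
    ap=$(sed -n 's/.*Access Point: *\([^ ]*\).*/\1/p')
    [ -n "$ap" ] && [ "$ap" != "Not-Associated" ]
}
```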

------ HOW TO SETUP BOND with ETH0 and WLAN0 ------


The working configs on my system (Raspberry Pi B) are below. The only other requirement is to install "ifenslave":
Code: Select all
apt-get install ifenslave


Code: Select all
# /etc/network/interfaces
auto lo
iface lo inet loopback

# confirm status at
# $ cat /proc/net/bonding/bond0

auto eth0
allow-hotplug eth0
iface eth0 inet manual
    bond-master bond0
    bond-mode active-backup

auto wlan0
allow-hotplug wlan0
iface wlan0 inet manual
    bond-master bond0
    bond-mode active-backup
    wireless-power off
    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

auto bond0
iface bond0 inet static
    bond-slaves none
    bond-primary eth0
    bond-mode active-backup
    bond-miimon 200
    bond-fail_over_mac active
    bond-primary_reselect always
    bond-updelay 200
    bond-downdelay 0
    address 192.168.0.156
    netmask 255.255.255.0
    gateway 192.168.0.1
    dns-nameservers 8.8.8.8


Code: Select all
# /etc/modules
...
# add this near the top to force load module at boot
bonding


Code: Select all
# /etc/modprobe.d/bonding.conf
options bonding fail_over_mac=active mode=active-backup primary=eth0 primary_reselect=always


The following line is also added (I don't know if this matters):
Code: Select all
# /etc/dhcpcd.conf
...
denyinterfaces eth0 wlan0 bond0
Last edited by whatdoido on Thu Feb 02, 2017 11:37 am, edited 2 times in total.
by epoch1970 » Thu Feb 02, 2017 10:03 am
Congratulations, you've made it :)

Module options are taken into account as soon as the module is loaded. If adding these options solved the problem, it means you have run into some boot order issue.

wlan0 is "allow-hotplug", which means the addition of the USB device launches its configuration. This can happen at any time, but udev is a very early process, so it may well happen before the normal network configuration.
At "normal" network configuration time, /etc/network/interfaces is read from the top down.

In /etc/network/interfaces, for eth0 and wlan0 definitions you have "bond-master bond0". This probably causes the system to spawn a bond0 interface if it doesn't exist yet. That interface would have default options, some of which can't be changed after creation. So you end up with a bond0 that is not configured according to its definition in the interfaces file.
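
One way to see whether bond0 really ended up with default options is to compare what the driver reports in sysfs against what the interfaces file asks for. A minimal sketch, assuming the usual /sys/class/net/<bond>/bonding/ files (which print values like "active-backup 1"); the function name check_bond_option is hypothetical, and the optional last argument is only there so it can be tried against a copy of the files.

```shell
# Hypothetical helper: compare one live bonding option against the value
# expected from /etc/network/interfaces.
# Usage: check_bond_option <bond> <option> <expected> [sysdir]
check_bond_option() {
    bond="$1"
    option="$2"
    expected="$3"
    sysdir="${4:-/sys/class/net}"
    # sysfs prints e.g. "active-backup 1" or "eth0"; keep only the first word.
    actual=$(cut -d' ' -f1 "$sysdir/$bond/bonding/$option")
    if [ "$actual" = "$expected" ]; then
        echo "ok: $option=$actual"
    else
        echo "MISMATCH: $option=$actual (expected $expected)"
        return 1
    fi
}
```

For the config in this thread that would be "check_bond_option bond0 mode active-backup", "check_bond_option bond0 fail_over_mac active" and "check_bond_option bond0 primary eth0", run on a boot without the modprobe.d options in place.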

If udev was not part of the dance, I would move the definition of bond0 to the top of /etc/network/interfaces, so that bond0 exists with the expected configuration before slaves eth0 and wlan0 are added.

Since udev has its part (via allow-hotplug), I would try to keep the order as you have, but remove "bond-master bond0" from eth0 and wlan0 definitions and add "bond-slaves eth0 wlan0" to the definition of bond0. I expect this would create independent interfaces as soon as the system deems it necessary, and enslave them later when bond0 gets created. (Yet the problem with that might be that wlan0 is removable and might not be enslaved back if it is plugged in after bond0 is created. Perhaps via an "up" or "post-up" option you could care for that...)
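
For the re-plugged dongle case, something along these lines could be called from an "up" or "post-up" hook. It is an untested sketch, not a known-good fix: the function name enslave_if_missing is invented, it relies on the bonding driver's sysfs "slaves" file accepting "+<iface>", and the kernel may refuse the request unless the interface is down at that moment.

```shell
# Hypothetical helper: ask the bonding driver to add an interface to a bond
# if it is not already listed as a slave.
# Usage: enslave_if_missing <bond> <iface> [sysdir]
enslave_if_missing() {
    bond="$1"
    iface="$2"
    sysdir="${3:-/sys/class/net}"
    slaves_file="$sysdir/$bond/bonding/slaves"
    # The slaves file holds a space-separated list such as "eth0 wlan0".
    case " $(cat "$slaves_file") " in
        *" $iface "*)
            echo "$iface already enslaved to $bond"
            ;;
        *)
            # Writing "+<iface>" requests that the driver add the slave.
            echo "+$iface" > "$slaves_file"
            echo "requested enslaving $iface to $bond"
            ;;
    esac
}
```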

Might work, might not work. That's the black art of concurrent booting for you.
Module options are indeed a stable and sure way to override the issue.
"S'il n'y a pas de solution, c'est qu'il n'y a pas de problème." Les Shadoks, J. Rouxel