epoch1970
Posts: 5924
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Network bonding problem (eth0 + wlan0)

Thu Jan 31, 2019 12:23 pm

Sorry but it works for me. I stopped the AP for a while, reconnection works fine.

You're using a development version of Raspbian (+firmware...) and an old-ish version of wpa_supplicant. By definition the development version of something is not stable and probably broken.
I would suggest trying recent *releases* of both (Raspbian first).

Also, wpa_cli is an executable. Just summon it as you please: "wpa_cli -i wlan0 status", or in interactive mode as below:

Code: Select all

$ wpa_cli
wpa_cli v2.7
Copyright (c) 2004-2018, Jouni Malinen <j@w1.fi> and contributors

This software may be distributed under the terms of the BSD license.
See README for more details.


Selected interface 'wlan0'

Interactive mode

> interface wlan0
Connected to interface 'wlan0.
> status
wpa_state=SCANNING
address=b8:27:eb:YY:YY:YY <- This is the MAC of the 1st active link, eth0
<3>CTRL-EVENT-SCAN-STARTED 
<3>CTRL-EVENT-SCAN-RESULTS 
<4>Failed to initiate sched scan
<3>CTRL-EVENT-NETWORK-NOT-FOUND <- AP powered off for 5+ minutes
...
<4>Failed to initiate sched scan <- AP powered back up, comes up
<3>CTRL-EVENT-NETWORK-NOT-FOUND 
<3>CTRL-EVENT-SCAN-STARTED 
<3>CTRL-EVENT-SCAN-RESULTS 
<3>Trying to associate with c0:3f:0e:3b:99:61 (SSID='NETGEAR' freq=2417 MHz)
<3>Associated with c0:3f:0e:3b:99:61
<3>CTRL-EVENT-CONNECTED - Connection to c0:3f:0e:3b:99:61 completed [id=0 id_str=]
<3>CTRL-EVENT-SUBNET-STATUS-UPDATE status=0
...
> status
bssid=c0:3f:0e:3b:99:61
freq=2417
ssid=NETGEAR
id=0
mode=station
pairwise_cipher=CCMP
group_cipher=CCMP
key_mgmt=WPA2-PSK
wpa_state=COMPLETED
address=b8:27:eb:YY:YY:YY 
> quit
Then I disconnected eth0 and tried pinging.
dmesg:

Code: Select all

[11284.001504] bond0: link status up for interface wlan0, enabling it in 400 ms
[11284.421519] bond0: link status definitely up for interface wlan0, 0 Mbps full duplex
...
[11352.461986] bond0: link status definitely down for interface eth0, disabling it
[11352.461998] bond0: making interface wlan0 the new active one
ping:

Code: Select all

$ ping -c 1 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=121 time=7.099 ms

--- 8.8.8.8 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 7.099/7.099/7.099 ms
This is a great feature. No idea how dhcpcd or the Desktop feel about a bond0 interface, though.
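For anyone who wants to script this check instead of reading it by eye: the status output above is plain key=value lines, with event lines such as <3>CTRL-EVENT-... mixed in when wpa_cli runs interactively. Below is a minimal Python sketch that parses such text. The function names parse_wpa_status and is_connected are hypothetical, and the sample text is copied from the session above.

```python
def parse_wpa_status(text):
    """Parse the key=value lines printed by `wpa_cli status` into a dict."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and asynchronous event lines like "<3>CTRL-EVENT-..."
        if not line or line.startswith("<") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        status[key] = value
    return status


def is_connected(status):
    """wpa_supplicant reports wpa_state=COMPLETED once the link is up."""
    return status.get("wpa_state") == "COMPLETED"


# Sample copied from the interactive session in this post.
SAMPLE = """\
<3>CTRL-EVENT-CONNECTED - Connection to c0:3f:0e:3b:99:61 completed [id=0 id_str=]
bssid=c0:3f:0e:3b:99:61
freq=2417
ssid=NETGEAR
id=0
mode=station
pairwise_cipher=CCMP
group_cipher=CCMP
key_mgmt=WPA2-PSK
wpa_state=COMPLETED
address=b8:27:eb:YY:YY:YY
"""

if __name__ == "__main__":
    status = parse_wpa_status(SAMPLE)
    print("ssid:", status.get("ssid"))
    print("connected:", is_connected(status))
```

To feed it live data, the text could come from running "wpa_cli -i wlan0 status" (for example through the subprocess module). That part is left out so the sketch runs anywhere.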
"S'il n'y a pas de solution, c'est qu'il n'y a pas de problème." Les Shadoks, J. Rouxel

no3rpi
Posts: 30
Joined: Fri Mar 31, 2017 11:44 am

Re: Network bonding problem (eth0 + wlan0)

Thu Jan 31, 2019 2:47 pm

I took your advice and reverted to the latest stable release. Now wifi reconnects fine without needing wpa-roam, so it looks like the development kernel was the problem.

Code: Select all

apt-get install --reinstall raspberrypi-bootloader raspberrypi-kernel
apt-get dist-upgrade

uname -a
Linux rpi3 4.9.35-v7+ #1014 SMP Fri Jun 30 14:47:43 BST 2017 armv7l GNU/Linux

Code: Select all

cat /proc/net/bonding/bond0 
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0 (primary_reselect always)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 200
Up Delay (ms): 200
Down Delay (ms): 0

Slave Interface: eth0
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: b8:27:eb:XX:XX:XX
Slave queue ID: 0

Slave Interface: wlan0
MII Status: up
Speed: Unknown
Duplex: Unknown
Link Failure Count: 4
Permanent HW addr: b8:27:eb:YY:YY:YY
Slave queue ID: 0
Thank you for your reply.
RPI3^2 + RPI4 = :idea:

Msprg
Posts: 5
Joined: Thu Mar 30, 2017 11:11 am

Re: Network bonding problem (eth0 + wlan0)

Sun Apr 12, 2020 10:28 pm

Okay, I am going to resurrect this thread a bit.

While setting up a bond between eth0 and wlan0, same as you, I encountered a strange problem. I managed to solve it before writing this post, but I am writing it up so that no one else has to struggle with it for almost a week like I did...

So, I set up my bond with the configuration by marcelser, with the exception of using a static IP instead of DHCP.

/etc/modprobe.d/bonding.conf

Code: Select all

options bonding fail_over_mac=none mode=active-backup primary=eth0 primary_reselect=always
/etc/network/interfaces

Code: Select all

# interfaces(5) file used by ifup(8) and ifdown(8)

# Please note that this file is written to be used with dhcpcd
# For static IP, consult /etc/dhcpcd.conf and 'man dhcpcd.conf'

auto lo
iface lo inet loopback

auto eth0
allow-hotplug eth0
iface eth0 inet manual
        bond-master     bond0
        bond-mode       active-backup

auto wlan0
allow-hotplug wlan0
iface wlan0 inet manual
        wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
        bond-master     bond0
        bond-mode       active-backup
        
auto bond0
#iface bond0 inet dhcp
iface bond0 inet static
    address 192.168.0.151
    netmask 255.255.255.0
    gateway 192.168.0.1
    dns-search google.com
    dns-nameservers 192.168.0.1 8.8.8.8

    bond-slaves           none
    bond-primary          eth0
    bond-mode             active-backup
    bond-miimon           200
    bond-fail_over_mac    follow
    bond-primary_reselect always
    bond-updelay          200
    bond-downdelay        0
    hw-address            8c:ae:4c:eb:94:90
I also edited /etc/dhcpcd.conf with "denyinterfaces eth0 wlan0", just as he did and for the same reason.


The problem:
But there was a strange problem with this configuration. Devices which had a wired connection to my network could ping the RPi without problems. But wireless devices could not establish any kind of connection to the RPi beyond ARP discovery and ARP response packets. So the wired devices could connect to the RPi and the wireless ones could not. When I unplugged the ethernet cable from the RPi, it failed over to the wlan0 interface, and in that state all devices, wired as well as wireless, were able to ping the RPi. But with the ethernet cable plugged back in, wireless devices could not connect again.

Interestingly, shutting down the wlan0 interface on the RPi let wireless clients on the network ping the RPi successfully, but that defeats the point of redundancy, so it is not a viable configuration.


Explanation of the cause:
Turns out, it was caused by several things together. First, when you bond interfaces in active-backup mode, all physical interfaces will have the same MAC address. I knew that before.

But now, on the multifunction networking device, or as it is better known, the "Wi-Fi router", the same MAC address appears twice: once on the internal switch, where eth0 is connected, and once on the wireless side, where wlan0 is connected.

Some or most of these Wi-Fi routers implement an algorithm that matches a packet's destination MAC against the MACs in the ARP table. But apparently the algorithm first searches for the MAC on the interface on which the packet was received, and only then on the other interfaces.

But because the same MAC is present on both the wireless side and the integrated switch, a packet from a wireless device matches the wlan0 interface of the RPi first, so the router sends it over Wi-Fi to wlan0. But because wlan0 on the RPi is in the bond, and the currently active interface is not wlan0 (in our case eth0 is the active interface as long as the cable is connected), the packet arriving on wlan0 is dropped, and so the RPi's system never learns that someone is trying to ping it.

So basically what is happening is:
0. Some wireless device sends an ICMP (ping) request.
1. The ICMP request arrives at the Wi-Fi router on its wireless interface.
2. The Wi-Fi router searches the wireless interface for a matching MAC address.
3. The MAC of the RPi's wlan0 matches, so the Wi-Fi router sends the packet wirelessly to wlan0 of the RPi.
4. On the RPi, the wlan0 interface receives the ICMP request packet and, as configured, forwards it to the bond0 interface.
5. bond0 receives the packet, but because wlan0 IS NOT the currently active interface (eth0 is), it drops the ICMP request. The journey of this packet ends here.
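
The steps above can be sketched as a toy model in Python. This only illustrates the behaviour described in this post, assuming a router that looks up the destination MAC on the receiving side first. It is not how any particular router firmware or the bonding driver is actually implemented. All names (router_egress_port, bond_accepts, ping_reaches_pi) and the MAC address are made up.

```python
PI_MAC = "02:00:00:00:00:01"  # made-up MAC, shared by eth0 and wlan0 in the bond

# Which RPi interface hangs off which side of the router (as in this thread).
PORT_TO_SLAVE = {"wired": "eth0", "wireless": "wlan0"}


def router_egress_port(tables, ingress, dst_mac):
    """Toy lookup: check the receiving side first, then the other sides."""
    if dst_mac in tables.get(ingress, set()):
        return ingress
    for port, macs in tables.items():
        if port != ingress and dst_mac in macs:
            return port
    return None


def bond_accepts(arrival_slave, active_slave):
    """Active-backup with default settings: only the active slave delivers frames."""
    return arrival_slave == active_slave


def ping_reaches_pi(tables, ingress, active_slave):
    """True if a ping entering the router on `ingress` is delivered to bond0."""
    port = router_egress_port(tables, ingress, PI_MAC)
    if port is None:
        return False
    return bond_accepts(PORT_TO_SLAVE[port], active_slave)


if __name__ == "__main__":
    # Cable plugged in: the router sees the same MAC on both sides, eth0 is active.
    both_up = {"wired": {PI_MAC}, "wireless": {PI_MAC}}
    for side in ("wired", "wireless"):
        ok = ping_reaches_pi(both_up, side, active_slave="eth0")
        print(side, "client:", "reply" if ok else "dropped")
```

In this model, removing the MAC from the "wired" table and making wlan0 the active slave (the unplugged-cable case) lets both kinds of clients through, which matches what I observed.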

So, how to fix that?
We could repair or modify the algorithm on the Wi-Fi router, but that would require much more research, information and time... not a viable option.

We could make the MAC addresses of eth0 and wlan0 different. Yes, this would be easy and it would work, but every time a fail-over to wlan0 or a fail-back to eth0 happens, the ARP table entries on the Wi-Fi router and/or other network devices would have to change, and that takes time, so communication would be disrupted during the fail-over or fail-back. Easy? Yes. Viable? No!


THE SOLUTION:

Turns out, the bonding module has a parameter for that! The name of the parameter is all_slaves_active.

The official Linux documentation describes what it does perfectly:
all_slaves_active

Specifies that duplicate frames (received on inactive ports) should be
dropped (0) or delivered (1).

Normally, bonding will drop duplicate frames (received on inactive
ports), which is desirable for most users. But there are some times
it is nice to allow duplicate frames to be delivered.

The default value is 0 (drop duplicate frames received on inactive
ports).
YES! This sounds like exactly the parameter we need, and sure enough, as soon as I changed it from 0 to 1, the wireless devices started getting ICMP (ping) replies! The only thing left is to set this parameter to 1 when the bonding module is loaded at boot. For that, the only thing you need to do is add:

Code: Select all

all_slaves_active=1
to the "options bonding" line in the file /etc/modprobe.d/bonding.conf.

And that is all we have to do to get it working!
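
To confirm that the setting actually took effect after a reboot, the value can be read back: the bonding driver exposes its options under /sys/class/net/<bond>/bonding/. Here is a minimal Python sketch, assuming the bond is called bond0 as in this thread. The helper names and the sysfs_root parameter (there only so the code can be tried against a fake directory) are hypothetical.

```python
from pathlib import Path


def read_bond_option(option, bond="bond0", sysfs_root="/sys/class/net"):
    """Return the current value of a bonding option as text, or None if unreadable."""
    path = Path(sysfs_root) / bond / "bonding" / option
    try:
        return path.read_text().strip()
    except OSError:
        # Bond does not exist, bonding module not loaded, or no permission.
        return None


def all_slaves_active_enabled(bond="bond0", sysfs_root="/sys/class/net"):
    """True only if the bond reports all_slaves_active = 1."""
    return read_bond_option("all_slaves_active", bond, sysfs_root) == "1"


if __name__ == "__main__":
    value = read_bond_option("all_slaves_active")
    if value is None:
        print("bond0 not found (is the bonding module loaded?)")
    else:
        print("all_slaves_active =", value)
```

The same file can simply be inspected with cat; the script form is handy only if you want to check it from a monitoring job.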

Here are my current configs:

/etc/modprobe.d/bonding.conf

Code: Select all

options bonding mode=active-backup primary=eth0 primary_reselect=always fail_over_mac=none all_slaves_active=1
/etc/network/interfaces

Code: Select all

# interfaces(5) file used by ifup(8) and ifdown(8)

# Please note that this file is written to be used with dhcpcd
# For static IP, consult /etc/dhcpcd.conf and 'man dhcpcd.conf'

# Include files from /etc/network/interfaces.d:
###source-directory /etc/network/interfaces.d

auto lo
iface lo inet loopback

auto eth0
allow-hotplug eth0
iface eth0 inet manual
        bond-master     bond0
#        bond-primary eth0 wlan0
        bond-mode active-backup

auto wlan0
allow-hotplug wlan0
iface wlan0 inet manual
        wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
        bond-master     bond0
#        bond-primary eth0 wlan0
        bond-mode active-backup

auto bond0
#iface bond0 inet dhcp
iface bond0 inet static
    address 192.168.0.151
    netmask 255.255.255.0
    gateway 192.168.0.1
    dns-search google.com
    dns-nameservers 192.168.0.1 8.8.8.8

    bond-slaves           none
    bond-primary          eth0
    bond-mode             active-backup
    bond-miimon           200
    bond-fail_over_mac    follow
    bond-primary_reselect always
    bond-updelay          200
    bond-downdelay        200
#    hw-address            dc:a6:32:3f:63:a9

And the line

Code: Select all

denyinterfaces eth0 wlan0
in /etc/dhcpcd.conf.


I hope this post someday saves someone struggling with this or a similar problem a LOT of time.
And if it saved you a lot of time, please say "thanks" for my week of hell, terror, depression and confusion.
Last edited by Msprg on Fri May 01, 2020 6:22 pm, edited 1 time in total.

no3rpi
Posts: 30
Joined: Fri Mar 31, 2017 11:44 am

Re: Network bonding problem (eth0 + wlan0)

Mon Apr 13, 2020 5:54 am

I had this problem one year ago with Cisco autonomous APs (3502, 2602...), and the fix was to set the AP as Root Bridge with Wireless Clients... Of course you can't set the AP to that mode if you use VLANs. :x So your research is the only fix.
With a Huawei HG658 I did not have problems like this.

Thank you for sharing your research and fix.

epoch1970
Posts: 5924
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Network bonding problem (eth0 + wlan0)

Mon Apr 13, 2020 10:01 am

Be careful with all_slaves_active. If the other end doesn’t handle the situation, what you get is a network loop. Check what happens on broadcast or multicast traffic.

Msprg
Posts: 5
Joined: Thu Mar 30, 2017 11:11 am

Re: Network bonding problem (eth0 + wlan0)

Mon Apr 13, 2020 9:53 pm

Yes I've been thinking about the same, but the traffic seems ok.
No broadcast storm happened... Yet :D

lionhe
Posts: 1
Joined: Fri May 01, 2020 7:55 am

Re: Network bonding problem (eth0 + wlan0)

Fri May 01, 2020 8:43 am

I found the same problem on my Raspberry Pi 3B (non-plus) and I am using the same solution as @Msprg (the AllSlavesActive=1 option), but with systemd-networkd on Raspbian buster (the advantage being that systemd version 241, which is included in buster, supports bonding, which was not the case in Raspbian stretch). I followed instructions and posted my findings on another site; I do not know if the policy of this forum allows me to add links to that site for details.

I am worried, as @epoch1970 warns, about possible side effects. In my network the Pi is connected to the router through a managed switch with loop detection enabled; I do not know if this can mitigate a broadcast storm.
In my case, as the wlan connection is rather weak, it happened once that some hosts (actually the ones that were not reachable without the AllSlavesActive option set) became permanently unreachable. As soon as I tried to investigate with tcpdump, communication was re-established.

Apparently this problem does not occur on a Raspberry Pi 4: can anybody here confirm this? If this proves true, does it mean that the kernel bonding module for the Raspberry Pi 3B has some bug? Or is this related to some hardware limitation of the Pi 3 with respect to the Pi 4?

Thank you for your attention,
LionHe

--------------------------------------------------------------------------------------------------------------------------------------
For reference, let me add details about my configuration

Code: Select all

Raspberry Pi 3 Model B Rev 1.2

Code: Select all

Linux openhab 4.19.75-v7+ #1270 SMP Tue Sep 24 18:45:11 BST 2019 armv7l GNU/Linux

Code: Select all

PRETTY_NAME="Raspbian GNU/Linux 10 (buster)"
NAME="Raspbian GNU/Linux"
VERSION_ID="10"
VERSION="10 (buster)"
VERSION_CODENAME=buster

FPSUsername
Posts: 5
Joined: Tue Oct 31, 2017 9:08 pm

Re: Network bonding problem (eth0 + wlan0)

Mon May 18, 2020 7:17 pm

@Msprg Thank you for the tutorial, yet I still have issues getting there. I did all the steps and unfortunately the output of cat /proc/net/bonding/bond0 is still:

Code: Select all

pi@raspberrypi:~ $ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0 (primary_reselect always)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 200
Down Delay (ms): 200

Slave Interface: eth0
MII Status: up
Speed: 100 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b8:27:eb:ff:83:dc
Slave queue ID: 0

Slave Interface: wlan0
MII Status: down
Speed: Unknown
Duplex: Unknown
Link Failure Count: 1
Permanent HW addr: 80:1f:02:f6:a0:b6
Slave queue ID: 0
I actually wanted to use balance-rr, as I've read that it can increase the throughput. My wifi adapter is a 5GHz AC one and I want to mix it with the ethernet adapter to get reliable speeds up to 94Mbit/s and beyond using WiFi.

epoch1970
Posts: 5924
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Network bonding problem (eth0 + wlan0)

Tue May 19, 2020 7:46 am

With wifi, the client authenticates the MAC address of its wireless interface with the AP.
That constrains what is possible: i. no other MAC can communicate over the wireless link, and ii. if the wifi interface changes its MAC, authentication has to be redone.
IIRC balance-rr does not leave the bond with a MAC address arrangement in which the wifi link stays up and working. Only active-backup does.

Msprg
Posts: 5
Joined: Thu Mar 30, 2017 11:11 am

Re: Network bonding problem (eth0 + wlan0)

Mon Aug 31, 2020 9:39 pm

Sorry for the late reply, but I got no notifications about new posts in this thread despite being subscribed. Anyways,
@FPSUsername
I actually tried balance-rr at first, but it caused me many more problems than active-backup, and I mainly wanted redundancy, not performance, so I did not pursue balance-rr any further.

Your wlan0 shows a link failure, which suggests that wlan0 is not connecting successfully. (You say wlan0 is a USB wifi adapter; try disabling the onboard wifi if there is one. I would expect the USB adapter to be wlan1.) Try disabling (ifdown) all interfaces, then enable only wlan0 and watch whether it connects on its own. If it does not, check the SSID and PSK configuration that wlan0 is supposed to be using. If that is all right but wlan0 still won't connect successfully,

completely revert the networking configuration (reinstall Raspbian), then connect eth0 and wlan0 and verify that both interfaces can access the network independently.
Then reboot and watch whether wlan0 autoconnects after boot. If so,
do

Code: Select all

modprobe bonding
and

Code: Select all

apt install ifenslave
then begin setting up the bond interface using the same config I provided in my post above.
Change only values such as the static IP, gateway, DNS, interface names and such, if needed.
When you feel ready, reboot, and then check

Code: Select all

ifconfig
and

Code: Select all

ip a
and
cat /proc/net/bonding/bond0
If you followed my instructions carefully and didn't mess anything up, it should work. For me it does (I just did the same on another RPi0, and it works flawlessly so far...)
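
The bond0 status file is plain "Key: value" text, so this last check can be automated. Below is a rough Python sketch, assuming the layout shown in the outputs earlier in this thread. parse_bonding and down_slaves are hypothetical helper names, and the sample is the output posted above by FPSUsername.

```python
def parse_bonding(text):
    """Split /proc/net/bonding/<bond> text into bond-level and per-slave fields."""
    bond, slaves, current = {}, {}, None
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so MAC addresses stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = slaves.setdefault(value, {})
        elif current is None:
            bond[key] = value
        else:
            current[key] = value
    return bond, slaves


def down_slaves(slaves):
    """Names of slaves whose MII Status is anything other than 'up'."""
    return [name for name, fields in slaves.items()
            if fields.get("MII Status") != "up"]


SAMPLE = """\
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: eth0 (primary_reselect always)
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100

Slave Interface: eth0
MII Status: up
Speed: 100 Mbps
Permanent HW addr: b8:27:eb:ff:83:dc

Slave Interface: wlan0
MII Status: down
Speed: Unknown
Permanent HW addr: 80:1f:02:f6:a0:b6
"""

if __name__ == "__main__":
    bond, slaves = parse_bonding(SAMPLE)
    print("active slave:", bond.get("Currently Active Slave"))
    print("slaves not up:", down_slaves(slaves))
```

On a real system, the text would come from reading /proc/net/bonding/bond0 instead of the SAMPLE string.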
