khillig
Posts: 4
Joined: Wed Jun 19, 2019 7:53 pm

Dropped packets over wifi between Pi Zero W and Pi 3B+

Thu Jun 20, 2019 2:13 am

We're trying to build a system where a master Pi 3B+ controls a dozen (or more) Pi Zero W clients - all running the latest raspbian (as of 18:00 EDT today) - in an isolated WiFi island (no connectivity to the outside world).

Master:pi$ uname -a
Linux Master 4.19.42-v7+ #1219 SMP Tue May 14 21:20:58 BST 2019 armv7l GNU/Linux

Client:pi$ uname -a
Linux Client 4.19.42+ #1219 Tue May 14 21:16:38 BST 2019 armv6l GNU/Linux

The pi 3B+ is set up as a stand-alone WiFi router and DNS/DHCP server using hostapd and dnsmasq; the client Pi Zeros have no trouble authenticating and connecting to the SSID it's announcing or to our home wifi networks when needed (e.g. for apt-get).

We're consistently seeing flaky connectivity between the Pi Zeros and the 3B+ - my usual test of 300 pings to the master from a client (or vice versa) typically shows 10-30% packet loss. We do NOT see any problems between any of the Pis and other non-raspberry devices (laptops, etc.) over either the "island" SSID or our home WiFi networks.

Packet captures with tcpdump at both ends show that a Pi Zero client does receive everything sent from the master; however the master does NOT report receiving all of the packets that the client claims to be sending.

tcpdump also shows the Zero W sending ARP requests every 40 seconds when on my home WiFi network; when talking directly to the 3B+ I see much more frequent ARP requests - from both the client and master - at irregular intervals from 3 to 50 seconds.

On the theory that there might be interference caused by Bluetooth devices, we've turned off Bluetooth on both the Pi zeros and the 3B+, and turned off all other nearby Bluetooth devices as well, but this has had no noticeable effect on the problem.

We've pretty much run out of ideas - can anyone shed some light on this?

epoch1970
Posts: 3366
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Dropped packets over wifi between Pi Zero W and Pi 3B+

Thu Jun 20, 2019 10:53 am

Perhaps a dozen clients is too much for the built-in wifi interface in Pi3B+?
If the phy in Pi3B+ was dropping MACs I think the effect would be consistent with what you see.
In this case the same setup with fewer Pi0s should work, and so should the same deployment with an AP in place of hostapd (or another wifi adapter, I suppose).

My eyes glazed last time I looked at the brcmfmac driver code searching for a STA limit... There was an inconclusive thread (AFAIK) on the subject of "maximum STAs" recently on the forum. If the theory proves to be correct, your data point would be valuable I think.
"S'il n'y a pas de solution, c'est qu'il n'y a pas de problème." Les Shadoks, J. Rouxel

Andyroo
Posts: 3864
Joined: Sat Jun 16, 2018 12:49 am
Location: Lincs U.K.

Re: Dropped packets over wifi between Pi Zero W and Pi 3B+

Thu Jun 20, 2019 1:14 pm

epoch1970 wrote:
Thu Jun 20, 2019 10:53 am
Perhaps a dozen clients is too much for the built-in wifi interface in Pi3B+?
If the phy in Pi3B+ was dropping MACs I think the effect would be consistent with what you see.
In this case the same setup with fewer Pi0s should work, and so should the same deployment with an AP in place of hostapd (or another wifi adapter, I suppose).

My eyes glazed last time I looked at the brcmfmac driver code searching for a STA limit... There was an inconclusive thread (AFAIK) on the subject of "maximum STAs" recently on the forum. If the theory proves to be correct, your data point would be valuable I think.
I think this was the thread here; in it, mina finds that 20-30 users is the max for Moodle when used as a web server.
Need Pi spray - these things are breeding in my house...

khillig
Posts: 4
Joined: Wed Jun 19, 2019 7:53 pm

Re: Dropped packets over wifi between Pi Zero W and Pi 3B+

Thu Jun 20, 2019 1:54 pm

At present I'm testing with a single Pi Zero W, and we won't test with the full complement of clients until we can resolve this, so the number of clients isn't a factor at this point.

The plan calls for the master to send a single UDP broadcast message to trigger all of the clients simultaneously - which should work even with the current behavior, since the client does appear to receive everything the master sends. And we can minimize the load on the master by polling the clients sequentially to get the results back from them, so the master would effectively see only one "user" at a time.
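As a rough illustration of the plan described above (one broadcast trigger, then sequential polling), here is a minimal Python sketch. The port numbers, message contents, timeout and retry count are all assumptions made for illustration; none of them come from the actual project.

```python
import socket


def send_trigger(message, port, broadcast_addr="255.255.255.255"):
    """Send one UDP datagram to every host on the local network segment."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # SO_BROADCAST must be enabled before sending to a broadcast address.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(message, (broadcast_addr, port))


def poll_clients(client_ips, request, port, timeout=1.0, retries=3):
    """Ask each client for its result, one at a time, over UDP.

    Returns a dict mapping each IP address to the reply bytes, or to None
    if no reply from that address arrived within the allowed attempts.
    """
    results = {}
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        for ip in client_ips:
            results[ip] = None
            for _ in range(retries):
                sock.sendto(request, (ip, port))
                try:
                    data, (src_ip, _src_port) = sock.recvfrom(4096)
                except socket.timeout:
                    continue  # nothing came back in time, so ask again
                if src_ip == ip:
                    results[ip] = data
                    break
                # A late reply from a different client counts as a failed
                # attempt here; the loop simply asks again.
    return results
```

A master might call send_trigger(b"GO", 50000) and then poll_clients(["192.168.4.2", "192.168.4.3"], b"REPORT", 50001); those addresses and ports are hypothetical. Retrying at the application level only papers over the loss, of course, it does not explain it.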

But getting the responses from the client is a challenge - UDP packets don't always get through, while TCP connections stall and eventually time out due to too many retransmissions. This problem was first noticed when setting up NTP on the clients to synchronize clocks with the master; the master doesn't reliably receive the NTP queries from the clients, and the "missing" replies keep this from working.

I just noticed that "ifconfig wlan0" on the Pi Zero W is reporting a non-zero (and rising) count for "Tx excessive retries" - not sure what this means yet, though...
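To watch that counter over time, something like the following Python sketch could pull the number out of the tool's text output. As far as I know the "Tx excessive retries" field is normally printed in the statistics lines of iwconfig; the sample text in the usage note is illustrative and was not captured from either Pi.

```python
import re


def tx_excessive_retries(tool_output):
    """Return the 'Tx excessive retries' count found in the text, or None."""
    match = re.search(r"Tx excessive retries[:=]\s*(\d+)", tool_output)
    return int(match.group(1)) if match else None


def retries_per_second(count_before, count_after, seconds):
    """Average growth rate of the counter between two samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (count_after - count_before) / seconds
```

For example, tx_excessive_retries("Tx excessive retries:17  Invalid misc:0   Missed beacon:0") returns 17. Sampling the counter before and after a ping run would show whether it climbs only during the flaky periods.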

khillig
Posts: 4
Joined: Wed Jun 19, 2019 7:53 pm

Re: Dropped packets over wifi between Pi Zero W and Pi 3B+

Thu Jun 20, 2019 9:04 pm

Digging through more web pages suggested that there might be two things that could cause WiFi problems - wireless power management and 802.11n. I have now done the following:

1) Disabled power management at both ends, by adding "power-management off" to the wlan0 interface in /etc/network/interfaces and rebooting. At first it looked like this fixed the problem, but it turns out the dropped packet behavior follows an interesting pattern (see below) where sometimes everything works properly for a few minutes.

2) Explicitly disabled 802.11n on the master, by setting ieee80211n=0 in /etc/hostapd/hostapd.conf, though it should be off by default. As expected, this had no effect.

3) Turned on logging for hostapd, by adding DAEMON_OPTS="-dd -t -f /var/log/hostapd.log" to /etc/default/hostapd (the "-f" option isn't documented in the hostapd man page). However, monitoring this log while running tests didn't show any events that correlated with the beginning or end of a time when packets are lost.

Looking at ICMP packets captured on the client (testing with "ping -c 1000 master" from the client), I noticed a definite pattern to the packet loss. I see periods of 1-3 minutes when things are normal, intermixed with periods of 1-2 minutes where only every third packet is seen by the master - i.e. "pass, drop, drop, pass, drop, drop, pass, ..." - with occasional runs of four or six drops in a row.
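The "pass, drop, drop" pattern described above can be checked mechanically from the icmp_seq values that actually got a reply. A minimal Python sketch; the function names are made up, and the sequence numbers in the usage note are illustrative rather than taken from the capture.

```python
from itertools import groupby


def pass_drop_pattern(received_seqs, total):
    """Build a string with 'P' for each answered ping and 'D' for each lost one.

    Sequence numbers are assumed to run from 1 to total, as ping reports them.
    """
    seen = set(received_seqs)
    return "".join("P" if seq in seen else "D" for seq in range(1, total + 1))


def drop_runs(pattern):
    """Return the length of every run of consecutive drops in the pattern."""
    return [len(list(group)) for key, group in groupby(pattern) if key == "D"]
```

With replies for sequence numbers 1, 4, 7 and 10 out of 10 pings, pass_drop_pattern([1, 4, 7, 10], 10) gives "PDDPDDPDDP" and drop_runs on that string gives [2, 2, 2]. Runs of four or six drops would stand out immediately in the list.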

I've started a longer packet capture run to get a better idea of the timing between good/bad transitions and see if I can correlate these with other events, but I won't have the data for a while...

Does anybody out there have a 3B+ and a Zero W - and the time and inclination - to try to replicate what I'm seeing?

epoch1970
Posts: 3366
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Dropped packets over wifi between Pi Zero W and Pi 3B+

Thu Jun 20, 2019 9:38 pm

khillig wrote:
Thu Jun 20, 2019 9:04 pm
Disabled power management at both ends, by adding "power-management off" to the wlan0 interface in /etc/network/interfaces and rebooting.
God knows what ifupdown does behind the scenes...
I would rather trust "iwconfig wlan0 power off" or "iw wlan0 set power_save off"

(No 3B+ or W, here. Sorry)
"S'il n'y a pas de solution, c'est qu'il n'y a pas de problème." Les Shadoks, J. Rouxel

dshaw619
Posts: 12
Joined: Thu Jan 04, 2018 7:06 am
Location: San Diego, California, USA

Re: Dropped packets over wifi between Pi Zero W and Pi 3B+

Thu Jun 20, 2019 10:41 pm

khillig wrote:
Thu Jun 20, 2019 9:04 pm

...
Does anybody out there have a 3B+ and a Zero W - and the time and inclination - to try to replicate what I'm seeing?

I don't have exactly the same configuration as you, but here's a little data in case it's helpful (dropped 2 of 1000 packets):

RPi3B+ (CoconutCream) as WiFi AP connected to I'net via tethered phone.
Pi0W (DutchApple) located in another room upstairs. (CUPS, MQTT, NodeRed server)
2nd RPi3B+, Linux Mint laptop, Windows 10 laptop also connected to CoconutCreamAP.
Connected to RPi3B+ and Pi0W via VNC from Linux Mint laptop and also doing light I'net browsing while this test ran.

CoconutCream:~ $ uname -a
Linux CoconutCream 4.14.98-v7+ #1200 SMP Tue Feb 12 20:27:48 GMT 2019 armv7l GNU/Linux
CoconutCream:~ $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

DutchApple:~ $ uname -a
Linux DutchApple 4.14.98+ #1200 Tue Feb 12 20:11:02 GMT 2019 armv6l GNU/Linux
DutchApple:~ $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"

~ $ ping -c 1000 coconutcream.local
PING coconutcream.local (10.10.10.1) 56(84) bytes of data.
64 bytes from 10.10.10.1 (10.10.10.1): icmp_seq=1 ttl=64 time=12.8 ms
64 bytes from 10.10.10.1 (10.10.10.1): icmp_seq=2 ttl=64 time=47.6 ms
64 bytes from 10.10.10.1 (10.10.10.1): icmp_seq=3 ttl=64 time=9.83 ms
64 bytes from 10.10.10.1 (10.10.10.1): icmp_seq=4 ttl=64 time=39.4 ms
64 bytes from 10.10.10.1 (10.10.10.1): icmp_seq=5 ttl=64 time=5.55 ms
64 bytes from 10.10.10.1 (10.10.10.1): icmp_seq=6 ttl=64 time=6.91 ms
64 bytes from 10.10.10.1 (10.10.10.1): icmp_seq=7 ttl=64 time=11.3 ms
64 bytes from 10.10.10.1 (10.10.10.1): icmp_seq=8 ttl=64 time=6.53 ms
64 bytes from 10.10.10.1 (10.10.10.1): icmp_seq=9 ttl=64 time=8.38 ms
64 bytes from 10.10.10.1 (10.10.10.1): icmp_seq=10 ttl=64 time=5.09 ms
...
--- coconutcream.local ping statistics ---
1000 packets transmitted, 998 received, 0% packet loss, time 1000615ms
rtt min/avg/max/mdev = 1.938/13.434/223.070/21.502 ms
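Note that the summary line reports the loss as a whole percent, which is presumably why 2 lost packets out of 1000 show up as 0%. A small Python sketch that recomputes the figure from the counts in the summary line (the function name is just for illustration):

```python
import re


def exact_loss_percent(summary_line):
    """Recompute packet loss from the transmitted/received counts."""
    match = re.search(r"(\d+) packets transmitted, (\d+) received", summary_line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent
```

Fed the summary line above, this returns roughly 0.2, i.e. about 0.2% loss, far below the 10-30% seen in the original report.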

khillig
Posts: 4
Joined: Wed Jun 19, 2019 7:53 pm

Re: Dropped packets over wifi between Pi Zero W and Pi 3B+

Fri Jun 21, 2019 1:05 am

Interesting result - are you running NTP? I'm beginning to suspect that something about NTP traffic is triggering all of this!

Looking at my last extended test I find that transitions from normal to flaky behavior *always* occur within 1-2 seconds after an NTP request outbound from the client, while transitions from flaky back to normal also *only* happen after an outbound NTP request, but usually take longer (1 to 9 seconds). Occasionally an NTP query does not trigger a change - only four times in a 30-minute test (with NTP queries generated every 64 seconds).
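For anyone wanting to repeat this kind of correlation on their own capture, the delay between each good/bad transition and the most recent outbound NTP request can be computed along these lines. This is a hedged Python sketch, not the method actually used here; timestamps are assumed to be seconds since the start of the capture, and the numbers in the usage note are invented.

```python
from bisect import bisect_right


def delay_since_last_ntp(transition_times, ntp_request_times):
    """For each transition, return seconds elapsed since the latest NTP request.

    A transition that happens before the first NTP request yields None.
    """
    ntp_sorted = sorted(ntp_request_times)
    delays = []
    for t in transition_times:
        # Index of the first NTP request strictly later than the transition.
        idx = bisect_right(ntp_sorted, t)
        delays.append(t - ntp_sorted[idx - 1] if idx else None)
    return delays
```

With NTP requests at 0, 64, 128 and 192 seconds and transitions at 65.5 and 133.0 seconds, the result is [1.5, 5.0]. Delays that stay within a few seconds across a long run would support the NTP theory; a wide spread would argue against it.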

I've turned off ntpd on the client and I'm now running another half-hour test; if I don't see the problem in this then I think I've found a smoking gun - the question then being if I'm hitting a bug in the IP stack or the wireless driver (which I don't think I'm ready to try to fix!) or a bug in ntpd (which I'm also not ready to tackle) - but I hope someone out there will take a deeper look!

And if ntpd is the culprit then I'll see if timesyncd on the clients will do the job...

dshaw619
Posts: 12
Joined: Thu Jan 04, 2018 7:06 am
Location: San Diego, California, USA

Re: Dropped packets over wifi between Pi Zero W and Pi 3B+

Fri Jun 21, 2019 3:25 am

Just running the standard (I believe) Raspbian systemd-timesyncd.
