epoch1970
Posts: 3699
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Making TFTP boot robust?

Fri Apr 28, 2017 12:52 pm

I need to network boot 5 to 10 clients at the same time.
The scenario: power goes down, so the switches and clients go down, while the server (also a Pi) stays up thanks to a UPS. Then power comes back and the clients should restart.

I have tried this maybe 10 times by now. I am getting mostly poor results, sometimes a perfect 5/5. Not repeatable at all.

I am very intrigued by this comment from Gordon Hollingworth under his net booting blog post.
In my case there is indeed other equipment on the network besides the TFTP server and clients, and some of it broadcasts a lot of packets.

So what's the deal with broadcasts? And in general, how do I get a robust TFTP boot process?

EDIT: Kernel version (Raspbian) on the server: Linux srv 4.4.50-v7+ #970 SMP Mon Feb 20 19:18:29 GMT 2017 armv7l GNU/Linux. I am upgrading now, which apparently brings an updated version of the bootloader and kernel. I will copy the new bootcode.bin to my TFTP root.
Last edited by epoch1970 on Fri Apr 28, 2017 2:55 pm, edited 2 times in total.
"S'il n'y a pas de solution, c'est qu'il n'y a pas de problème." Les Shadoks, J. Rouxel

Martin Frezman
Posts: 1020
Joined: Mon Oct 31, 2016 10:05 am

Re: Making TFTP boot robust ?

Fri Apr 28, 2017 1:07 pm

Is there any reason you can't put the switches and clients on UPS as well?
If this post appears in the wrong forums category, my apologies.

epoch1970
Posts: 3699
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Making TFTP boot robust ?

Fri Apr 28, 2017 1:58 pm

I'm already lucky that there is a UPS for the server. The mass reboot scenario is unavoidable, at least for the clients, and most probably for the switches as well.

epoch1970
Posts: 3699
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Making TFTP boot robust?

Sun Apr 30, 2017 12:06 pm

FYI according to my logs it seems dnsmasq can get confused and send a client files that do not belong to it. E.g. client s/n 123456 is being sent files under "/tftp/789abcd/"
I've seen this twice already, with 2 different switches (including a reputable one.)
This happens when booting 5 clients at a time. The wrong s/n is not a random one, but I'm not sure there is a pattern. Perhaps option "tftp-unique-root" could help?
But for now I will check how another dhcp proxy behaves.
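A minimal sketch of how the mismatch described above could be checked offline. It assumes the TFTP log has already been reduced to (client, path) pairs and that you keep your own table mapping each client to its serial number; parsing the log itself is left out because log formats differ between servers. The names serial_dir, find_misdeliveries and serial_by_client, the "/tftp" root, the IP addresses and the file names are illustrative assumptions, not part of dnsmasq or the Pi firmware.

```python
from pathlib import PurePosixPath


def serial_dir(path, tftp_root="/tftp"):
    """Return the first directory under tftp_root, or None if there is none."""
    try:
        rel = PurePosixPath(path).relative_to(tftp_root)
    except ValueError:
        # Path is not under the TFTP root at all.
        return None
    # A file directly in the root (e.g. bootcode.bin) has no serial directory.
    return rel.parts[0] if len(rel.parts) > 1 else None


def find_misdeliveries(transfers, serial_by_client, tftp_root="/tftp"):
    """List (client, expected_serial, path) for files served from another serial's directory.

    transfers: iterable of (client, path) pairs, already extracted from the logs.
    serial_by_client: hypothetical table mapping a client to its own serial.
    """
    problems = []
    for client, path in transfers:
        expected = serial_by_client.get(client)
        actual = serial_dir(path, tftp_root)
        if expected is not None and actual is not None and actual != expected:
            problems.append((client, expected, path))
    return problems


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    table = {"10.0.0.11": "123456", "10.0.0.12": "789abcd"}
    seen = [
        ("10.0.0.11", "/tftp/789abcd/start.elf"),
        ("10.0.0.12", "/tftp/789abcd/start.elf"),
    ]
    for client, expected, path in find_misdeliveries(seen, table):
        print(f"{client} (s/n {expected}) was sent {path}")
```

Files requested directly from the TFTP root are ignored on purpose, since they are not tied to any serial number.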
"S'il n'y a pas de solution, c'est qu'il n'y a pas de problème." Les Shadoks, J. Rouxel

epoch1970
Posts: 3699
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Making TFTP boot robust?

Sun Apr 30, 2017 8:15 pm

So. I replaced the tftp server from dnsmasq with atftpd and things improved.
Then I set dnsmasq to be authoritative, only on the MACs of my Pis.
I still couldn't get 5 clients out of 5 to boot every time. Some clients do not ask for a lease, or fail to be heard, and then they will stay like that apparently forever.

Back to the drawing board, then.

mfa298
Posts: 1387
Joined: Tue Apr 22, 2014 11:18 am

Re: Making TFTP boot robust ?

Mon May 01, 2017 10:50 am

epoch1970 wrote: I'm already lucky that there is a UPS for the server. The mass reboot scenario is unavoidable, at least for the clients, and most probably for the switches as well.
If the switches are smart/managed switches then getting them on a UPS would likely help, as they may take a while to reboot. If they're cheap unmanaged switches then that's less likely to be an issue.

It might be worth testing a scenario where the switch is up and active for a while first, then turning on all the Pis and seeing how well they netboot. That at least might help narrow down potential sources of problems.

epoch1970
Posts: 3699
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Making TFTP boot robust?

Mon May 01, 2017 12:58 pm

Thanks for your interest.
I mostly tested with a Netgear GS108Tv2 in factory settings. Much too good for my target setup but I couldn't trust my initial "no-name" switch.
I also mostly tested with the switch staying powered up. I do believe that having the switch boot as well works against the process.
Mind you, I've seen 5/5 with the Netgear booting too. It may be slow to boot up, and probably has green ethernet activated by default, but at least once I've seen all 5 Pis find their way to boot completion.

With dnsmasq's TFTP server I very often got the ACT LED blinking three times. I think that is because dnsmasq sends files to the wrong client. I never got that issue with atftpd. I guess that's one for the official tutorial.

But the cause is in the Pi, nowhere else. With 5 machines requesting an IP at the same time, statistically 2 to 4 get served and will boot, and the other ones sit waiting. This is repeatable, even when I finally set dnsmasq as the dhcp server on the Pi3 server, to get rid of my possibly "improper" ISP router box.
The problem is, how long do the clients wait? I found no information on this, and according to my tests it can be forever. I've read that "an active network helps" get machines out of the timeout. I don't know what "active" means, but my test network surely was active at times and quiet at others: I tried everything.

So either the firmware needs some more love, or netbooting will work with Pi 4. But for now, it works as a demo and not in real life. I'm not sure this is clear enough in the available documentation.

mfa298
Posts: 1387
Joined: Tue Apr 22, 2014 11:18 am

Re: Making TFTP boot robust?

Mon May 01, 2017 11:06 pm

epoch1970 wrote:Thanks for your interest.
I mostly tested with a Netgear GS108Tv2 in factory settings. Much too good for my target setup but I couldn't trust my initial "no-name" switch.
The other thing to check on those is whether spanning tree is running; that might stop traffic for a while after the port comes up (I can't remember the default settings for spanning tree on them).
epoch1970 wrote: With dnsmasq's TFTP server I very often got the ACT LED blinking three times. I think that is because dnsmasq sends files to the wrong client. I never got that issue with atftpd. I guess that's one for the official tutorial.
I've never liked dnsmasq, but then I've been using the isc-dhcpd server for a long time. For the last few years I've been using tftpd-hpa as a TFTP server. I've not tried netbooting any Pis yet, but it has worked well for everything else using TFTP (PC netboot, switch firmware etc.)

epoch1970 wrote: But the cause is in the Pi, nowhere else. With 5 machines requesting an IP at the same time, statistically 2 to 4 get served and will boot, and the other ones sit waiting. This is repeatable, even when I finally set dnsmasq as the dhcp server on the Pi3 server, to get rid of my possibly "improper" ISP router box.
tcpdump (and Wireshark) are useful tools for seeing what's happening. Learn how the DHCP process should look (discover, offer, request, ack) and check that the right options are being sent to the client.

If you have more than one thing offering DHCP, especially with different options (dnsmasq and the ISP router), that might lead to some odd issues. The ISP router probably won't be giving out the right options, so the Pi clients potentially won't know where to get the files from (they might only look at the options from the first device to offer them an IP).
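
Once packets are captured, the handshake can also be checked programmatically. The sketch below decodes the client MAC and the DHCP message type (option 53) from a raw BOOTP/DHCP UDP payload, then lists which of the four stages were never seen for each client. It is not a capture tool: it assumes you already have the UDP payload bytes (for example exported from a tcpdump capture), and the function names parse_dhcp and handshake_gaps are made up for illustration.

```python
# Fixed BOOTP header is 236 bytes, followed by the 4-byte DHCP magic cookie.
DHCP_MAGIC = bytes([0x63, 0x82, 0x53, 0x63])
MESSAGE_TYPES = {
    1: "DISCOVER", 2: "OFFER", 3: "REQUEST", 4: "DECLINE",
    5: "ACK", 6: "NAK", 7: "RELEASE", 8: "INFORM",
}
EXPECTED = ("DISCOVER", "OFFER", "REQUEST", "ACK")


def parse_dhcp(payload):
    """Return (client_mac, message_type) for a DHCP UDP payload, else None."""
    if len(payload) < 240 or payload[236:240] != DHCP_MAGIC:
        return None
    hlen = min(payload[2], 16)  # hardware address length, 6 for Ethernet
    mac = ":".join(f"{b:02x}" for b in payload[28:28 + hlen])
    i = 240  # options start right after the magic cookie
    while i < len(payload):
        code = payload[i]
        if code == 255:  # end option
            break
        if code == 0:  # pad option, no length byte
            i += 1
            continue
        if i + 1 >= len(payload):
            break
        length = payload[i + 1]
        value = payload[i + 2:i + 2 + length]
        if code == 53 and len(value) == 1:
            return mac, MESSAGE_TYPES.get(value[0], f"UNKNOWN({value[0]})")
        i += 2 + length
    return None


def handshake_gaps(events):
    """Map each client MAC to the expected stages that were never observed.

    events: iterable of (client_mac, message_type) pairs, e.g. from parse_dhcp.
    """
    seen = {}
    for mac, message_type in events:
        seen.setdefault(mac, set()).add(message_type)
    return {mac: [s for s in EXPECTED if s not in types]
            for mac, types in seen.items()}
```

A client that powered up but does not appear in the result at all never sent a DISCOVER that reached the capture point. With a proxy-DHCP setup the exchange involves more than one responder, so treat the four-stage list as a starting point rather than a strict rule.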

epoch1970
Posts: 3699
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Making TFTP boot robust?

Tue May 02, 2017 7:39 am

Thanks.
Until someone comes forward saying he/she netboots a classroom full of Pis with no problem at the flip of a switch, I will spare myself the port mirroring and the black-box study of the firmware behaviour.
BTW, the GS108T, by default, has STP and green ethernet off.
I've been looking for an alternative package to do DHCP proxy but found none, so dnsmasq partly stayed in the mix.

mfa298
Posts: 1387
Joined: Tue Apr 22, 2014 11:18 am

Re: Making TFTP boot robust?

Tue May 02, 2017 8:25 am

epoch1970 wrote: Until someone comes forward saying he/she netboots a classroom full of Pis with no problem at the flip of a switch, I will spare port mirroring and the black box study of the firmware behaviour.
You would probably see most of the interesting stuff on the Pi doing tftp; DHCP by its nature is broadcast on the local subnet.
epoch1970 wrote: BTW, the GS108T, by default, has STP and green ethernet off.
You could probably save the switch powering off / rebooting issues by putting a PoE injector (or using a PoE switch) in the network closet with the Pi Server and UPS. My GS108Tv2 are both powered like that.

epoch1970
Posts: 3699
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Making TFTP boot robust?

Tue May 02, 2017 10:07 am

mfa298 wrote: You would probably see most of the interesting stuff on the Pi doing tftp; DHCP by its nature is broadcast on the local subnet.
I don't think the TFTP transfers are in question, with atftpd at least. For me the problem is that the Pis don't request a DHCP lease often or regularly enough, so they never learn of the TFTP server's existence. Some of them, sometimes, don't contact the TFTP server at all.
mfa298 wrote: You could probably save the switch powering off / rebooting issues by putting a PoE injector (or using a PoE switch) in the network closet with the Pi Server and UPS. My GS108Tv2 are both powered like that.
Again, the issue I have is largely independent of the switch being powered or not at Pi boot time. In my case I have zero control over the network I am installing into. A working solution with DHCP proxy and low-grade networking gear is my only option, besides shopping for SD cards...

mfa298
Posts: 1387
Joined: Tue Apr 22, 2014 11:18 am

Re: Making TFTP boot robust?

Tue May 02, 2017 11:51 am

epoch1970 wrote:
mfa298 wrote: You would probably see most of the interesting stuff on the Pi doing tftp; DHCP by its nature is broadcast on the local subnet.
I don't think the TFTP transfers are in question, with atftpd at least. For me the problem is that the Pis don't request a DHCP lease often or regularly enough, so they never learn of the TFTP server's existence. Some of them, sometimes, don't contact the TFTP server at all.
Well you could tell that from a tcpdump. Added bold to my previous statement for emphasis. Broadcast means you can normally see it on any device on the network!

I get the feeling that there are at least two things trying to offer DHCP to the clients: the router (or something the local IT dept have set up and manage) and whatever you've tried to set up. That's likely to be causing you issues. If the local IT dept are managing the local DHCP service then you should probably talk to them about your requirements: a) there's probably a better way of doing what you need, b) they might have controls on the network to limit where DHCP responses can come from.
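
As a rough illustration of how to confirm that suspicion from a capture, the sketch below groups DHCPOFFER packets by client and reports the clients that received offers from more than one server. It assumes the offers have already been reduced to (client MAC, server IP) pairs; the function names and the addresses in the example are hypothetical.

```python
from collections import defaultdict


def offers_per_client(offers):
    """Map each client MAC to the sorted list of servers that sent it an offer.

    offers: iterable of (client_mac, server_ip) pairs taken from DHCPOFFER packets.
    """
    servers = defaultdict(set)
    for client_mac, server_ip in offers:
        servers[client_mac].add(server_ip)
    return {mac: sorted(ips) for mac, ips in servers.items()}


def clients_with_competing_offers(offers):
    """Keep only the clients that were offered a lease by more than one server."""
    return {mac: ips for mac, ips in offers_per_client(offers).items()
            if len(ips) > 1}


if __name__ == "__main__":
    # Made-up capture summary, for illustration only.
    captured = [
        ("b8:27:eb:00:00:01", "192.168.1.1"),
        ("b8:27:eb:00:00:01", "192.168.1.10"),
        ("b8:27:eb:00:00:02", "192.168.1.1"),
    ]
    for mac, ips in clients_with_competing_offers(captured).items():
        print(mac, "was offered leases by", ", ".join(ips))
```

Note that in a proxy-DHCP setup two responders per client are expected (the main server plus the proxy), so more than one entry is not an error in itself; what matters is which responder carries the boot options.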

epoch1970
Posts: 3699
Joined: Thu May 05, 2016 9:33 am
Location: Paris, France

Re: Making TFTP boot robust?

Tue May 02, 2017 12:16 pm

We're flogging a dead horse.
- The official doc says booting works with dhcp-proxy mode on dnsmasq. I say it does not: a) dnsmasq tftp server is confused when multiple clients query for files at the same time, b) dnsmasq dhcp server does not hear enough DHCPDISCOVER queries from the Pis.
- I've fixed a) with atftpd and for b) I've put dnsmasq as a second authoritative server, only for the MACs of my Pis to take over the main DHCP server. And it doesn't help, because... the Pis don't always send DHCPDISCOVER.

At deployment time, I would not be allowed to remove the main DHCP server and use mine instead, so I am stuck with proxy or with locked-down authoritative. I've tested both and both fail. The tftp server in dnsmasq might be buggy, proxy mode as well, but it knows how to deliver a lease when set authoritative. The problem is not in the server.

andig2
Posts: 51
Joined: Wed Oct 31, 2012 9:34 pm

Re: Making TFTP boot robust?

Sat Sep 02, 2017 10:57 am

I don't even get that far.

I've noticed my single Pi3 doesn't net boot at all without bootcode.bin. It gets its IP address and I see it ask for the TFTP service IP on the network, but that's that. No TFTP. I only ever got it working with bootcode.bin present. No answer to the firmware issue on GitHub (https://github.com/raspberrypi/firmware/issues/862).

Cheers,
Andreas
