I compiled a stage-3 specifically for the "neon" hardware acceleration unit and the "vfpv4" floating point unit.
This page :
The file is armv7a_neonvfpv4_hardfp-20160320
CFLAGS="-O2 -pipe -march=armv7-a -mfpu=vfpv3
Optimised for Raspberry 2 and 3
CFLAGS="-O2 -pipe -march=armv7-a -mfpu=neon-vfpv4 -ffast-math -mfloat-abi=hard"
Really, one has to read the cryptic GCC documentation and poke around the ARM website to make sense of it all. The short version: Raspbian ships an antique binary for ARMv6, and Gentoo does a generic one-size-fits-all ARMv7, but the Raspberry has the sexy extra hardware acceleration and the newer floating point unit. Knowing people would want it, I cooked it.
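If you want to check that your board actually advertises these units before unpacking the stage-3, something like the sketch below should do. It assumes a 32-bit ARM kernel, where the "Features" line of /proc/cpuinfo lists "neon" and "vfpv4" (a 64-bit kernel reports different names). The has_feature and check_cpu helpers are made up for this post and are not part of any Gentoo tool.

```shell
#!/bin/sh
# Rough check for the NEON and VFPv4 units on a 32-bit ARM kernel.
# has_feature and check_cpu are hypothetical helpers written for this post.

has_feature() {
    # $1 = feature name, $2 = cpuinfo-style file (defaults to /proc/cpuinfo)
    # Look only at the first "Features" line and match the name as a whole word.
    grep -i '^Features' "${2:-/proc/cpuinfo}" | head -n 1 | grep -qw -- "$1"
}

check_cpu() {
    # $1 = optional cpuinfo-style file, handy for testing on another machine
    for feature in neon vfpv4; do
        if has_feature "$feature" "${1:-/proc/cpuinfo}"; then
            echo "$feature: yes"
        else
            echo "$feature: no"
        fi
    done
}
```

Source the file and run check_cpu on the Raspberry itself. If either line comes back "no", the optimised CFLAGS above are probably the wrong choice for that board.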
I am rather cynical as to actual speed improvements, but for a Raspberry 3 an optimised stage-3 probably makes a lot of sense, should you be trying to play, say, x265-encoded video.