As a very simplistic explanation:
Personally, I think your analogies are confusing, and I also believe your post contains some misconceptions.
I guess I should clear up the confusion and explain things properly.
Floating point on ARM has historically been a mess, with a number of incompatible floating point units out there. However, things have stabilised, and nowadays most "applications processors" use some version of a floating point unit known as VFP*; specifically, the Raspberry Pi uses VFPv2. Lower-end ARM parts, however, still often have either no FPU or a vendor-specific FPU.
For any given FPU type selection, gcc offers three ways of handling floating point. These are controlled by the -mfloat-abi option.
-mfloat-abi=soft: The code uses integer instructions and/or calls to library routines (depending on the complexity of the operation) to perform floating point maths. No FPU is needed, but floating point is slow. The library routines in question are in libgcc, which is a static library (so afaict you can't just replace it at runtime). Floating point parameters to functions are passed in integer registers (or on the stack when the integer registers run out).
-mfloat-abi=softfp: The code uses floating point instructions, so an FPU is needed, but the parameters are still passed in integer registers (or on the stack when the integer registers run out). This means the code is compatible with code built with -mfloat-abi=soft, and it is much faster than doing the floating point in software, but it still incurs an overhead moving values between the CPU and the FPU.
-mfloat-abi=hard: The code uses floating point instructions and passes floating point values in floating point registers. This avoids the overhead of moving data around between integer and floating point registers, but it also renders the code incompatible with code built with the other -mfloat-abi settings (if the parameters to a function call aren't where the function expects them, things break horribly).
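If you want to check which of the three settings (soft, softfp or hard) a given file was actually built with, here is a rough sketch that relies on the predefined macros gcc sets on 32-bit ARM. The function names float_abi_name and print_float_abi are made up for this example, and I'm assuming gcc's usual behaviour of defining __ARM_PCS_VFP for the hard ABI and __SOFTFP__ for pure software floating point; other compilers may differ.

```c
#include <stdio.h>

/* Hypothetical helper: returns a label for the float ABI this file was
 * compiled with, based on macros gcc predefines on 32-bit ARM.
 * Assumption: __ARM_PCS_VFP means -mfloat-abi=hard and __SOFTFP__ means
 * -mfloat-abi=soft; if neither is set on ARM we assume softfp. */
const char *float_abi_name(void)
{
#if !defined(__arm__)
    return "not 32-bit ARM";
#elif defined(__ARM_PCS_VFP)
    return "hard";    /* float arguments travel in VFP registers */
#elif defined(__SOFTFP__)
    return "soft";    /* no FPU instructions are emitted at all */
#else
    return "softfp";  /* FPU instructions, integer-register calling convention */
#endif
}

/* Hypothetical helper: prints the label, e.g. when called from main(). */
void print_float_abi(void)
{
    printf("float ABI: %s\n", float_abi_name());
}
```

Building the same file with, say, -mfpu=vfp -mfloat-abi=softfp and then with -mfpu=vfp -mfloat-abi=hard and calling print_float_abi() from main should show the difference, provided the toolchain has libraries for the ABI you ask for.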
Once, long ago, I knew something about ARM on my RiscPC. If memory serves, you could put in FPU instructions, which without an FPU would call the instruction exception vector and thence go into the Floating-Point Emulator. (The StrongARM never had an FPU.)
So if the relevant libraries include an FPE, calling it only involves a couple of instruction cycles even if our target hardware doesn't have an FPU.
That was what the old Debian arm port (not armel or armhf) did. Unfortunately, there were two problems:
1: The FPU the instructions were for was an old one (known as FPA), which pretty much no chips had anymore.
2: It turns out that trapping into the kernel on an illegal instruction, doing the floating point in the kernel, and then returning the results to userspace is a LOT slower than just doing the floating point calculations in software in userspace.
The result was that floating point performance on most ARM hardware at the time was horrifically bad, and with the mess of floating point units around at the time, going for software floating point seemed like the best option for a new ARM port.
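To give a feel for what "floating point in software in userspace" means, here is a tiny hand-written illustration of a float operation done purely with integer instructions. This is not the code libgcc contains (the real routines handle addition, multiplication and so on, and are far longer); soft_negate is a name I made up, and the sketch assumes float is a 32-bit IEEE 754 value, as it is on ARM.

```c
#include <stdint.h>
#include <string.h>

/* Illustration only: negate a float using nothing but integer operations.
 * Assumes float is 32-bit IEEE 754, where the top bit is the sign. */
float soft_negate(float x)
{
    uint32_t bits;

    memcpy(&bits, &x, sizeof bits);  /* reinterpret the float as an integer */
    bits ^= UINT32_C(0x80000000);    /* flip the sign bit */
    memcpy(&x, &bits, sizeof x);     /* reinterpret the integer as a float */
    return x;
}
```

The point is that this is an ordinary function call with no trap into the kernel, which is why it ends up much cheaper than emulating an FPU instruction via the illegal instruction handler.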
* Note that while VFP stands for "vector floating point", it actually has relatively little in the way of vector functionality. Decent vector support was added with the NEON extensions (which are not supported on the Pi).