Please explain softfloat vs softfp vs hardfp


by RichardUK » Thu Jun 07, 2012 9:15 pm
I am a little confused what is going on.

softfloat - All floating point done in software.
softfp - Floating point done in hardware, values passed on the stack / in integer registers. ABI compatible with softfloat.
hardfp - Floating point done in hardware, values passed in FPU registers. ABI incompatible with the other two.

Is this correct?
by Narishma » Fri Jun 08, 2012 9:34 am
Yes.
by jecxjo » Fri Jun 08, 2012 4:51 pm
As a very simplistic explanation:

Performing mathematical operations on floating point numbers (decimal numbers, not whole numbers) requires a little more overhead when working with binary values. Everyone knows that data is stored in computers as 1's and 0's, each position in the number being worth a power of 2 more than the previous one. So doing math with a whole number is quite simple.

Software Based Math: Do the math via pen and paper
When it comes to floating point numbers, the steps to perform even a simple two value addition become more complicated. This process was originally performed in software, requiring multiple instructions to get the needed result. When you compile with the softfloat option this is what you are doing.

Let's view this as doing some math via pen and paper. Works but kinda slow.
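
To make the "pen and paper" idea concrete, here is a minimal sketch in Python of the kind of work a soft-float routine has to do using nothing but integer operations. It is only an illustration under loud assumptions: the function names are made up for this post, it handles only positive, normal single-precision values, it truncates instead of rounding properly, and it ignores overflow. The real library routines (for example `__aeabi_fadd` on ARM EABI) also have to deal with signs, zeros, subnormals, infinities and NaN.

```python
import struct


def float_to_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x as an int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(bits):
    """Inverse of float_to_bits."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def soft_add_positive(a_bits, b_bits):
    """Add two positive, normal float32 bit patterns with integer ops only.

    Hypothetical helper for illustration: truncates, no overflow handling.
    """
    # For positive floats the bit patterns sort like the values,
    # so after this swap 'a' has the larger (or equal) exponent.
    if a_bits < b_bits:
        a_bits, b_bits = b_bits, a_bits

    exp_a = (a_bits >> 23) & 0xFF
    exp_b = (b_bits >> 23) & 0xFF

    # Restore the implicit leading 1 of each 24-bit significand.
    man_a = (a_bits & 0x7FFFFF) | 0x800000
    man_b = (b_bits & 0x7FFFFF) | 0x800000

    # Align the smaller operand; bits shifted out are simply dropped.
    man_b >>= exp_a - exp_b

    total = man_a + man_b
    exp = exp_a

    # A carry into bit 24 means the result needs renormalising.
    if total & 0x1000000:
        total >>= 1
        exp += 1

    return (exp << 23) | (total & 0x7FFFFF)


result = soft_add_positive(float_to_bits(1.5), float_to_bits(2.25))
print(bits_to_float(result))  # prints 3.75
```

Even this stripped-down version needs a compare, several shifts and masks, an add and a branch for one addition, which is why a single FPU instruction is so much quicker.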

Hardware Based Math: Do the math via your friend's calculator
To speed up floating point math, some smart engineers came up with a Floating Point Unit (FPU) which is a piece of hardware that can take in floating point values and an operator and return a value. This hardware is optimized to just do floating point math so it performs much better than doing the operations in software.

To get the values and operator into the FPU the compiler must add some code to copy this info from your program to the hardware. Typically this is done through a function call, which requires some overhead to start the call (copy values from your code into the FPU interface code) and to complete the call (clear up the memory used to do the copy). So in the softfp situation we use the FPU hardware but use the typical function calling method to move the data around. (Note: I'll explain some benefits at the end.)

In this situation your buddy has a calculator. You write down the problem and give it to him to run on his calculator and he writes down the answer and gives it to you.

Hardware Optimized Math: Do the math on your calculator
Next we want to speed things up even more by trying to remove all that overhead of copying data from our code to the FPU interface code. One way to do that is to do the task the FPU interface code does. So if we set the hardfp option when we do an arithmetic operation we now copy the values and the operation directly into the FPU hardware registers. Now we are super fast.

In this situation you have no paper, just a calculator. But think how fast that is since you can just type the values in yourself. No need to write it down, just type it in yourself.

So why so many options?
So why not always write your code with hardfp? Sometimes systems don't have FPUs...even in today's new computers. Remember before how with the softfp option we talked to the FPU but still have the code that copied data to the hardware? What if we could swap out where the copy destination was in cases where we don't have an FPU? Now we could say "Use the FPU if it exists, otherwise copy the data into our software based calculations (softfloat)." So when you see that softfp is compatible with softfloat it means that the system will decide if it can (and in some cases should) use a hardware FPU. If we compile with the hardfp option we have no choice but to use the FPU because the compiler optimizes our system to do no math, just read and write to FPU registers.

In our calculator example, if you sit down and have no paper and only a calculator the only option you have is to use the calculator. If you have a piece of paper you can either give it to your buddy or you can do it by hand. One gives you options and can be slow but practical. The other can be fast but only if the hardware exists.

So how's that for a long-winded but hopefully easy to understand explanation?
IRC (w/ SSL Support): jecxjo.mdns.org
Blog: http://jecxjo.motd.org/code
ProjectEuler Friend: 79556084281370_44d12dd95e92b1d9453aba2bdc94101b
by Burngate » Sat Jun 09, 2012 11:52 am
Makes sense!
Now how about (what I believe the Pi has) vector floating point hardware? Is it the same or different?
And since all Pis have one, why bother with softfloat at all?
Hardware ace - level: Cowboy
by jecxjo » Sat Jun 09, 2012 12:30 pm
Vector Processors allow you to access data in an array format (i.e. pointer to the head of an array and an index into the array) vs a Scalar Processor that just uses direct addressing. Not really a big difference, nothing really to worry about. VFP, FPU, etc all the same thing generally.

Since you know the architecture for all your client systems there's no reason not to just use hardfp, but in case you wanted the option to compile for a different system, softfp might be a better choice. In my description above I generalized a little on what all takes place when compiling with this option. If you know your system has no FPU it's slightly more efficient to select softfloat because the checking of the existence of an FPU takes some overhead, etc. Why ask "Do I have an FPU and is it available?" if you already know the answer?
by Burngate » Sat Jun 09, 2012 4:37 pm
So now I'm going to prove that I don't know enough to be allowed into the Power-Users forum.
Once long ago I knew something about ARM on my RiscPC. If memory serves, you could put in FPU instructions, which without an FPU would call the instruction exception vector and thence into the Floating-Point Emulator. (The StrongARM never had an FPU)
So if the relevant libraries include an FPE, calling it only involves a couple of instruction cycles even if our target hardware doesn't have an FPU
Or should I be ejected from class in shame?
by obarthelemy » Sat Jun 09, 2012 11:03 pm
I'd venture a guess: *calling* an FP emulator may only be a couple of cycles. *Actually doing* soft FP takes a whole bunch of cycles more than doing hard FP ?

Plus I think the soft/hard variants are not only about actual maths, but also about how parameters are passed when calling functions, with the FP hardware providing a bunch of handy, fast registers which the no-FP CPUs don't have.
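
To illustrate the parameter passing point: with the soft and softfp conventions a float argument travels as its raw bit pattern in an ordinary integer register, and a double takes two of them. The Python sketch below just shows what those 32-bit words look like. The helper name is made up for this post, and the word order for doubles (low word first) is my understanding of the little-endian ARM procedure call standard, so treat it as an assumption rather than gospel.

```python
import struct


def core_register_words(value, double=False):
    """Return the 32-bit words an integer register (or register pair)
    would hold for a float or double argument on little-endian ARM.

    Hypothetical helper for illustration only.
    """
    if double:
        # A double occupies two consecutive registers; on a little-endian
        # target the first register gets the low-order word.
        low, high = struct.unpack("<II", struct.pack("<d", value))
        return [low, high]
    # A single-precision float fits in one 32-bit register.
    return [struct.unpack("<I", struct.pack("<f", value))[0]]


print([hex(w) for w in core_register_words(1.5)])               # ['0x3fc00000']
print([hex(w) for w in core_register_words(1.5, double=True)])  # ['0x0', '0x3ff80000']
```

With softfp the callee then has to move those words into FPU registers before it can do any arithmetic on them, and move the result back afterwards. With hardfp the value would already be sitting in an FPU register.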
by AndrewS » Mon Jun 11, 2012 2:57 am
Burngate wrote:And since all Pis have one, why bother with softfloat at all?

Because the Debian stable version (squeeze) only supports ARM using the ARMv4+ softfloat ABI (armel), and this is the distro that the 'official' Debian image available from http://www.raspberrypi.org/downloads uses.
The Debian unstable version (wheezy) adds an ARMv7+ hardfloat ABI port (armhf), but this won't run on the ARMv6 CPU used by the RaspberryPi. So to "fill in the gap" the Raspbian project http://raspbian.com/RaspbianFAQ is currently in the process of recompiling/porting every Debian package in wheezy to use the ARMv6 hardfloat available on the RaspberryPi :ugeek:

At least, that's the way I understand it ;)
by jecxjo » Wed Jun 13, 2012 8:28 pm
AndrewS wrote:
Burngate wrote:And since all Pis have one, why bother with softfloat at all?

Because the Debian stable version (squeeze) only supports ARM using the ARMv4+ softfloat ABI (armel), and this is the distro that the 'official' Debian image available from http://www.raspberrypi.org/downloads uses.
The Debian unstable version (wheezy) adds an ARMv7+ hardfloat ABI port (armhf), but this won't run on the ARMv6 CPU used by the RaspberryPi. So to "fill in the gap" the Raspbian project http://raspbian.com/RaspbianFAQ is currently in the process of recompiling/porting every Debian package in wheezy to use the ARMv6 hardfloat available on the RaspberryPi :ugeek:

At least, that's the way I understand it ;)


This is actually one of the "good" reasons to have softfp and softfloat support. If hardware changes and no one has ported in the support you can at least run a slower version using software. Also if you are building a general purpose system you can sacrifice some speed for wider support.

Same goes for UI, you get added speed and flashiness if you build against the video card/opengl/etc but you are only able to run on those systems. Compiling with software based graphics is slower but everyone can support them.
by timr » Thu Jun 14, 2012 8:56 am
Just a quick note to say that I am using the raspbian development image from http://www.raspbian.org/HexxehImages

It's just what I needed.

I'm using chuck from http://chuck.cs.princeton.edu/ for music synthesis http://www.raspberrypi.org/phpBB3/viewtopic.php?f=29&t=7503. With the soft-float image, the system doesn't have enough CPU and the audio is not usable. With hard-float, the audio is ok, at least for the simple examples I've tried, and CPU load is acceptable, though still high at 30% - 60%
by plugwash » Wed Jan 02, 2013 4:20 pm
jecxjo wrote:As a very simplistic explanation:

Personally I think your analogies are confusing and I also believe your post contains some misconceptions.

I guess I should clear up the confusion and explain things properly.

Floating point on ARM has historically been a mess, with a number of incompatible floating point units out there. However, things have stabilised and nowadays most "applications processors" use some version of a floating point unit known as VFP*; specifically, the Raspberry Pi uses VFPv2. However, lower end ARM parts still often have either no FPU or a vendor specific FPU.

For any given FPU type selection gcc offers three ways of handling floating point. These are controlled by the -mfloat-abi option.

-mfloat-abi=soft
The code uses integer instructions and/or calls to library routines (depending on the complexity of the operation) to perform floating point maths. No FPU is needed but floating point is slow. The library routines in question are in libgcc which is a static library (so afaict you can't just replace it at runtime). Floating point parameters to functions are passed in integer registers (or on the stack when integer registers run out).

-mfloat-abi=softfp
The code uses floating point instructions, so an FPU is needed, but the parameters are still passed in integer registers (or on the stack when integer registers run out). This means the code is compatible with code built with -mfloat-abi=soft, and it is much faster than doing the floating point in software, but it still incurs an overhead moving values between the integer registers and the FPU.

-mfloat-abi=hard
The code uses floating point instructions and passes floating point values in floating point registers. This avoids the overhead of moving data around between integer and floating point registers but also renders the code incompatible with code built with other -mfloat-abi settings (if the parameters to a function call aren't where the function expects them, things break horribly).
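
Since mixing the two calling conventions breaks things, it can be handy to check which one a binary was built for. As a rough sketch in Python (the function name is made up, and this is not an official tool): 32-bit ARM ELF files carry float ABI bits in the `e_flags` field of the ELF header, `EF_ARM_ABI_FLOAT_SOFT` (0x200) and `EF_ARM_ABI_FLOAT_HARD` (0x400). Note that soft and softfp share a calling convention, so they cannot be told apart this way, and as far as I know older toolchains may leave both bits clear.

```python
import struct

EM_ARM = 40                      # e_machine value for 32-bit ARM
EF_ARM_ABI_FLOAT_SOFT = 0x200    # e_flags bit: soft-float calling convention
EF_ARM_ABI_FLOAT_HARD = 0x400    # e_flags bit: hard-float calling convention


def arm_float_abi(header):
    """Classify the float ABI from the first 40+ bytes of an ELF file.

    Hypothetical helper for illustration; only looks at the ELF header.
    """
    if len(header) < 40 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    if header[4] != 1:           # EI_CLASS: 1 means a 32-bit ELF
        return "not 32-bit ARM"
    endian = "<" if header[5] == 1 else ">"   # EI_DATA: 1 means little-endian

    # In a 32-bit ELF header, e_machine is at offset 18 and e_flags at 36.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    if machine != EM_ARM:
        return "not 32-bit ARM"
    (flags,) = struct.unpack_from(endian + "I", header, 36)

    if flags & EF_ARM_ABI_FLOAT_HARD:
        return "hard"
    if flags & EF_ARM_ABI_FLOAT_SOFT:
        return "soft or softfp"
    return "unmarked"
```

You would call it with something like `arm_float_abi(open(path, "rb").read(52))`, where `path` is whatever binary you want to inspect.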

Burngate wrote:Once long ago I knew something about ARM on my RiscPC. If memory serves, you could put in FPU instructions, which without an FPU would call the instruction exception vector and thence into the Floating-Point Emulator. (The StrongARM never had an FPU)
So if the relevant libraries include an FPE, calling it only involves a couple of instruction cycles even if our target hardware doesn't have an FPU

That was what the old Debian arm port (not armel or armhf) did. Unfortunately there were two problems:

1: the FPU the instructions were for was an old one (known as FPA) which pretty much no chips had anymore
2: it turns out that trapping into the kernel on an illegal instruction, doing the floating point in the kernel and then returning the results to userspace is a LOT slower than just doing the floating point calculations in software in userspace.

The result was that floating point performance on most ARM hardware at the time was horrifically bad, and with the mess of floating point units around at the time, going for software floating point seemed like the best option for a new ARM port.

* Note that while vfp stands for "vector floating point" it actually has relatively little in the way of vector functionality :(. Decent vector support was added with the NEON extensions (which are not supported on the Pi)
Forum Moderator