Page 1 of 1

Little doubt on floating point precision

Posted: Fri Dec 28, 2018 9:54 pm
by Killertechno
Hi to all, I'm porting Python code to C code and I have alittle doubt.

On my Python code I have:

Code: Select all

a = 18781
b = 0.997649
c = a*b
print("%f" % (c))
result is 18736.851729 while in C code:

Code: Select all

        int a = 18781;
        float b = 0.997649;
        double c = 0.997649;


        printf("%f\r\n", a*b);
        printf("%f\r\n", (float)(a)*b);
        printf("%f\r\n", (double)(a)*b);
result is 18736.846132 regardless operation.

OpenOffice (and calc) says: 18781 * 0.997649 = 18736.845869

Python result is so near to calc instead C result. Is this normal or do I need to set some particular flag during C compiling (gcc -o main main.c)?
(I'm doing digital filtering, so at the end these small errors would add to each other....)
Thanks.

Re: Little doubt on floating point precision

Posted: Fri Dec 28, 2018 10:53 pm
by jahboater
Killertechno wrote:
Fri Dec 28, 2018 9:54 pm
OpenOffice (and calc) says: 18781 * 0.997649 = 18736.845869
Yes that's right.
It is also exactly what I get from Python and C.

Code: Select all

$ python3
Python 3.5.2 (default, Nov 12 2018, 13:43:14) 
[GCC 5.4.0 20160609] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 18781 * 0.997649
18736.845869
>>> 
and in C:

Code: Select all

cat test.c
#include <stdio.h>
int main( void )
{
  int a = 18781;
  double b = 0.997649;
  printf( "%f\n", a * b );
}
$ gcc test.c -o test
$ ./test
18736.845869
So OpenOffice, calc, Python, Python3, and C (using "double") all give consistent identical results.
I don't know why your Python differs?

Re: Little doubt on floating point precision

Posted: Fri Dec 28, 2018 11:14 pm
by Killertechno
Here another example code:

Code: Select all

#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <dirent.h>
#include <libgen.h>
#include <math.h>
#include <netinet/in.h>
#include <sys/stat.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <sys/types.h>
#include <syslog.h>
#include <time.h>
#include <unistd.h>




#define hpf_filterorder 	1		// ordine del filtro passa-alto per la rimozione dell'offset
#define max_components  	1


#define filter_order 		hpf_filterorder	

struct FilterBuffer
{
	float input[filter_order+1];
	float output[filter_order+1];
};


struct FilterBuffer FilterValues[max_components];			// contiene i valori
struct FilterBuffer FilterCoeff;				// contiene i coefficienti in base al tipo di filtro



float f_HPF_0R075_100_b[filter_order+1] = {0.99764934, -1.0};
float f_HPF_0R075_100_a[filter_order+1] = {1.0, 0.99529868};


void Filter (void);




unsigned char maxFilterOrder;

void Filter (void)
{
	
	unsigned char ch;
	unsigned char n;
        float accu;

	for (ch=0; ch<max_components; ch++)
	{
		printf("\t\t\t\t\t\t[%f\t%f] (input buffer at beginning)\r\n",FilterValues[ch].input[0], FilterValues[ch].input[1] );
		printf("multiply by %f (value should be %f)\r\n", FilterCoeff.input[0], FilterValues[ch].input[0] * FilterCoeff.input[0]);
		FilterValues[ch].input[0] = FilterValues[ch].input[0] * FilterCoeff.input[0];	
		printf("\t\t\t\t\t\t[%f\t%f] (b)\r\n",FilterValues[ch].input[0], FilterValues[ch].input[1] );   // here  value of FilterValues[ch].input[0]
		
		accu = 4.0;
		printf("\t\t\t\t\t\t[%f\t%f] (p)\r\n",FilterValues[ch].input[0], FilterValues[ch].input[1] );   // here FilterValues[ch].input[0] has changed
		accu = 0.0;
		printf("\t\t\t\t\t\t[%f\t%f] (m)\r\n",FilterValues[ch].input[0], FilterValues[ch].input[1] );
		for( n=filter_order; n >0; n--)
		{
			printf("\t\t\t\t\t\t[%f\t%f] (r)\r\n",FilterValues[ch].input[0], FilterValues[ch].input[1] );
			printf("%d\r\n", n);
			printf("\t\t\t\t\t\t[%f\t%f] (c)\r\n",FilterValues[ch].input[0], FilterValues[ch].input[1] );
			accu += FilterValues[ch].input[n]*FilterCoeff.input[n];
			printf("\t\t\t\t\t\t[%f\t%f] (d)\r\n",FilterValues[ch].input[0], FilterValues[ch].input[1] );
			printf("\taccu = %f\r\n", accu);
			printf("\t\t\t\t\t\t[%f\t%f] (e)\r\n",FilterValues[ch].input[0], FilterValues[ch].input[1] );
			accu += FilterValues[ch].output[n]*FilterCoeff.output[n];
			printf("\t\t\t\t\t\t[%f\t%f] (f)\r\n",FilterValues[ch].input[0], FilterValues[ch].input[1] );
			printf("\taccu = %f\r\n", accu);


			FilterValues[ch].input[n] = FilterValues[ch].input[n-1];
			FilterValues[ch].output[n] = FilterValues[ch].output[n-1];
			
		}
		printf("\t\tadding %f\r\n", FilterValues[ch].input[0]);
		
		accu += FilterValues[ch].input[0];
			printf("\taccu = %f\r\n", accu);
		
		FilterValues[ch].output[0] = accu;
		FilterValues[ch].output[1] = accu;	// shifto già per il prossimo passaggio
		printf("%f\r\n", accu);
		
	}



	
	
}



int main(int argc, char *argv[]){
	unsigned char i;
	unsigned char ch;
	
	// filter init 
	for (i=0; i<(filter_order+1); i++)
	{
		printf("loading coeff for x(n-%d)\r\n", i);
		FilterCoeff.input[i] = f_HPF_0R075_100_b[i];
		FilterCoeff.output[i] = f_HPF_0R075_100_a[i];
	}
	
	printf("\r\n");
	for (ch=0; ch<max_components; ch++)
	{
		printf("zeroing values for component %d\r\n", ch);
		for (i=0; i<(filter_order+1); i++)
		{
			printf("%d ", i);
			FilterValues[ch].input[i] = 0.0;
			FilterValues[ch].output[i] = 0.0;
		}
	}	
	printf("\r\n\r\n\r\n");
	printf("start filtering...\r\n");
	printf("\r\n\r\n\r\n");	
	
	
	FilterValues[0].input[0] = 18781.0;
	Filter();
	
	FilterValues[0].input[0] = 18831.0;
	Filter();
	
	
	
	
	
	return 0;
}
gcc -o testfilter testfilter.c -O2 (I get slightly different results compiling without -O2 but I suppose it would be normal)


Then this is the output:

Code: Select all

~/myc/test_filter$ ./testfilter 
loading coeff for x(n-0)
loading coeff for x(n-1)

zeroing values for component 0
0 1 


start filtering...



						[18781.000000	0.000000] (input buffer at beginning)
multiply by 0.997649 (value should be 18736.851729)
						[18736.851729	0.000000] (b)
						[18736.851562	0.000000] (p)
						[18736.851562	0.000000] (m)
						[18736.851562	0.000000] (r)
1
						[18736.851562	0.000000] (c)
						[18736.851562	0.000000] (d)
	accu = 0.000000
						[18736.851562	0.000000] (e)
						[18736.851562	0.000000] (f)
	accu = 0.000000
		adding 18736.851562
	accu = 18736.851562
18736.851562
						[18831.000000	18736.851562] (input buffer at beginning)
multiply by 0.997649 (value should be 18786.734195)
						[18786.734195	18736.851562] (b)
						[18786.734375	18736.851562] (p)
						[18786.734375	18736.851562] (m)
						[18786.734375	18736.851562] (r)
1
						[18786.734375	18736.851562] (c)
						[18786.734375	18736.851562] (d)
	accu = -18736.851562
						[18786.734375	18736.851562] (e)
						[18786.734375	18736.851562] (f)
	accu = -88.087868
		adding 18786.734375
	accu = 18698.646507
18698.646484
I can't understand WHY value changes from (b) and (p) lines!!!!!!!!!

Re: Little doubt on floating point precision

Posted: Fri Dec 28, 2018 11:27 pm
by Killertechno
Sorry, taking a look to Python code "a" is not an integer as I supposed but iy says <type 'numpy.ndarray'>.

Re: Little doubt on floating point precision

Posted: Fri Dec 28, 2018 11:57 pm
by Killertechno
This not working:

Code: Select all

#include <stdio.h>


#define hpf_filterorder 	1		// ordine del filtro passa-alto per la rimozione dell'offset
#define max_components  	1


#define filter_order 		hpf_filterorder	

struct FilterBuffer
{
	float input[filter_order+1];
	float output[filter_order+1];
};


struct FilterBuffer FilterValues[max_components];			// contiene i valori
struct FilterBuffer FilterCoeff;				// contiene i coefficienti in base al tipo di filtro



float f_HPF_0R075_100_b[filter_order+1] = {0.99764934, -1.0};
float f_HPF_0R075_100_a[filter_order+1] = {1.0, 0.99529868};




void Filter2 (void);


void Filter2 (void)
{
	unsigned char cha;
	unsigned char n;
	
	cha = 0;
	printf("\t\t\t\t\t\t[%f\t%f] (input buffer at beginning)\r\n",FilterValues[cha].input[0], FilterValues[cha].input[1] );
	printf("multiply %f by %f (value should be %f)\r\n", FilterValues[cha].input[0], FilterCoeff.input[0], FilterValues[cha].input[0] * FilterCoeff.input[0]);
	
	FilterValues[cha].input[0] = FilterValues[cha].input[0] *  FilterCoeff.input[0];
	printf("\t\t\t\t\t\t[%f\t%f] (b)\r\n",FilterValues[cha].input[0], FilterValues[cha].input[1] );
	printf("\t\t\t\t\t\t[%f\t%f] (c)\r\n",FilterValues[cha].input[0], FilterValues[cha].input[1] );
}



int main(int argc, char *argv[]){
	unsigned char i;
	unsigned char ch;

	// filter init 
	for (i=0; i<(filter_order+1); i++)
	{
		printf("loading coeff for x(n-%d)\r\n", i);
		FilterCoeff.input[i] = f_HPF_0R075_100_b[i];
		FilterCoeff.output[i] = f_HPF_0R075_100_a[i];
	}
	
	printf("\r\n");
	for (ch=0; ch<max_components; ch++)
	{
		printf("zeroing values for component %d\r\n", ch);
		for (i=0; i<(filter_order+1); i++)
		{
			printf("%d ", i);
			FilterValues[ch].input[i] = 0.0;
			FilterValues[ch].output[i] = 0.0;
		}
	}	
	printf("\r\n\r\n\r\n");
	printf("start filtering...\r\n");
	printf("\r\n\r\n\r\n");	
	
	
	FilterValues[0].input[0] = 18781.0;
	Filter2();

	
	return 0;
}

Output is:

Code: Select all

start filtering...



						[18781.000000	0.000000] (input buffer at beginning)
multiply 18781.000000 by 0.997649 (value should be 18736.851729)
						[18736.851729	0.000000] (b)
						[18736.851562	0.000000] (c)    First value array has changed!!!!!!!!!!!

Now I change from float to double:

Code: Select all

struct FilterBuffer
{
	double input[filter_order+1];
	double output[filter_order+1];
};


struct FilterBuffer FilterValues[max_components];			// contiene i valori
struct FilterBuffer FilterCoeff;				// contiene i coefficienti in base al tipo di filtro



double f_HPF_0R075_100_b[filter_order+1] = {0.99764934, -1.0};
double f_HPF_0R075_100_a[filter_order+1] = {1.0, 0.99529868};

And result looks correct:

Code: Select all

start filtering...



						[18781.000000	0.000000] (input buffer at beginning)
multiply 18781.000000 by 0.997649 (value should be 18736.852255)
						[18736.852255	0.000000] (b)
						[18736.852255	0.000000] (c) First value array has not changed!!!!!

Am I missing something about programming or compiling?

Re: Little doubt on floating point precision

Posted: Sat Dec 29, 2018 3:50 am
by Paeryn
Floats only have 24 bits of significand precision, doubles have 53. Python should be using doubles for all floating point values.

0.997649 cannot be represented exactly by either a float or a double, but a double is a lot closer. Floats and doubles are in binary format, a lot of exact decimal fractions cannot be represented as exact binary fractions, even 0.1 has no finite binary representation.

These aren't exact values but it shows you how close the float and double representations of 0.997649 are.

Code: Select all

float  b = 0.99764901399612426758
double c = 0.99764900000000000001
Mix in the fact that you then multiply that by 18781 then if you do that entirely in floats your answer is going to be further out than if you convert the float to a double and do the multiplication in doubles or use doubles for the whole thing.

Re: Little doubt on floating point precision

Posted: Sat Dec 29, 2018 4:23 am
by jojopi
The Pi/Raspbian use IEEE754 binary floating point representation. The float type has 24 bits of precision, which is equivalent to around 7 significant digits in decimal. The double type has 53 bits, equivalent to 15 digits.

A number such as 18736.845869 will be accurate enough if you calculate and store it using double. Using float, the best you can hope for is that it rounds correctly in the seventh digit, giving 18736.85.

The "%f" format specifier for printf requests a fixed-point output, and it defaults to six digits after the decimal point. If you pass a float and there is more than one digit before the decimal point, then there simply is not the precision for there also to be six correct digits after. So you get something like 18736.85XXXX, where the last four digits are essentially garbage. (Actually the decimal representation of trailing zeros in the binary.)

Unfortunately there is no easy way to avoid printing more digits than you have using "%f". A format such as "%.7g" or "%.6e" may be a better choice if you do not know in advance how big the numbers are going to be.

I cannot reproduce your behaviour where printing the same variable twice gives a slightly different result. Since float arguments to printf have to be promoted to double anyway, maybe sometimes the compiler is passing an actual double instead of truncating to float and promoting back? I doubt that that would be standards-compliant, but it also would not be a problem if you were not trying to print more digits than you have.

Just use double. You are not likely to see much performance improvement in single precision, unless you are dealing with so much data that the size of the arrays in memory becomes significant.

Incidentally, in C the newline sequence is "\n", not "\r\n". A carriage return will be added for you on platforms that want it. Linux is not one of those.

Re: Little doubt on floating point precision

Posted: Sat Dec 29, 2018 8:06 am
by jahboater
Paeryn wrote:
Sat Dec 29, 2018 3:50 am
Floats only have 24 bits of significand precision, doubles have 53. Python should be using doubles for all floating point values.

0.997649 cannot be represented exactly by either a float or a double, but a double is a lot closer. Floats and doubles are in binary format, a lot of exact decimal fractions cannot be represented as exact binary fractions, even 0.1 has no finite binary representation.

These aren't exact values but it shows you how close the float and double representations of 0.997649 are.

Code: Select all

float  b = 0.99764901399612426758
double c = 0.99764900000000000001
Mix in the fact that you then multiply that by 18781 then if you do that entirely in floats your answer is going to be further out than if you convert the float to a double and do the multiplication in doubles or use doubles for the whole thing.
Yes,

Further confused by having two limits DBL_DIG (15) and DBL_DECIMAL_DIG (17) (for doubles) :(
which is only FLT_DIG (6) and FLT_DECIMAL_DIG (9) (for float).

In C99 hex, 0.997649 is 0x1.fecbd987c6328p-1

The multiplication "18781.0 * 0.997649" raises the FE_INEXACT exception ...

IEEE 754-2008 specifies decimal arithmetic "_Decimal64" etc if you want exact results (available in C).