Heater
Posts: 15790
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 11:43 am

PeterO,

You are certainly especially dispended. In fact you get my vote for presenting the most interesting response to this challenge.

Sadly I still can't get it to run with the ALGOL 60 to C translator I have here. It translates OK but the resulting executable segfaults. A problem for another day.

Meanwhile...

My C++ effort is down to 38 seconds. 55% faster that my first working version.

Code: Select all

    $ g++ -Wall -O3  -o fibo_karatsuba fibo_karatsuba.cpp
    $ time ./fibo_karatsuba | head -c 32; time ./fibo_karatsuba | tail  -c 32;
    10727395641800477229364813596225
    real    0m38.424s
    user    0m38.313s
    sys     0m0.109s
    4856539211500699706378405156269
    real    0m38.442s
    user    0m38.359s
    sys     0m0.063s
All I did was to put a std::unordered_map in there to memoize previous fibo results:

Code: Select all

    std::unordered_map<uint64_t, bint> memo;

    bint fibo(int n) {
        if (memo.find(n) != memo.end()) {
            return memo[n];    
        }
        int k = (n / 2);
        bint fk = fibo(k);
        bint fk1 = fibo(k + 1);
        if (isEven(n)) {
            bint result = fk * (fk1 * two - fk); 
            memo[n] = result;
            return result;
        }
        bint result = (fk * fk) + (fk1 * fk1);
        memo[n] = result;
        return result; 
    }
Still looks like the Python and JS versions. Sweet.
Memory in C++ is a leaky abstraction .

User avatar
DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 12:30 pm

PeterO wrote:
Fri Dec 14, 2018 9:43 am
jahboater wrote:
Fri Dec 14, 2018 9:29 am
The whole discussion about operating systems is moot - this is new code, no one in their right mind is going to write new code using an old deprecated operating system that may or may not be supported, especially when its trivial to use the current operating systems instead. That's bad engineering. Built-in obsolescence.
Fixed for you.....
PeterO
:) . Nah I will continue to use n*x and systems inspired from n*x (I am typing this on Linux, an n*x), so just because an old system was deprecated does not mean I will stop writing code for it. I also use Atari MiNT on TOS another n*x system.

Though I am using EABI for the system calls, not because of the above discussion so much as because it looks like the 2 system calls are the same on BSD in EABI.

PeterO wrote:
Fri Dec 14, 2018 10:13 am
Heater wrote:
Fri Dec 14, 2018 10:06 am
Actually...This thread was about BASIC. So:
But have you ever noticed how DavidS manages to turn every thread into a moan about how no one supports RiscOS ?
Interesting statement, especially as I have made many statements in this very thread about the positives of Linux.

I do not use RiscOS, as RiscOS runs on MIPS based systems, and I do not have a MIPS based system. Though I do run RISC OS, which runs on ARM based systems. :)

And I do not moan about people NOT supporting RISC OS, as RISC OS is very well supported, just not much mentioned on these forums for the last 4 years. I do often mention the features of RISC OS from my view on these forums so more people are aware it exists and is a modern OS.
Hey, weren't you the guy preparing ALGOL 60 to run on whatever an Elliott 803 has for an operating system? :)
No OS on that machine, and I have special dispensation for writing code in Algol-60 as most of it gets used to demonstrate the real hardware in a museum setting.

PeterO
Writing code for running on the bare HW of a machine with an architecture that is foreign to modern programmers. That seems very much like a good time.

And using an old version of ALGOL to boot, all the better. It seems that ALGOL influenced just about every high level programming language we use today.
Last edited by DavidS on Fri Dec 14, 2018 12:38 pm, edited 1 time in total.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

User avatar
DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 12:36 pm

@Heater:
Nice results on your optimization. I will not pretend to understand that, it is a feature of C++ that I am not familiar with, though I have not really played in C++ since the early 1990's.

Glad to see that you are making good progress on optimization, while maintaining correctness of implementation. That is always a positive thing to see.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

Heater
Posts: 15790
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 1:16 pm

DavidS,

Thanks.
I will not pretend to understand that, it is a feature of C++
It's not so mysterious. The C++ standard library's unordered_map is just a good old hash table. I'm sure you have made hash tables before.
https://en.wikipedia.org/wiki/Hash_table

What confuses things in the C++ standard library is calling it "unordered_map" and all that surrounding junk "std::" ... "<uint64_t, bint>"

I was going to write my own hash table but broke down and dropped that in. Clearly it's not readable by those not familiar with C++ so it has to go. A custom made hash table would be much more obvious.
... correctness of implementation....
Tricky. When tweaking things for optimization it's easy to put hard to spot bugs in there. After all, the code gets progressively less obvious to follow.

To that end I have a whole bunch tests that I can run all the time See test_karatsuba.cpp. Then it's great to have the address sanitizers that are built into GCC and clang to check for out or range memory accesses and other memory allocation errors. Oh and I have "asserts" in the code to check for some obvious bad input to functions.

So far with all that in place the thing stays in shape:)

Edit: Actually...I have no idea if the C++ std::unordered_map is a hash function under the hood. For the current purpose it is an associative array no matter what the actual algorithm is. Nor do I know if it's an efficient associative array in this application.

Nor how well it performs in this application.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 5119
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 5:18 pm

Heater wrote:
Fri Dec 14, 2018 11:43 am
PeterO,

You are certainly especially dispended. In fact you get my vote for presenting the most interesting response to this challenge.

Sadly I still can't get it to run with the ALGOL 60 to C translator I have here. It translates OK but the resulting executable segfaults. A problem for another day.

Meanwhile...

My C++ effort is down to 38 seconds. 55% faster that my first working version.

Code: Select all

    $ g++ -Wall -O3  -o fibo_karatsuba fibo_karatsuba.cpp
    $ time ./fibo_karatsuba | head -c 32; time ./fibo_karatsuba | tail  -c 32;
    10727395641800477229364813596225
    real    0m38.424s
    user    0m38.313s
    sys     0m0.109s
    4856539211500699706378405156269
    real    0m38.442s
    user    0m38.359s
    sys     0m0.063s
All I did was to put a std::unordered_map in there to memoize previous fibo results:

Code: Select all

    std::unordered_map<uint64_t, bint> memo;

    bint fibo(int n) {
        if (memo.find(n) != memo.end()) {
            return memo[n];    
        }
        int k = (n / 2);
        bint fk = fibo(k);
        bint fk1 = fibo(k + 1);
        if (isEven(n)) {
            bint result = fk * (fk1 * two - fk); 
            memo[n] = result;
            return result;
        }
        bint result = (fk * fk) + (fk1 * fk1);
        memo[n] = result;
        return result; 
    }
Still looks like the Python and JS versions. Sweet.
The code likes nice and a runtime of 38 seconds is small enough that the program could be run by students during class in a computer lab. But what if the lab is equipped with Raspberry Pi computers. Then what would the runtime be?

jahboater
Posts: 5632
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 5:34 pm

Here is a simplified version of the C code.
The only real optimization is the subbn2 function which was two passes with the carry bits stored in a large array.
I changed it to one pass with the carry stored in a local integer. Only make about 1/10 sec difference though :(
Hopefully its a bit easier to read.

Code: Select all

/*
 *  fibonacci.c -- Compute the nth Fibonacci Number
 *  Written December 1, 2018 by Eric Olson
 *
 *  This program demonstrates the expressiveness of C as measured
 *  by explicitly coding the Karatsuba multiplication algorithm for
 *  big-number arithmetic and then using the doubling formulas
 *
 *    F(2k) = F(k)[2F(k+1)-F(k)]
 *    F(2k+1) = F(k+1)^2+F(k)^2
 *    F(2k+2) = F(k+1)[2F(k)+F(k+1)]
 *
 *  to compute the nth Fibonacci number.
 *
 *  Version 2:  Minor changes to fix compiler warnings and zero the
 *  unused memory in the copybn routine.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <inttypes.h>

#define N 4784969

typedef uint64_t ull;

#define base 1000000000ULL

#define min(a,b) (a < b ? a : b)
#define max(a,b) (a > b ? a : b)

/*
 *  Bignum limbs needed for one million decimal digits.
 */
#define BNSIZE 111115

typedef struct
{
  ull *d;
  int n;
} bignum;

static ull *bp;

static ull *
fmalloc( const int len )
{
  ull * const r = bp;
  bp += len;
  return r;
}

static void
ffree( const bignum a )
{
  bp = a.d;
}

static bignum
trim_bn( bignum x )
{
  if( x.n == 0 )
    return x;
  int i;
  for( i = x.n - 1; i >= 0; --i )
    if( x.d[i] != 0 )
      break;
  x.n = i + 1;
  return x;
}

static bignum
new_bn( const int digits )
{
  bignum x;
  x.d = fmalloc( digits );
  memset( x.d, 0, (size_t)digits * sizeof(ull) );
  x.n = 0;
  return x;
}

static bignum
set_bn( const ull value )
{
  bignum x = new_bn(BNSIZE);
  x.d[0] = value;
  x.n = 1;
  return x;
}

static void
print_bn( const bignum x )
{
  printf( "%" PRIu64, x.d[x.n - 1] );
  for( int i = x.n - 2; i >= 0; --i )
    printf( "%09" PRIu64, x.d[i] );
  printf("\n");
}

static void
copy_bn( bignum * const a, bignum b )
{
  b = trim_bn(b);
  if( a->n > b.n )
    memset( a->d + b.n, 0, (size_t)(a->n - b.n) * sizeof(ull) );
  memcpy( a->d, b.d, (size_t)(a->n = b.n) * sizeof(ull) );
}

static bignum
add_bn( const bignum a, const bignum b )
{
  const int n = max( a.n, b.n );
  bignum x = new_bn( n + 1 );
  x.n = n + 1;
  ull c = 0;
  for( int i = 0; i < n; ++i )
  {
    x.d[i] = c;
    if( i < a.n )
      c += a.d[i];
    if( i < b.n )
      c += b.d[i];
    if( c >= base )
    {
      x.d[i] = c - base;
      c = 1;
    }
    else
    {
      x.d[i] = c;
      c = 0;
    }
  }
  x.d[n] = c;
  return x;
}

static void
sub_bn( bignum a, const bignum b )
{
  if( b.n == 0 )
    return;
  uint32_t c = 0;
  for( int i = 0; i < b.n; ++i )
  {
    const ull sum = b.d[i] + c;
    c = 0;
    if( a.d[i] < sum )
    {
      a.d[i] += base;
      c = 1;
    }
    a.d[i] -= sum;
  }
  a.d[b.n] -= c;
}

static bignum
mul_bn( const bignum a, const bignum b )
{
  if( a.n == 0 || b.n == 0 )
    return new_bn(1);
  bignum x = new_bn( a.n + b.n + 1 );
  x.n = a.n + b.n - 1;
  for( int i = 0; i < a.n; ++i )
  {
    for( int j = 0; j < b.n; ++j )
      x.d[i+j] += a.d[i] * b.d[j];
    if( (a.n - i) % 50 == 1 )
      for( int k = 0; k <= x.n; ++k )
        if( x.d[k] >= base )
        {
          const ull c = x.d[k] / base;
          x.d[k] %= base;
          x.d[k+1] += c;
        }
  }
  if( x.d[x.n] )
    ++x.n;
  return x;
}

static bignum
mul_karatsuba( const bignum a, const bignum b )
{
  if( a.n == 0 || b.n == 0 )
    return new_bn(1);

  if( min( a.n, b.n ) < 49 )
    return mul_bn( a, b );

  int i, k;
  bignum a0, a1, b0, b1;
  bignum z2, z1a, z1b, z1, z0;
  bignum x = new_bn( a.n + b.n + 1 );

  ull c;
  const int n = max( a.n, b.n ) / 2;
  a0.d = a.d;
  a0.n = min( n, a.n );
  b0.d = b.d;
  b0.n = min( n, b.n );
  a1.d = a.d + n;
  a1.n = max( 0, a.n - n );
  b1.d = b.d + n;
  b1.n = max( 0, b.n - n );
  z0 = mul_karatsuba( a0, b0 );
  z2 = mul_karatsuba( a1, b1 );
  z1a = add_bn( a1, a0 );
  z1b = add_bn( b1, b0 );
  z1 = mul_karatsuba( z1a, z1b );
  sub_bn( z1, z0 );
  sub_bn( z1, z2 );
  z1 = trim_bn(z1);
  memcpy( x.d, z0.d, (size_t)(x.n = z0.n) * sizeof(ull) );
  if( z1.n != 0 )
  {
    for( i = 0, c = 0; i < z1.n; ++i )
    {
      x.d[k = n + i] += z1.d[i] + c;
      c = 0;
      if( x.d[k] >= base )
      {
        x.d[k] -= base;
        c = 1;
      }
    }
    for( k = n + i; c != 0; ++k )
    {
      x.d[k] += c;
      c = 0;
      if( x.d[k] >= base )
      {
        x.d[k] -= base;
        c = 1;
      }
    }
    if( x.n < k )
      x.n = k;
  }
  if( z2.n != 0 )
  {
    const int n2 = n * 2;
    for( i = 0, c = 0; i < z2.n; ++i )
    {
      x.d[k = n2 + i] += z2.d[i] + c;
      c = 0;
      if( x.d[k] >= base )
      {
        x.d[k] -= base;
        c = 1;
      }
    }
    for( k = n2 + i; c != 0; ++k )
    {
      x.d[k] += c;
      c = 0;
      if( x.d[k] >= base )
      {
        x.d[k] -= base;
        c = 1;
      }
    }
    if( x.n < k )
      x.n = k;
  }
  ffree(z0);
  return x;
}

static int
fibo( const int n, bignum * const a, bignum * const b )
{
  if( n == 0 )
  {
    *a = set_bn(0);
    *b = set_bn(1);
    return 0;
  }
  const int m = fibo( n / 2, a, b ) * 2;
  bignum ta = *a, tb = *b;
  bignum taa = mul_karatsuba( ta, ta );
  bignum tbb = mul_karatsuba( tb, tb );
  bignum taapbb = add_bn( taa, tbb );
  if( n & 1 )
  {
      // [a,b] = [a*a+b*b,b*(2*a+b)]
    bignum t2a = add_bn( ta, ta );
    bignum t2apb = add_bn( t2a, tb );
    bignum tbL2apbR = mul_karatsuba( tb, t2apb );
    copy_bn( a, taapbb );
    copy_bn( b, tbL2apbR );
    ffree(t2a);
  }
  else
  {
      // [a,b] = [a*(b*2-a),a*a+b*b]
    bignum t2bma = add_bn( tb, tb );
    sub_bn( t2bma, ta );
    bignum taL2bmaR = mul_karatsuba( ta, t2bma );
    copy_bn( a, taL2bmaR );
    copy_bn( b, taapbb );
    ffree(t2bma);
  }
  ffree(taa);
  return m;
}

int
main( void )
{
  bignum a, b;
  if( (bp = malloc( BNSIZE * 11 * sizeof(ull) )) == NULL )
  {
    fprintf(stderr,"Out of memory!\n");
    return EXIT_FAILURE;
  }
  fibo( N - 1, &a, &b );
  print_bn(b);
}

Heater
Posts: 15790
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 8:36 pm

ejolson,
The code likes nice ...
Thanks.
...and a runtime of 38 seconds is small enough that the program could be run by students during class in a computer lab. But what if the lab is equipped with Raspberry Pi computers. Then what would the runtime be?
You prompted me to dig out a Pi 3 running the 64 bit Debian, pi64, and find out. Here we go:

Code: Select all

pi@pi64-sense-hat:~/fibo_4784969/c++$ g++ -Wall -O3 -o fibo_karatsuba fibo_karatsuba.cpp
pi@pi64-sense-hat:~/fibo_4784969/c++$ time ./fibo_karatsuba | head -c 32; time ./fibo_karatsuba | tail  -c 32;
10727395641800477229364813596225
real    6m49.579s
user    6m49.485s
sys     0m0.076s
4856539211500699706378405156269

real    6m42.844s
user    6m42.754s
sys     0m0.075s
A bit tedious for a class room. The c fibonacci does much better:

Code: Select all

pi@pi64-sense-hat:~/fibo_4784969/c$ gcc -O3 --fast-math -o fibonacci fibonacci.c
pi@pi64-sense-hat:~/fibo_4784969/c$ time ./fibonacci | head -c 32;
10727395641800477229364813596225
real    0m9.267s
user    0m9.244s
sys     0m0.029s
pi@pi64-sense-hat:~/fibo_4784969/c$ time ./fibonacci | head -c 32; time ./fibonacci | tail -c 32
10727395641800477229364813596225
real    0m9.248s
user    0m9.216s
sys     0m0.035s
4856539211500699706378405156269

real    0m9.146s
user    0m9.126s
sys     0m0.023s
Now, I'm no teacher but I imagine that if the topic was the mathematical niceties of the fast fibo algorithm or karatsuba I would want to have the students try it out in a language that quick and easy to write. Short and sweet. Where the algorithm is clear to see in the code and does not require writing 100's of lines of "line noise".

Python for example:
$ time python fibo_4784969.py | head -c 32; time python fibo_4784969.py | tail -c 32
10727395641800477229364813596225Traceback (most recent call last):
File "fibo_4784969.py", line 25, in <module>
print(fibo(4784969))
IOError: [Errno 32] Broken pipe

real 1m34.544s
user 1m34.467s
sys 0m0.051s
156269
Run time = 5.16283297539

real 1m34.306s
user 1m34.235s
sys 0m0.043s
I have no idea why Python has that "Broken pipe" problem there.

Or perhaps Javascript:

Code: Select all

$ time node  fibo.js
42647ms
...
539211500699706378405156269

real    3m24.660s
user    3m24.552s
sys     0m0.095s
Anecdote: Ten years ago I was showing my 13 year old son what could be done in Python. We kept multiplying numbers til they were huge, then multiplying them til they were huger, and so on. He was well impressed and said "I have to tell my teacher about this".
Memory in C++ is a leaky abstraction .

ejolson
Posts: 5119
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 8:42 pm

jahboater wrote:
Fri Dec 14, 2018 5:34 pm
Here is a simplified version of the C code.
Thanks for the update. Feel free to add your name to the source along with a description of your modifications.

I'm surprised you removed the equation which determines how much memory will be needed as a function of n. Was it not working, or did you just remove it?

Maybe the max and min definitions should be renamed to something like jbmax and jbmin so they don't conflict with macros of the same names that might appear elsewhere.

I'm not sure the indenting was preserved correctly by the website. You might consider converting all tabs to spaces before posting the code to the forum.

The new subtraction routine looks like an improvement. It is possible that Pentium III CPUs ran the two-pass bnsub2 version faster, especially when compiled with older versions of gcc. The presence of the two suffix makes me think there might originally have been a different subtraction routine, perhaps closer to what you have written.

Heater
Posts: 15790
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 8:58 pm

Here is an interesting observation:

On the Pi the C fibonacci runs in 9.3 seconds. The C++ fibonacci runs in 409 seconds.
C is 43 times faster.

On my Win 10 PC using the Linux Subsystem for Windows the the C fibonacci runs in 1.7 seconds. The C++ fibonacci runs in 45 seconds.
C is only 25 times faster.

Curious.

Yes, my C++ got a bit slower recently. I undid an optimization that was a bit hacky and got in the way of what I want to do next.

I may never catch up with ejolson's fibonacci but I think we can get a lot closer.
Last edited by Heater on Fri Dec 14, 2018 9:13 pm, edited 1 time in total.
Memory in C++ is a leaky abstraction .

jahboater
Posts: 5632
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 9:02 pm

ejolson wrote:
Fri Dec 14, 2018 8:42 pm
I'm surprised you removed the equation which determines how much memory will be needed as a function of n. Was it not working, or did you just remove it?
The equation was fine. I should have said, but I tried to simplify things by calculating for the one single value of N. So I just left the simple constant with a comment. I can put it back if you feel coding for the general case (arbitrary N) is important.
ejolson wrote:
Fri Dec 14, 2018 8:42 pm
Maybe the max and min definitions should be renamed to something like jbmax and jbmin so they don't conflict with macros of the same names that might appear elsewhere.
A good point, but I have used these names for years with never a problem. I think the code would not read as well if I changed the names. fmin and fmax are the only library names that come close.
ejolson wrote:
Fri Dec 14, 2018 8:42 pm
I'm not sure the indenting was preserved correctly by the website. You might consider converting all tabs to spaces before posting the code to the forum.
That's exactly what I did. As the original author, please let me know what indent you prefer and I'll adjust it.
ejolson wrote:
Fri Dec 14, 2018 8:42 pm
The new subtraction routine looks like an improvement. It is possible that Pentium III CPUs ran the two-pass bnsub2 version faster, especially when compiled with older versions of gcc. The presence of the two suffix makes me think there might originally have been a different subtraction routine, perhaps closer to what you have written.
Aha! I thought the "2" meant two passes. I noticed bnsub2 allocated 8 times as much memory as it actually used - which made me look into it. The other functions store the carry in a simple variable, so it is now more consistent. I was able to reduce the MFACT thing from 12 to 11 which saved some memory.

MFACT should revert to having a name, but I'm hard pressed to think of a decent comment description to go with it.

ejolson
Posts: 5119
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 10:18 pm

Heater wrote:
Fri Dec 14, 2018 8:36 pm
Now, I'm no teacher but I imagine that if the topic was the mathematical niceties of the fast fibo algorithm or karatsuba I would want to have the students try it out in a language that quick and easy to write. Short and sweet. Where the algorithm is clear to see in the code and does not require writing 100's of lines of "line noise".
The examples presented so far have exhibited an unfortunate tradeoff between code clarity and efficiency.

If the goal is to prepare students to work in high-performance computing, practice writing relatively short programs which solve simple problems efficiently in C, Fortran and assembler helps smooth out the learning curve before working on large projects. Exascale and big data refer, in part, to the simple fact that the problem size is now in the asymptotic regime of algorithms that were not previously considered practical. An example in terms of the Fibonacci computation we've been enjoying is how big does n have to be before the Strassen algorithm becomes useful? While such a question is theoretically interesting, it becomes practically important once execution speeds are fast enough that suitably large values of n are common.

Knowing the theory behind modern farming is interesting, but liberation from subsistence living only happens when the tools necessary to implement modern methods are available. If you'll pardon me for quoting what I said in an earlier post,
being able to implement personal algorithms that make efficient use of computer hardware also liberates an individual from the constraints of existing subroutine libraries and built-in functionality
Even though Karatsuba multiplication is asymptotically superior to the O(n^2) algorithm no matter what programming language is used, if for all problem sizes that finish in 3 minutes, the demonstration code can easily be beaten by the slow algorithm, then it does not make for a very good presentation. Moreover, since the slow algorithm looks simple even when coded in C or assembler, the recursive divide and conquer method has to beat a reasonably well optimized O(n^2) code to avoid becoming an illustration of divide and fail instead.
Last edited by ejolson on Fri Dec 14, 2018 10:50 pm, edited 1 time in total.

ejolson
Posts: 5119
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 10:32 pm

jahboater wrote:
Fri Dec 14, 2018 9:02 pm
I was able to reduce the MFACT thing from 12 to 11 which saved some memory.
Agreed, I think a simple counting of how deep the temporary memory allocations go comes up with the number 11. I further made it 12 just to be safe. Adjusting the allocated memory as a function of n allows computing larger Fibonacci numbers without having to change anything but n in the code. This is good in my opinion.

As for indentation style, I prefer the K&R style with 4-space tabs and open and close braces on the same lines as the control structures. I don't like formatting the block structure of C as if it were Pascal, Basic, Fortran or Algol because the braces look lonely all by themselves. Also, I typically use such a large font that adding additional white space just for the braces impairs legibility more than it helps.

Heater
Posts: 15790
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 11:17 pm

I have take a liking to the clang formatting tool. It will read your source and spit it out reformatted according to a number of famous project coding standards (or you could define your own if you like).

All my c/C++ code now goes through clang-format:

$ clang-format-4.0 -style="{BasedOnStyle: llvm, IndentWidth: 4}" myCfile.c

That is the LLVM project style but I like a 4 space indent.

Result looks like this: https://github.com/ZiCog/fibo_4784969/b ... %2B/bint.h
Memory in C++ is a leaky abstraction .

Heater
Posts: 15790
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Fri Dec 14, 2018 11:39 pm

ejolson,
The examples presented so far have exhibited an unfortunate trade off between code clarity and efficiency.
Indeed they have.

My observation is that this phenomena is not just unfortunate here but true in general. Getting things done quickly is likely to involve using techniques that are not obvious to those that don't know the theory behind it.

A very simple example might be the simple problem of summing a sequence of integers. The naive will start adding them up one by one, result = 1 + 2 + 3 + 4 ... N. They might write a loop to do that if they are learning programming. Someone with a little maths knowledge will take a short cut, result = N * (first + last) / 2. BOOM done.

A rather more complex example (there is a joke there, wait for it) is that of the Fourier Transform. Anyone who has learned about Fourier in school could probably write a discreet Fourier transform. It will of course be a slow as hell for large data sets. It took Cooley and Tukey to figure out a much faster way to do it in 1965 with the Fast Fourier Transform.

When I saw my first FFT code, in BASIC, I had no idea what it was doing, it seemed totally crazy what with that bit reversal thing going on and such. It was two or three decades before I found out how/why the FFT worked and could write one for myself.

All of this is before we talk about how to write any of these algorithms efficiently in any particular language. You can spend as much time as you like micro-optimizing the wrong algorithm with your hand crafted assembler or whatever, but you will never beat the smart algorithm that is orders of magnitude faster and written in Python!

And, as I think you are saying, the best algorithm for the job may well depend on the size of the data set. Or it may depend on the typical data you have in your application rather than some assumed random input.

When I was getting my head around the FFT I wrote a recursive solution first. I used that to develop my typical three nested loop iterative solution, which was much faster. Except I was amazed to find that when my data set exceed 4 million or so that recursive solution started to be the faster!

Then there is supercomputers and massively parallel systems. A strange case there is the problem of calculating the digits of PI. The current record number of digits calculated is 22,459,157,718,361. Calculated in 100 days or so.

You might think the worlds top supercomputers could tackle this during their lunch break. Turns out they can't. It was done on a souped up PC (4 x Xeon E7-8890 v3 @ 2.50 GHz (72 cores, 144 threads) 1.25 TB DDR4 RAM + 20 x 6 TB disk)

This optimization thing is complicated.

Meanwhile, I'll get back to my C++ thing....
Last edited by Heater on Sat Dec 15, 2018 8:56 am, edited 1 time in total.
Memory in C++ is a leaky abstraction .

jahboater
Posts: 5632
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: Why Avoid BASIC on RPi?

Sat Dec 15, 2018 1:06 am

ejolson wrote:
Fri Dec 14, 2018 10:32 pm
Adjusting the allocated memory as a function of n allows computing larger Fibonacci numbers without having to change anything but n in the code.
Altered so you may just change the N manifest at the start (I agree this is better).

4 spaces indentation, moved open curlies back etc.

Code: Select all

/*
 *  fibonacci.c -- Compute the nth Fibonacci Number
 *  Written December 1, 2018 by Eric Olson
 *
 *  This program demonstrates the expressiveness of C as measured
 *  by explicitly coding the Karatsuba multiplication algorithm for
 *  big-number arithmetic and then using the doubling formulas
 *
 *      F(2k) = F(k)[2F(k+1)-F(k)]
 *      F(2k+1) = F(k+1)^2+F(k)^2
 *      F(2k+2) = F(k+1)[2F(k)+F(k+1)]
 *
 *  to compute the nth Fibonacci number.
 *
 *  Version 2:  Minor changes to fix compiler warnings and zero the
 *  unused memory in the copybn routine.
 *
 *  Version 3:  Jeremy Hall - December 15, 2018
 *    Simplified the 0 and 1 case at the start of fibo() (removed atobn and atolln)
 *    Changed subbn2 to single pass without allocating an array for carries
 *    Replaced some of the simple logic with max() and min() expressions.
 *    Simplified the print function.
 *    Declared immutable items as const.
 *    Removed multiple call sequences of delbn() (earler calls were ignored).
 *    Used exact width integer types.
 *    More minor changes to fix compiler warnings.
 *    Removed extra jmp from carry settings in Karatsuba multiply.
 *    Changed some function names for slightly better readability.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <inttypes.h>
#include <math.h>

#define N 4784969

typedef uint64_t qword;

#define base 1000000000ULL

#define ALLOC_DEPTH 11

#define PHI (1+sqrt(5.0))/2

const int bn_size = N * (int)(log10(PHI)/9+4);

#define min(a,b) (a < b ? a : b)
#define max(a,b) (a > b ? a : b)

typedef struct {
    qword *d;
    int n;
} bignum;

static qword *bp;

static qword *
fmalloc( const int len ) {
    qword * const r = bp;
    bp += len;
    return r;
}

static void
ffree( const bignum a ) {
    bp = a.d;
}

static bignum
trim_bn( bignum x ) {
    if( x.n == 0 )
        return x;
    int i;
    for( i = x.n - 1; i >= 0; --i )
        if( x.d[i] != 0 )
            break;
    x.n = i + 1;
    return x;
}

static bignum
new_bn( const int digits ) {
    bignum x;
    x.d = fmalloc( digits );
    memset( x.d, 0, (size_t)digits * sizeof(qword) );
    x.n = 0;
    return x;
}

static bignum
set_bn( const qword value ) {
    bignum x = new_bn(bn_size);
    x.d[0] = value;
    x.n = 1;
    return x;
}

static void
print_bn( const bignum x ) {
    printf( "%" PRIu64, x.d[x.n - 1] );
    for( int i = x.n - 2; i >= 0; --i )
        printf( "%09" PRIu64, x.d[i] );
    printf("\n");
}

static void
copy_bn( bignum * const a, bignum b ) {
    b = trim_bn(b);
    if( a->n > b.n )
        memset( a->d + b.n, 0, (size_t)(a->n - b.n) * sizeof(qword) );
    memcpy( a->d, b.d, (size_t)(a->n = b.n) * sizeof(qword) );
}

static bignum
add_bn( const bignum a, const bignum b ) {
    const int n = max( a.n, b.n );
    bignum x = new_bn( n + 1 );
    x.n = n + 1;
    qword c = 0;
    for( int i = 0; i < n; ++i ) {
        x.d[i] = c;
        if( i < a.n )
            c += a.d[i];
        if( i < b.n )
            c += b.d[i];
        if( c >= base ) {
            x.d[i] = c - base;
            c = 1;
        } else {
            x.d[i] = c;
            c = 0;
        }
    }
    x.d[n] = c;
    return x;
}

static void
sub_bn( bignum a, const bignum b ) {
    if( b.n == 0 )
        return;
    uint32_t c = 0;
    for( int i = 0; i < b.n; ++i ) {
        const qword sum = b.d[i] + c;
        c = 0;
        if( a.d[i] < sum ) {
            a.d[i] += base;
            c = 1;
        }
        a.d[i] -= sum;
    }
    a.d[b.n] -= c;
}

static bignum
mul_bn( const bignum a, const bignum b ) {
    if( a.n == 0 || b.n == 0 )
        return new_bn(1);
    bignum x = new_bn( a.n + b.n + 1 );
    x.n = a.n + b.n - 1;
    for( int i = 0; i < a.n; ++i ) {
        for( int j = 0; j < b.n; ++j )
            x.d[i+j] += a.d[i] * b.d[j];
        if( (a.n - i) % 50 == 1 )
            for( int k = 0; k <= x.n; ++k )
                if( x.d[k] >= base ) {
                    const qword c = x.d[k] / base;
                    x.d[k] %= base;
                    x.d[k+1] += c;
                }
    }
    if( x.d[x.n] )
        ++x.n;
    return x;
}

static bignum
mul_karatsuba( const bignum a, const bignum b ) {
    if( a.n == 0 || b.n == 0 )
        return new_bn(1);

    if( min( a.n, b.n ) < 49 )
        return mul_bn( a, b );

    int i, k;
    bignum a0, a1, b0, b1;
    bignum z2, z1a, z1b, z1, z0;
    bignum x = new_bn( a.n + b.n + 1 );

    qword c;
    const int n = max( a.n, b.n ) / 2;
    a0.d = a.d;
    a0.n = min( n, a.n );
    b0.d = b.d;
    b0.n = min( n, b.n );
    a1.d = a.d + n;
    a1.n = max( 0, a.n - n );
    b1.d = b.d + n;
    b1.n = max( 0, b.n - n );
    z0 = mul_karatsuba( a0, b0 );
    z2 = mul_karatsuba( a1, b1 );
    z1a = add_bn( a1, a0 );
    z1b = add_bn( b1, b0 );
    z1 = mul_karatsuba( z1a, z1b );
    sub_bn( z1, z0 );
    sub_bn( z1, z2 );
    z1 = trim_bn(z1);
    memcpy( x.d, z0.d, (size_t)(x.n = z0.n) * sizeof(qword) );
    if( z1.n != 0 ) {
        for( i = 0, c = 0; i < z1.n; ++i ) {
            x.d[k = n + i] += z1.d[i] + c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        for( k = n + i; c != 0; ++k ) {
            x.d[k] += c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        if( x.n < k )
            x.n = k;
    }
    if( z2.n != 0 ) {
        const int n2 = n * 2;
        for( i = 0, c = 0; i < z2.n; ++i ) {
            x.d[k = n2 + i] += z2.d[i] + c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        for( k = n2 + i; c != 0; ++k ) {
            x.d[k] += c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        if( x.n < k )
            x.n = k;
    }
    ffree(z0);
    return x;
}

static void
fibo( const int n, bignum * const a, bignum * const b ) {
    if( n == 0 ) {
        *a = set_bn(0);
        *b = set_bn(1);
        return;
    }
    fibo( n / 2, a, b );
    bignum ta = *a, tb = *b;
    bignum taa = mul_karatsuba( ta, ta );
    bignum tbb = mul_karatsuba( tb, tb );
    bignum taapbb = add_bn( taa, tbb );
    if( n & 1 ) {
            // [a,b] = [a*a+b*b,b*(2*a+b)]
        bignum t2a = add_bn( ta, ta );
        bignum t2apb = add_bn( t2a, tb );
        bignum tbL2apbR = mul_karatsuba( tb, t2apb );
        copy_bn( a, taapbb );
        copy_bn( b, tbL2apbR );
        ffree(t2a);
    } else {
            // [a,b] = [a*(b*2-a),a*a+b*b]
        bignum t2bma = add_bn( tb, tb );
        sub_bn( t2bma, ta );
        bignum taL2bmaR = mul_karatsuba( ta, t2bma );
        copy_bn( a, taL2bmaR );
        copy_bn( b, taapbb );
        ffree(t2bma);
    }
    ffree(taa);
}

int
main( void ) {
    if( (bp = malloc( (size_t)bn_size * ALLOC_DEPTH * sizeof(qword) )) == NULL ) {
        fprintf(stderr,"Out of memory!\n");
        return EXIT_FAILURE;
    }
    if( N < 2 ) {
        printf( "%d\n", N );
        return EXIT_SUCCESS;
    }
    bignum a, b;
    fibo( N - 1, &a, &b );
    print_bn(b);
    return EXIT_SUCCESS;
}
Last edited by jahboater on Sat Dec 15, 2018 2:44 am, edited 2 times in total.

ejolson
Posts: 5119
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sat Dec 15, 2018 1:34 am

jahboater wrote:
Sat Dec 15, 2018 1:06 am
ejolson wrote:
Fri Dec 14, 2018 10:32 pm
Adjusting the allocated memory as a function of n allows computing larger Fibonacci numbers without having to change anything but n in the code.
Altered so you may just change the N manifest at the start (I agree this is better).

4 spaces indentation, moved open curlies back etc.

Code: Select all

/*
 *  fibonacci.c -- Compute the nth Fibonacci Number
 *  Written December 1, 2018 by Eric Olson
 *
 *  This program demonstrates the expressiveness of C as measured
 *  by explicitly coding the Karatsuba multiplication algorithm for
 *  big-number arithmetic and then using the doubling formulas
 *
 *      F(2k) = F(k)[2F(k+1)-F(k)]
 *      F(2k+1) = F(k+1)^2+F(k)^2
 *      F(2k+2) = F(k+1)[2F(k)+F(k+1)]
 *
 *  to compute the nth Fibonacci number.
 *
 *  Version 2:  Minor changes to fix compiler warnings and zero the
 *  unused memory in the copybn routine.
 *
 *  Version 3:  Jeremy Hall - December 15, 2018
 *    Simplified the 0 and 1 case at the start of fibo() (removed atobn and atolln)
 *    Changed subbn2 to single pass without allocating an array for carries
 *    Replaced some of the simple logic with max() and min() expressions.
 *    Simplified the print function.
 *    Declared immutable items as const.
 *    Removed the "+n%2" from the "m = 2 * fibo ..." expression.
 *    Removed multiple call sequences of delbn() (earler calls were ignored).
 *    Used exact width integer types.
 *    More minor changes to fix compiler warnings.
 *    Removed extra jmp from carry settings in Karatsuba multiply.
 *    Changed some function names for slightly better readability.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <inttypes.h>
#include <math.h>

#define N 4784969

typedef uint64_t qword;

#define base 1000000000ULL

#define ALLOC_DEPTH 11

#define PHI (1+sqrt(5.0))/2

const int bn_size = N * (int)(log10(PHI)/9+4);

#define min(a,b) (a < b ? a : b)
#define max(a,b) (a > b ? a : b)

typedef struct {
    qword *d;
    int n;
} bignum;

static qword *bp;

static qword *
fmalloc( const int len ) {
    qword * const r = bp;
    bp += len;
    return r;
}

static void
ffree( const bignum a ) {
    bp = a.d;
}

static bignum
trim_bn( bignum x ) {
    if( x.n == 0 )
        return x;
    int i;
    for( i = x.n - 1; i >= 0; --i )
        if( x.d[i] != 0 )
            break;
    x.n = i + 1;
    return x;
}

static bignum
new_bn( const int digits ) {
    bignum x;
    x.d = fmalloc( digits );
    memset( x.d, 0, (size_t)digits * sizeof(qword) );
    x.n = 0;
    return x;
}

static bignum
set_bn( const qword value ) {
    bignum x = new_bn(bn_size);
    x.d[0] = value;
    x.n = 1;
    return x;
}

static void
print_bn( const bignum x ) {
    printf( "%" PRIu64, x.d[x.n - 1] );
    for( int i = x.n - 2; i >= 0; --i )
        printf( "%09" PRIu64, x.d[i] );
    printf("\n");
}

static void
copy_bn( bignum * const a, bignum b ) {
    b = trim_bn(b);
    if( a->n > b.n )
        memset( a->d + b.n, 0, (size_t)(a->n - b.n) * sizeof(qword) );
    memcpy( a->d, b.d, (size_t)(a->n = b.n) * sizeof(qword) );
}

static bignum
add_bn( const bignum a, const bignum b ) {
    const int n = max( a.n, b.n );
    bignum x = new_bn( n + 1 );
    x.n = n + 1;
    qword c = 0;
    for( int i = 0; i < n; ++i ) {
        x.d[i] = c;
        if( i < a.n )
            c += a.d[i];
        if( i < b.n )
            c += b.d[i];
        if( c >= base ) {
            x.d[i] = c - base;
            c = 1;
        } else {
            x.d[i] = c;
            c = 0;
        }
    }
    x.d[n] = c;
    return x;
}

static void
sub_bn( bignum a, const bignum b ) {
    if( b.n == 0 )
        return;
    uint32_t c = 0;
    for( int i = 0; i < b.n; ++i ) {
        const qword sum = b.d[i] + c;
        c = 0;
        if( a.d[i] < sum ) {
            a.d[i] += base;
            c = 1;
        }
        a.d[i] -= sum;
    }
    a.d[b.n] -= c;
}

static bignum
mul_bn( const bignum a, const bignum b ) {
    if( a.n == 0 || b.n == 0 )
        return new_bn(1);
    bignum x = new_bn( a.n + b.n + 1 );
    x.n = a.n + b.n - 1;
    for( int i = 0; i < a.n; ++i ) {
        for( int j = 0; j < b.n; ++j )
            x.d[i+j] += a.d[i] * b.d[j];
        if( (a.n - i) % 50 == 1 )
            for( int k = 0; k <= x.n; ++k )
                if( x.d[k] >= base ) {
                    const qword c = x.d[k] / base;
                    x.d[k] %= base;
                    x.d[k+1] += c;
                }
    }
    if( x.d[x.n] )
        ++x.n;
    return x;
}

static bignum
mul_karatsuba( const bignum a, const bignum b ) {
    if( a.n == 0 || b.n == 0 )
        return new_bn(1);

    if( min( a.n, b.n ) < 49 )
        return mul_bn( a, b );

    int i, k;
    bignum a0, a1, b0, b1;
    bignum z2, z1a, z1b, z1, z0;
    bignum x = new_bn( a.n + b.n + 1 );

    qword c;
    const int n = max( a.n, b.n ) / 2;
    a0.d = a.d;
    a0.n = min( n, a.n );
    b0.d = b.d;
    b0.n = min( n, b.n );
    a1.d = a.d + n;
    a1.n = max( 0, a.n - n );
    b1.d = b.d + n;
    b1.n = max( 0, b.n - n );
    z0 = mul_karatsuba( a0, b0 );
    z2 = mul_karatsuba( a1, b1 );
    z1a = add_bn( a1, a0 );
    z1b = add_bn( b1, b0 );
    z1 = mul_karatsuba( z1a, z1b );
    sub_bn( z1, z0 );
    sub_bn( z1, z2 );
    z1 = trim_bn(z1);
    memcpy( x.d, z0.d, (size_t)(x.n = z0.n) * sizeof(qword) );
    if( z1.n != 0 ) {
        for( i = 0, c = 0; i < z1.n; ++i ) {
            x.d[k = n + i] += z1.d[i] + c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        for( k = n + i; c != 0; ++k ) {
            x.d[k] += c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        if( x.n < k )
            x.n = k;
    }
    if( z2.n != 0 ) {
        const int n2 = n * 2;
        for( i = 0, c = 0; i < z2.n; ++i ) {
            x.d[k = n2 + i] += z2.d[i] + c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        for( k = n2 + i; c != 0; ++k ) {
            x.d[k] += c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        if( x.n < k )
            x.n = k;
    }
    ffree(z0);
    return x;
}

static int
fibo( const int n, bignum * const a, bignum * const b ) {
    if( n == 0 ) {
        *a = set_bn( 0 );
        *b = set_bn( 1 );
        return 0;
    }
    const int m = fibo( n / 2, a, b ) * 2;
    bignum ta = *a, tb = *b;
    bignum taa = mul_karatsuba( ta, ta );
    bignum tbb = mul_karatsuba( tb, tb );
    bignum taapbb = add_bn( taa, tbb );
    if( n & 1 ) {
            // [a,b] = [a*a+b*b,b*(2*a+b)]
        bignum t2a = add_bn( ta, ta );
        bignum t2apb = add_bn( t2a, tb );
        bignum tbL2apbR = mul_karatsuba( tb, t2apb );
        copy_bn( a, taapbb );
        copy_bn( b, tbL2apbR );
        ffree(t2a);
    } else {
            // [a,b] = [a*(b*2-a),a*a+b*b]
        bignum t2bma = add_bn( tb, tb );
        sub_bn( t2bma, ta );
        bignum taL2bmaR = mul_karatsuba( ta, t2bma );
        copy_bn( a, taL2bmaR );
        copy_bn( b, taapbb );
        ffree(t2bma);
    }
    ffree(taa);
    return m;
}

int
main( void ) {
    if( (bp = malloc( (size_t)bn_size * ALLOC_DEPTH * sizeof(qword) )) == NULL ) {
        fprintf(stderr,"Out of memory!\n");
        return EXIT_FAILURE;
    }
    if( N < 2 ) {
        printf( "%d\n", N );
        return EXIT_SUCCESS;
    }
    bignum a, b;
    fibo( N - 1, &a, &b );
    print_bn(b);
    return EXIT_SUCCESS;
}
Looks good. If you're going to remove the +n%2 from the return value of fibo you might as well remove m entirely and make the return void from fibo. It was originally debugging code to check that the doubling formulas were used in such a way that everything adds back up to n after the recursion.

jahboater
Posts: 5632
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: Why Avoid BASIC on RPi?

Sat Dec 15, 2018 2:43 am

Done.

Heater,
Perhaps you could add this C version 3 to the website?

Code: Select all

/*
 *  fibonacci.c -- Compute the nth Fibonacci Number
 *  Written December 1, 2018 by Eric Olson
 *
 *  This program demonstrates the expressiveness of C as measured
 *  by explicitly coding the Karatsuba multiplication algorithm for
 *  big-number arithmetic and then using the doubling formulas
 *
 *      F(2k) = F(k)[2F(k+1)-F(k)]
 *      F(2k+1) = F(k+1)^2+F(k)^2
 *      F(2k+2) = F(k+1)[2F(k)+F(k+1)]
 *
 *  to compute the nth Fibonacci number.
 *
 *  Version 2:  Minor changes to fix compiler warnings and zero the
 *  unused memory in the copybn routine.
 *
 *  Version 3:  Jeremy Hall - December 15, 2018
 *    Simplified the 0 and 1 case at the start of fibo() (removed atobn and atolln).
 *    Changed subbn2 to single pass without allocating an array for carries.
 *    Replaced some of the simple logic with max() and min() expressions.
 *    Simplified the print function.
 *    Declared immutable items as const.
 *    Removed multiple call sequences of delbn() (earler calls were ignored).
 *    Used exact width integer types.
 *    More minor changes to fix compiler warnings.
 *    Removed extra jmp from carry settings in Karatsuba multiply.
 *    Changed some function names for slightly better readability.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <inttypes.h>
#include <math.h>

#define N 4784969

typedef uint64_t qword;

#define base 1000000000ULL

#define ALLOC_DEPTH 11

#define PHI (1+sqrt(5.0))/2

const int bn_size = N * (int)(log10(PHI)/9+4);

#define min(a,b) (a < b ? a : b)
#define max(a,b) (a > b ? a : b)

typedef struct {
    qword *d;
    int n;
} bignum;

static qword *bp;

static qword *
fmalloc( const int len ) {
    qword * const r = bp;
    bp += len;
    return r;
}

static void
ffree( const bignum a ) {
    bp = a.d;
}

static bignum
trim_bn( bignum x ) {
    if( x.n == 0 )
        return x;
    int i;
    for( i = x.n - 1; i >= 0; --i )
        if( x.d[i] != 0 )
            break;
    x.n = i + 1;
    return x;
}

static bignum
new_bn( const int digits ) {
    bignum x;
    x.d = fmalloc( digits );
    memset( x.d, 0, (size_t)digits * sizeof(qword) );
    x.n = 0;
    return x;
}

static bignum
set_bn( const qword value ) {
    bignum x = new_bn(bn_size);
    x.d[0] = value;
    x.n = 1;
    return x;
}

static void
print_bn( const bignum x ) {
    printf( "%" PRIu64, x.d[x.n - 1] );
    for( int i = x.n - 2; i >= 0; --i )
        printf( "%09" PRIu64, x.d[i] );
    printf("\n");
}

static void
copy_bn( bignum * const a, bignum b ) {
    b = trim_bn(b);
    if( a->n > b.n )
        memset( a->d + b.n, 0, (size_t)(a->n - b.n) * sizeof(qword) );
    memcpy( a->d, b.d, (size_t)(a->n = b.n) * sizeof(qword) );
}

static bignum
add_bn( const bignum a, const bignum b ) {
    const int n = max( a.n, b.n );
    bignum x = new_bn( n + 1 );
    x.n = n + 1;
    qword c = 0;
    for( int i = 0; i < n; ++i ) {
        x.d[i] = c;
        if( i < a.n )
            c += a.d[i];
        if( i < b.n )
            c += b.d[i];
        if( c >= base ) {
            x.d[i] = c - base;
            c = 1;
        } else {
            x.d[i] = c;
            c = 0;
        }
    }
    x.d[n] = c;
    return x;
}

static void
sub_bn( bignum a, const bignum b ) {
    if( b.n == 0 )
        return;
    uint32_t c = 0;
    for( int i = 0; i < b.n; ++i ) {
        const qword sum = b.d[i] + c;
        c = 0;
        if( a.d[i] < sum ) {
            a.d[i] += base;
            c = 1;
        }
        a.d[i] -= sum;
    }
    a.d[b.n] -= c;
}

static bignum
mul_bn( const bignum a, const bignum b ) {
    if( a.n == 0 || b.n == 0 )
        return new_bn(1);
    bignum x = new_bn( a.n + b.n + 1 );
    x.n = a.n + b.n - 1;
    for( int i = 0; i < a.n; ++i ) {
        for( int j = 0; j < b.n; ++j )
            x.d[i+j] += a.d[i] * b.d[j];
        if( (a.n - i) % 50 == 1 )
            for( int k = 0; k <= x.n; ++k )
                if( x.d[k] >= base ) {
                    const qword c = x.d[k] / base;
                    x.d[k] %= base;
                    x.d[k+1] += c;
                }
    }
    if( x.d[x.n] )
        ++x.n;
    return x;
}

static bignum
mul_karatsuba( const bignum a, const bignum b ) {
    if( a.n == 0 || b.n == 0 )
        return new_bn(1);

    if( min( a.n, b.n ) < 49 )
        return mul_bn( a, b );

    int i, k;
    bignum a0, a1, b0, b1;
    bignum z2, z1a, z1b, z1, z0;
    bignum x = new_bn( a.n + b.n + 1 );

    qword c;
    const int n = max( a.n, b.n ) / 2;
    a0.d = a.d;
    a0.n = min( n, a.n );
    b0.d = b.d;
    b0.n = min( n, b.n );
    a1.d = a.d + n;
    a1.n = max( 0, a.n - n );
    b1.d = b.d + n;
    b1.n = max( 0, b.n - n );
    z0 = mul_karatsuba( a0, b0 );
    z2 = mul_karatsuba( a1, b1 );
    z1a = add_bn( a1, a0 );
    z1b = add_bn( b1, b0 );
    z1 = mul_karatsuba( z1a, z1b );
    sub_bn( z1, z0 );
    sub_bn( z1, z2 );
    z1 = trim_bn(z1);
    memcpy( x.d, z0.d, (size_t)(x.n = z0.n) * sizeof(qword) );
    if( z1.n != 0 ) {
        for( i = 0, c = 0; i < z1.n; ++i ) {
            x.d[k = n + i] += z1.d[i] + c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        for( k = n + i; c != 0; ++k ) {
            x.d[k] += c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        if( x.n < k )
            x.n = k;
    }
    if( z2.n != 0 ) {
        const int n2 = n * 2;
        for( i = 0, c = 0; i < z2.n; ++i ) {
            x.d[k = n2 + i] += z2.d[i] + c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        for( k = n2 + i; c != 0; ++k ) {
            x.d[k] += c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        if( x.n < k )
            x.n = k;
    }
    ffree(z0);
    return x;
}

static void
fibo( const int n, bignum * const a, bignum * const b ) {
    if( n == 0 ) {
        *a = set_bn(0);
        *b = set_bn(1);
        return;
    }
    fibo( n / 2, a, b );
    bignum ta = *a, tb = *b;
    bignum taa = mul_karatsuba( ta, ta );
    bignum tbb = mul_karatsuba( tb, tb );
    bignum taapbb = add_bn( taa, tbb );
    if( n & 1 ) {
            // [a,b] = [a*a+b*b,b*(2*a+b)]
        bignum t2a = add_bn( ta, ta );
        bignum t2apb = add_bn( t2a, tb );
        bignum tbL2apbR = mul_karatsuba( tb, t2apb );
        copy_bn( a, taapbb );
        copy_bn( b, tbL2apbR );
        ffree(t2a);
    } else {
            // [a,b] = [a*(b*2-a),a*a+b*b]
        bignum t2bma = add_bn( tb, tb );
        sub_bn( t2bma, ta );
        bignum taL2bmaR = mul_karatsuba( ta, t2bma );
        copy_bn( a, taL2bmaR );
        copy_bn( b, taapbb );
        ffree(t2bma);
    }
    ffree(taa);
}

int
main( void ) {
    if( (bp = malloc( (size_t)bn_size * ALLOC_DEPTH * sizeof(qword) )) == NULL ) {
        fprintf(stderr,"Out of memory!\n");
        return EXIT_FAILURE;
    }
    if( N < 2 ) {
        printf( "%d\n", N );
        return EXIT_SUCCESS;
    }
    bignum a, b;
    fibo( N - 1, &a, &b );
    print_bn(b);
    return EXIT_SUCCESS;
}

User avatar
DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Sat Dec 15, 2018 4:13 am

jahboater wrote:
Sat Dec 15, 2018 2:43 am
Done.

Heater,
Perhaps you could add this C version 3 to the website?

Code: Select all

/*
 *  fibonacci.c -- Compute the nth Fibonacci Number
 *  Written December 1, 2018 by Eric Olson
 *
 *  This program demonstrates the expressiveness of C as measured
 *  by explicitly coding the Karatsuba multiplication algorithm for
 *  big-number arithmetic and then using the doubling formulas
 *
 *      F(2k) = F(k)[2F(k+1)-F(k)]
 *      F(2k+1) = F(k+1)^2+F(k)^2
 *      F(2k+2) = F(k+1)[2F(k)+F(k+1)]
 *
 *  to compute the nth Fibonacci number.
 *
 *  Version 2:  Minor changes to fix compiler warnings and zero the
 *  unused memory in the copybn routine.
 *
 *  Version 3:  Jeremy Hall - December 15, 2018
 *    Simplified the 0 and 1 case at the start of fibo() (removed atobn and atolln).
 *    Changed subbn2 to single pass without allocating an array for carries.
 *    Replaced some of the simple logic with max() and min() expressions.
 *    Simplified the print function.
 *    Declared immutable items as const.
 *    Removed multiple call sequences of delbn() (earler calls were ignored).
 *    Used exact width integer types.
 *    More minor changes to fix compiler warnings.
 *    Removed extra jmp from carry settings in Karatsuba multiply.
 *    Changed some function names for slightly better readability.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <inttypes.h>
#include <math.h>

#define N 4784969

typedef uint64_t qword;

#define base 1000000000ULL

#define ALLOC_DEPTH 11

#define PHI (1+sqrt(5.0))/2

const int bn_size = N * (int)(log10(PHI)/9+4);

#define min(a,b) (a < b ? a : b)
#define max(a,b) (a > b ? a : b)

typedef struct {
    qword *d;
    int n;
} bignum;

static qword *bp;

static qword *
fmalloc( const int len ) {
    qword * const r = bp;
    bp += len;
    return r;
}

static void
ffree( const bignum a ) {
    bp = a.d;
}

static bignum
trim_bn( bignum x ) {
    if( x.n == 0 )
        return x;
    int i;
    for( i = x.n - 1; i >= 0; --i )
        if( x.d[i] != 0 )
            break;
    x.n = i + 1;
    return x;
}

static bignum
new_bn( const int digits ) {
    bignum x;
    x.d = fmalloc( digits );
    memset( x.d, 0, (size_t)digits * sizeof(qword) );
    x.n = 0;
    return x;
}

static bignum
set_bn( const qword value ) {
    bignum x = new_bn(bn_size);
    x.d[0] = value;
    x.n = 1;
    return x;
}

static void
print_bn( const bignum x ) {
    printf( "%" PRIu64, x.d[x.n - 1] );
    for( int i = x.n - 2; i >= 0; --i )
        printf( "%09" PRIu64, x.d[i] );
    printf("\n");
}

static void
copy_bn( bignum * const a, bignum b ) {
    b = trim_bn(b);
    if( a->n > b.n )
        memset( a->d + b.n, 0, (size_t)(a->n - b.n) * sizeof(qword) );
    memcpy( a->d, b.d, (size_t)(a->n = b.n) * sizeof(qword) );
}

static bignum
add_bn( const bignum a, const bignum b ) {
    const int n = max( a.n, b.n );
    bignum x = new_bn( n + 1 );
    x.n = n + 1;
    qword c = 0;
    for( int i = 0; i < n; ++i ) {
        x.d[i] = c;
        if( i < a.n )
            c += a.d[i];
        if( i < b.n )
            c += b.d[i];
        if( c >= base ) {
            x.d[i] = c - base;
            c = 1;
        } else {
            x.d[i] = c;
            c = 0;
        }
    }
    x.d[n] = c;
    return x;
}

static void
sub_bn( bignum a, const bignum b ) {
    if( b.n == 0 )
        return;
    uint32_t c = 0;
    for( int i = 0; i < b.n; ++i ) {
        const qword sum = b.d[i] + c;
        c = 0;
        if( a.d[i] < sum ) {
            a.d[i] += base;
            c = 1;
        }
        a.d[i] -= sum;
    }
    a.d[b.n] -= c;
}

static bignum
mul_bn( const bignum a, const bignum b ) {
    if( a.n == 0 || b.n == 0 )
        return new_bn(1);
    bignum x = new_bn( a.n + b.n + 1 );
    x.n = a.n + b.n - 1;
    for( int i = 0; i < a.n; ++i ) {
        for( int j = 0; j < b.n; ++j )
            x.d[i+j] += a.d[i] * b.d[j];
        if( (a.n - i) % 50 == 1 )
            for( int k = 0; k <= x.n; ++k )
                if( x.d[k] >= base ) {
                    const qword c = x.d[k] / base;
                    x.d[k] %= base;
                    x.d[k+1] += c;
                }
    }
    if( x.d[x.n] )
        ++x.n;
    return x;
}

static bignum
mul_karatsuba( const bignum a, const bignum b ) {
    if( a.n == 0 || b.n == 0 )
        return new_bn(1);

    if( min( a.n, b.n ) < 49 )
        return mul_bn( a, b );

    int i, k;
    bignum a0, a1, b0, b1;
    bignum z2, z1a, z1b, z1, z0;
    bignum x = new_bn( a.n + b.n + 1 );

    qword c;
    const int n = max( a.n, b.n ) / 2;
    a0.d = a.d;
    a0.n = min( n, a.n );
    b0.d = b.d;
    b0.n = min( n, b.n );
    a1.d = a.d + n;
    a1.n = max( 0, a.n - n );
    b1.d = b.d + n;
    b1.n = max( 0, b.n - n );
    z0 = mul_karatsuba( a0, b0 );
    z2 = mul_karatsuba( a1, b1 );
    z1a = add_bn( a1, a0 );
    z1b = add_bn( b1, b0 );
    z1 = mul_karatsuba( z1a, z1b );
    sub_bn( z1, z0 );
    sub_bn( z1, z2 );
    z1 = trim_bn(z1);
    memcpy( x.d, z0.d, (size_t)(x.n = z0.n) * sizeof(qword) );
    if( z1.n != 0 ) {
        for( i = 0, c = 0; i < z1.n; ++i ) {
            x.d[k = n + i] += z1.d[i] + c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        for( k = n + i; c != 0; ++k ) {
            x.d[k] += c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        if( x.n < k )
            x.n = k;
    }
    if( z2.n != 0 ) {
        const int n2 = n * 2;
        for( i = 0, c = 0; i < z2.n; ++i ) {
            x.d[k = n2 + i] += z2.d[i] + c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        for( k = n2 + i; c != 0; ++k ) {
            x.d[k] += c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        if( x.n < k )
            x.n = k;
    }
    ffree(z0);
    return x;
}

static void
fibo( const int n, bignum * const a, bignum * const b ) {
    if( n == 0 ) {
        *a = set_bn(0);
        *b = set_bn(1);
        return;
    }
    fibo( n / 2, a, b );
    bignum ta = *a, tb = *b;
    bignum taa = mul_karatsuba( ta, ta );
    bignum tbb = mul_karatsuba( tb, tb );
    bignum taapbb = add_bn( taa, tbb );
    if( n & 1 ) {
            // [a,b] = [a*a+b*b,b*(2*a+b)]
        bignum t2a = add_bn( ta, ta );
        bignum t2apb = add_bn( t2a, tb );
        bignum tbL2apbR = mul_karatsuba( tb, t2apb );
        copy_bn( a, taapbb );
        copy_bn( b, tbL2apbR );
        ffree(t2a);
    } else {
            // [a,b] = [a*(b*2-a),a*a+b*b]
        bignum t2bma = add_bn( tb, tb );
        sub_bn( t2bma, ta );
        bignum taL2bmaR = mul_karatsuba( ta, t2bma );
        copy_bn( a, taL2bmaR );
        copy_bn( b, taapbb );
        ffree(t2bma);
    }
    ffree(taa);
}

int
main( void ) {
    if( (bp = malloc( (size_t)bn_size * ALLOC_DEPTH * sizeof(qword) )) == NULL ) {
        fprintf(stderr,"Out of memory!\n");
        return EXIT_FAILURE;
    }
    if( N < 2 ) {
        printf( "%d\n", N );
        return EXIT_SUCCESS;
    }
    bignum a, b;
    fibo( N - 1, &a, &b );
    print_bn(b);
    return EXIT_SUCCESS;
}
Have you actually attempted to run this version? The result is an out of memory error, compiled on gcc v6.3.0 with the commandline gcc -O3 -Wall -ojbfibo jbfibo.c, run on Raspbian Linux (fairly stock install) with 488MB of available RAM (no swap).
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

ejolson
Posts: 5119
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sat Dec 15, 2018 5:59 am

Heater wrote:
Fri Dec 14, 2018 11:39 pm
You might think the worlds top supper computers could tackle this during their lunch break. Turns out they can't. It was done on a souped up PC (4 x Xeon E7-8890 v3 @ 2.50 GHz (72 cores, 144 threads) 1.25 TB DDR4 RAM + 20 x 6 TB disk)
A souped up PC makes a wonderful node for a supper computer, especially beef and barley or chicken and dumpling. Don't forget that raspberry pie, while not good for a supper, makes for a wonderful dessert.

Let Tn be the time needed to compute the nth Fibonacci number. I wonder how graphs of Tn versus n look for the programs discussed so far.

jahboater
Posts: 5632
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: Why Avoid BASIC on RPi?

Sat Dec 15, 2018 7:35 am

DavidS wrote:
Sat Dec 15, 2018 4:13 am
The result is an out of memory error, compiled on gcc v6.3.0 with the commandline gcc -O3 -Wall -ojbfibo jbfibo.c, run on Raspbian Linux (fairly stock install) with 488MB of available RAM (no swap).
Thanks David. My bad, I didn't try the last change on a Pi!
It ran fine on my PC - which sadly has 16GB of memory :(
Floating point rounding error in the size calculation.
This version works perfectly on a Pi Zero:

Code: Select all

pi@pi0:~$ gcc -O3 fibo.c -o fibo
pi@pi0:~$ free -h
              total        used        free      shared  buff/cache   available
Mem:           481M         20M        157M         27M        302M        379M
Swap:           99M        512K         99M
pi@pi0:~$ ./fibo | head -c 32 ; echo
10727395641800477229364813596225
pi@pi0:~$ ./fibo | tail -c 32 ; echo
4856539211500699706378405156269

Code: Select all

/*
 *  fibonacci.c -- Compute the nth Fibonacci Number
 *  Written December 1, 2018 by Eric Olson
 *
 *  This program demonstrates the expressiveness of C as measured
 *  by explicitly coding the Karatsuba multiplication algorithm for
 *  big-number arithmetic and then using the doubling formulas
 *
 *      F(2k) = F(k)[2F(k+1)-F(k)]
 *      F(2k+1) = F(k+1)^2+F(k)^2
 *      F(2k+2) = F(k+1)[2F(k)+F(k+1)]
 *
 *  to compute the nth Fibonacci number.
 *
 *  Version 2:  Minor changes to fix compiler warnings and zero the
 *  unused memory in the copybn routine.
 *
 *  Version 3:  Jeremy Hall - December 15, 2018
 *    Simplified the 0 and 1 case at the start of fibo() (removed atobn and atolln).
 *    Changed subbn2 to single pass without allocating an array for carries.
 *    Replaced some of the simple logic with max() and min() expressions.
 *    Simplified the print function.
 *    Declared immutable items as const.
 *    Removed multiple call sequences of delbn() (earler calls were ignored).
 *    Used exact width integer types.
 *    More minor changes to fix compiler warnings.
 *    Removed extra jmp from carry settings in Karatsuba multiply.
 *    Changed some function names for slightly better readability.
 */

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <inttypes.h>
#include <math.h>

#define N 4784969

typedef uint64_t qword;

#define base 1000000000ULL

#define ALLOC_DEPTH 11

#define PHI (1+sqrt(5.0))/2

static const int bn_size = (int)(N * log10(PHI) / 9 + 4);

#define min(a,b) (a < b ? a : b)
#define max(a,b) (a > b ? a : b)

typedef struct {
    qword *d;
    int n;
} bignum;

static qword *bp;

static qword *
fmalloc( const int len ) {
    qword * const r = bp;
    bp += len;
    return r;
}

static void
ffree( const bignum a ) {
    bp = a.d;
}

static bignum
trim_bn( bignum x ) {
    if( x.n == 0 )
        return x;
    int i;
    for( i = x.n - 1; i >= 0; --i )
        if( x.d[i] != 0 )
            break;
    x.n = i + 1;
    return x;
}

static bignum
new_bn( const int digits ) {
    bignum x;
    x.d = fmalloc( digits );
    memset( x.d, 0, (size_t)digits * sizeof(qword) );
    x.n = 0;
    return x;
}

static bignum
set_bn( const qword value ) {
    bignum x = new_bn(bn_size);
    x.d[0] = value;
    x.n = 1;
    return x;
}

static void
print_bn( const bignum x ) {
    printf( "%" PRIu64, x.d[x.n - 1] );
    for( int i = x.n - 2; i >= 0; --i )
        printf( "%09" PRIu64, x.d[i] );
    printf("\n");
}

static void
copy_bn( bignum * const a, bignum b ) {
    b = trim_bn(b);
    if( a->n > b.n )
        memset( a->d + b.n, 0, (size_t)(a->n - b.n) * sizeof(qword) );
    memcpy( a->d, b.d, (size_t)(a->n = b.n) * sizeof(qword) );
}

static bignum
add_bn( const bignum a, const bignum b ) {
    const int n = max( a.n, b.n );
    bignum x = new_bn( n + 1 );
    x.n = n + 1;
    qword c = 0;
    for( int i = 0; i < n; ++i ) {
        x.d[i] = c;
        if( i < a.n )
            c += a.d[i];
        if( i < b.n )
            c += b.d[i];
        if( c >= base ) {
            x.d[i] = c - base;
            c = 1;
        } else {
            x.d[i] = c;
            c = 0;
        }
    }
    x.d[n] = c;
    return x;
}

static void
sub_bn( bignum a, const bignum b ) {
    if( b.n == 0 )
        return;
    uint32_t c = 0;
    for( int i = 0; i < b.n; ++i ) {
        const qword sum = b.d[i] + c;
        c = 0;
        if( a.d[i] < sum ) {
            a.d[i] += base;
            c = 1;
        }
        a.d[i] -= sum;
    }
    a.d[b.n] -= c;
}

static bignum
mul_bn( const bignum a, const bignum b ) {
    if( a.n == 0 || b.n == 0 )
        return new_bn(1);
    bignum x = new_bn( a.n + b.n + 1 );
    x.n = a.n + b.n - 1;
    for( int i = 0; i < a.n; ++i ) {
        for( int j = 0; j < b.n; ++j )
            x.d[i+j] += a.d[i] * b.d[j];
        if( (a.n - i) % 50 == 1 )
            for( int k = 0; k <= x.n; ++k )
                if( x.d[k] >= base ) {
                    const qword c = x.d[k] / base;
                    x.d[k] %= base;
                    x.d[k+1] += c;
                }
    }
    if( x.d[x.n] )
        ++x.n;
    return x;
}

static bignum
mul_karatsuba( const bignum a, const bignum b ) {
    if( a.n == 0 || b.n == 0 )
        return new_bn(1);

    if( min( a.n, b.n ) < 49 )
        return mul_bn( a, b );

    int i, k;
    bignum a0, a1, b0, b1;
    bignum z2, z1a, z1b, z1, z0;
    bignum x = new_bn( a.n + b.n + 1 );

    qword c;
    const int n = max( a.n, b.n ) / 2;
    a0.d = a.d;
    a0.n = min( n, a.n );
    b0.d = b.d;
    b0.n = min( n, b.n );
    a1.d = a.d + n;
    a1.n = max( 0, a.n - n );
    b1.d = b.d + n;
    b1.n = max( 0, b.n - n );
    z0 = mul_karatsuba( a0, b0 );
    z2 = mul_karatsuba( a1, b1 );
    z1a = add_bn( a1, a0 );
    z1b = add_bn( b1, b0 );
    z1 = mul_karatsuba( z1a, z1b );
    sub_bn( z1, z0 );
    sub_bn( z1, z2 );
    z1 = trim_bn(z1);
    memcpy( x.d, z0.d, (size_t)(x.n = z0.n) * sizeof(qword) );
    if( z1.n != 0 ) {
        for( i = 0, c = 0; i < z1.n; ++i ) {
            x.d[k = n + i] += z1.d[i] + c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        for( k = n + i; c != 0; ++k ) {
            x.d[k] += c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        if( x.n < k )
            x.n = k;
    }
    if( z2.n != 0 ) {
        const int n2 = n * 2;
        for( i = 0, c = 0; i < z2.n; ++i ) {
            x.d[k = n2 + i] += z2.d[i] + c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        for( k = n2 + i; c != 0; ++k ) {
            x.d[k] += c;
            c = 0;
            if( x.d[k] >= base ) {
                x.d[k] -= base;
                c = 1;
            }
        }
        if( x.n < k )
            x.n = k;
    }
    ffree(z0);
    return x;
}

static void
fibo( const int n, bignum * const a, bignum * const b ) {
    if( n == 0 ) {
        *a = set_bn(0);
        *b = set_bn(1);
        return;
    }
    fibo( n / 2, a, b );
    bignum ta = *a, tb = *b;
    bignum taa = mul_karatsuba( ta, ta );
    bignum tbb = mul_karatsuba( tb, tb );
    bignum taapbb = add_bn( taa, tbb );
    if( n & 1 ) {
            // [a,b] = [a*a+b*b,b*(2*a+b)]
        bignum t2a = add_bn( ta, ta );
        bignum t2apb = add_bn( t2a, tb );
        bignum tbL2apbR = mul_karatsuba( tb, t2apb );
        copy_bn( a, taapbb );
        copy_bn( b, tbL2apbR );
        ffree(t2a);
    } else {
            // [a,b] = [a*(b*2-a),a*a+b*b]
        bignum t2bma = add_bn( tb, tb );
        sub_bn( t2bma, ta );
        bignum taL2bmaR = mul_karatsuba( ta, t2bma );
        copy_bn( a, taL2bmaR );
        copy_bn( b, taapbb );
        ffree(t2bma);
    }
    ffree(taa);
}

int
main( void ) {
    if( (bp = malloc( (size_t)bn_size * ALLOC_DEPTH * sizeof(qword) )) == NULL ) {
        fprintf(stderr,"Out of memory!\n");
        return EXIT_FAILURE;
    }
    if( N < 2 ) {
        printf( "%d\n", N );
        return EXIT_SUCCESS;
    }
    bignum a, b;
    fibo( N - 1, &a, &b );
    print_bn(b);
    return EXIT_SUCCESS;
}

ejolson
Posts: 5119
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sat Dec 15, 2018 9:41 am

For completeness I have coded a Fibonacci program fibob2.c that uses the definition to compute the 4784969th Fibonacci number. I ran it on a Ryzen desktop rather than a Raspberry Pi and obtained

Code: Select all

$ time ./fibob2 >fibob2.dat

real    6m47.608s
user    6m47.558s
sys 0m0.000s
Compared to the original fibonacci.c program based on Karatsuba multiplication and the doubling formulas we have

Code: Select all

$ time ./fibonacci >fibonacci.dat

real    0m1.500s
user    0m1.500s
sys 0m0.000s
$ diff fibonacci.dat fibob2.dat
For reference and inclusion in the archive of programs the code is

Code: Select all

/*  fibob2.c -- Compute the nth Fibonacci Number
    Written December 15, 2018 by Eric Olson

    This program is provided for completeness as an example of
    explicitly using the definition

        F(k+1) = F(k) + F(k-1)

    to compute the nth Fibonacci number.  
*/


#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 4784969
#define K 150000

typedef unsigned int digit;
typedef unsigned long long ddigit;
static const digit halfbase=(1<<(sizeof(digit)<<2));
static int base10,zero10;

static inline digit digitlow(digit x){
    return x&((1<<(sizeof(digit)<<2))-1);
}
static inline digit digithigh(digit x){
    return x>>(sizeof(digit)<<2);
}
static void b10mul(digit *d,digit x){
    digit c=0;
    for(int i=1;i<=d[0];i++){
        d[i]=d[i]*x+c;
        c=d[i]/base10;
        d[i]%=base10;
    }
    if(c) d[++d[0]]=c;
}
static void b10add(digit *d,digit c){
    for(int i=1;i<=d[0];i++){
        d[i]+=c;
        c=d[i]/base10;
        d[i]%=base10;
    }
    if(c) d[++d[0]]=c;
}
static void b10print(digit *d){
    printf("%d",d[d[0]]);
    for(int i=d[0]-1;i>=1;i--){
        printf("%0*d",zero10,d[i]);
    }
    printf("\n");
}
static void b2print(digit *a){
    static digit r[K*3]; 
    r[0]=1; r[1]=0;
    for(int i=a[0];i>=1;i--){
        b10mul(r,halfbase);
        b10add(r,digithigh(a[i]));
        b10mul(r,halfbase);
        b10add(r,digitlow(a[i]));
    }
    b10print(r);
}

static void b2add(digit *a,digit *b){
    digit *r=a;
    ddigit t=0;
    int i;
    if(a[0]<b[0]) { a=b; b=r; };
    for(i=1;i<=b[0];i++){
        t+=(ddigit)a[i]+b[i];
        r[i]=t;
        t>>=sizeof(digit)<<3;
    }
    for(;i<=a[0];i++){
        t+=a[i];
        r[i]=t;
        t>>=sizeof(digit)<<3;
    }
    if(!t) r[0]=a[0];
    else r[r[0]=i]=t;
}

static digit a[K]={1,1},b[K]={1,1};
int main(){
    if(sizeof(ddigit)!=2*sizeof(digit)){
        fprintf(stderr, "Please recompile so sizeof(ddigit)=%d"
            " is twice sizeof(digit)=%d!\n",
            (int)sizeof(ddigit),(int)sizeof(digit));
        exit(1);
    }
    for(zero10=0,base10=10;base10<halfbase;base10*=10) zero10++;
    base10/=10;
    digit *p1=a,*p2=b;
    for(int i=2;i<N;i++){
        b2add(p1,p2);
        digit *t=p1; p1=p2; p2=t;
    }
    b2print(p2);
    return 0;
}

Heater
Posts: 15790
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 15, 2018 12:09 pm

ejolson,
A souped up PC makes a wonderful node for a supper computer, especially beef and barley or chicken and dumpling. Don't forget that raspberry pie, while not good for a supper, makes for a wonderful dessert.
Ha! Sup up some soup supper superman.

I must stop using the soup words, I get them muddled up 50% of the time.
Let Tn be the time needed to compute the nth Fibonacci number. I wonder how graphs of Tn versus n look for the programs discussed so far.
So wrapped a timing loop around the c++ fibo and got these results:

Code: Select all

n, seconds
2, 0
4, 0
8, 0
16, 0
32, 0
64, 0
128, 0
256, 0
512, 0
1024, 0
2048, 0
4096, 0.015625
8192, 0
16384, 0
32768, 0.046875
65536, 0.0625
131072, 0.171875
262144, 0.5625
524288, 1.67188
1048576, 4.92188
2097152, 14.5781
4194304, 43.3594
8388608, 129.453
16777216, 389.312
When I plot that it looks like this:
tn.png
tn.png (32.69 KiB) Viewed 1123 times
Memory in C++ is a leaky abstraction .

Heater
Posts: 15790
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 15, 2018 12:24 pm

ejolson,

I tried to do the same timing with the c version but it segfaults at fibo(16)

Code: Select all

void timeIt(int n) {
    double startTime;
    double endTime;
    double elapsedTime;
    bignum a, b;

    startTime = (float)clock()/CLOCKS_PER_SEC;

    fibo( n - 1, &a, &b );

    endTime = (float)clock()/CLOCKS_PER_SEC;
    elapsedTime = endTime - startTime;
    printf("%d, %f\n", n, elapsedTime);
}

int
main( void ) {
    if( (bp = malloc( (size_t)bn_size * ALLOC_DEPTH * sizeof(qword) )) == NULL ) {
        fprintf(stderr,"Out of memory!\n");
        return EXIT_FAILURE;
    }
    if( N < 2 ) {
        printf( "%d\n", N );
        return EXIT_SUCCESS;
    }
    timeIt(4784969);

    for (int n = 2; n <= 1024 * 1024 * 32; n *= 2) {
        timeIt(n);
    }
    return EXIT_SUCCESS;
}

Code: Select all

$ ./fibonacci
4784969, 1.734375
2, 0.000000
4, 0.000000
8, 0.000000
16, 0.000000
Segmentation fault (core dumped)
Memory in C++ is a leaky abstraction .

User avatar
DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Sat Dec 15, 2018 12:33 pm

So the doubling with Karatsuba multiplication is only about 272 times faster. I would have expected a bigger difference in speed, good to know.

It is interesting how this challenge is, as we are using highly optimized algorithms to implement a normal challenge that every programmer is familiar with on a smaller scale, and in doing so breaking the rules of the challenge we are all given when young that is worded something like:
Normal Fibonacci Challenge wrote: Write a program to calculate the Nth fibonacci number, calculating every intermediate fibonacci to reach the answer. Then display only the Nth fibonacci number.
Sorry about the delay in getting my assembly version done. I have not spent much time with it, and have an error in my karatsuba multiplacation implementation :? so I am working on that. That is what I get for working on 5 projects in parallel, and adding this as a quick side (not so quick a guess).
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

Heater
Posts: 15790
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 15, 2018 12:50 pm

DavidS,

I would say 272 times faster is frikken huge.

You would never get that kind of performance boost by rewriting your naive algorithm in Python into C or assembler. You could tinker around micro-optimizing that and never make any significant progress.

And don't forget the performance gap gets bigger, and bigger faster, the bigger N we choose.

We are not breaking the rules of this challenge.

In fact if you read the my original wording of the challenge carefully you find that it is not even required to produce the value of fibo(4784969). It is only required to find the value of N for which fibo(N) has a million decimal digits. So "4784969" is the correct answer :)

Someone on this forum showed how go get that result with a simple formula that calculates the length in decimal digits of any fibonacci number. Sadly I cannot find their post now.

The formula is:
Fib(n) = Log10Fib(n)
= Log10(Φn / √5)
= n*Log10(Φ) - Log10√5
= n*Log10(Φ) - (Log105)/2
ejolson uses it in is solution to calculate the memory space required.
Memory in C++ is a leaky abstraction .

Return to “Off topic discussion”