ejolson
Posts: 3584
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 1:32 am

Heater wrote:
Fri Dec 28, 2018 10:25 pm
jahboater,
...are you sure its not overflowing here?
Not sure of anything yet.

Before I took ejolson's mul_bn() into use I fuzz tested it with about half a million multiplies of numbers from 9 to 387 digits wide, comparing the results with GMP. No problem.

Built into my bint class I have two failing tests:

test_28 - which calculates Fibonacci numbers from 1 to 20000.

test_29 - which tests thousands of edge cases that had been failing for me before.

I'll have to find out exactly when things are failing.

I'll have a go with your overflow check as well.

Of course, I may not be working from ejolson's latest debugged C version.
There are actually two related but different parameters here: the crossover point and how much to delay the carry. A delay of 50 works for the 4784969th Fibonacci number even when using 64-bit integers, because one is lucky that there are not enough nines in the multiplicands to cause an overflow. Other Fibonacci numbers seem to defy the mean statistics and end up involving numbers with lots of nines, which necessitates a more conservative delay. Note that fuzzing with random numbers is statistically unlikely to find problems which occur only when numbers have hundreds of nines in a row. For 64-bit integers a delay of 16 appears to work for all Fibonacci numbers and maybe in general. For 32-bit integers there is actually more dynamic range between base^2 and the maximum integer, so a larger delay should still work.
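To make the delayed-carry idea concrete, here is a minimal sketch of a schoolbook multiply over base-10^9 limbs held in 64-bit integers (this is only an illustration, not the actual mul_bn, and the real code may count the delay differently; BASE, DELAY and Big are just names for the sketch). The inner loop adds raw partial products with no division, and a normalisation pass propagates the carries once every DELAY rows.

Code: Select all

// Minimal sketch of a delayed-carry schoolbook multiply (illustration only).
// Limbs are base-10^9 digits stored in uint64_t so partial products fit in 64 bits.
#include <cstdint>
#include <vector>

constexpr uint64_t BASE  = 1000000000ULL;  // assumed 10^9 per limb
constexpr size_t   DELAY = 16;             // 16*(BASE-1)^2 + (BASE-1) < 2^64

using Big = std::vector<uint64_t>;         // little-endian limbs

Big mul_delayed(const Big& a, const Big& b) {
    Big r(a.size() + b.size(), 0);
    for (size_t i = 0; i < a.size(); ++i) {
        for (size_t j = 0; j < b.size(); ++j)
            r[i + j] += a[i] * b[j];       // defer the carry: no division here
        if ((i + 1) % DELAY == 0 || i + 1 == a.size()) {
            uint64_t carry = 0;            // periodic normalisation pass
            for (size_t k = 0; k < r.size(); ++k) {
                uint64_t v = r[k] + carry;
                r[k]  = v % BASE;
                carry = v / BASE;
            }
        }
    }
    return r;                              // leading zero limbs are not trimmed
}
With DELAY = 16 each accumulator gathers at most 16 worst-case products of (10^9 - 1)^2 plus one leftover digit between passes, which stays below 2^64; push the delay much higher and enough nines in the operands will overflow it.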

As you've noted, the crossover point was taken to be one less than the carry delay in the original code. While it can be anything, for better performance it makes sense to set it to one less than a multiple of the carry delay. For example, if the carry delay is 16, the natural values to check for a good crossover point would be 15, 31, 47, 63, 79 and so forth. The optimal crossover may differ due to cache effects and the O(n^2) nature of the algorithm.

The delay is implemented differently in the line-numbered Basic code. In that code the delay is adapted to the actual accumulated values. This has the advantage of eliminating more divisions without danger of overflow but the disadvantage of a non-vectorisable inner loop. When working with very expensive floating-point divisions or on a CPU that supports out-of-order execution the carry delay in classic.bas may be preferable; however, I have not made a careful comparison.

ejolson
Posts: 3584
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 2:58 am

ejolson wrote:
Fri Dec 28, 2018 4:35 am
This indicates that the line-numbered BASIC code, when compiled with FreeBASIC, is about 10 times slower than the C code. The likely reason for the speed difference is that floating point is slower than integer arithmetic. Moreover, it appears to be single precision.
To the program written in line-numbered BASIC I added the line

Code: Select all

241 defdbl a-z
so the default numeric data type is double-precision and obtained the following on a Pi Zero.

Code: Select all

$ time ./quick | head -c32
10727395641800477229364813596225
real    3m30.979s
user    3m28.880s
sys 0m1.950s
The result is almost 4 times faster than the original code and only 2.5 times slower than the C program. While not bad for such a simple change, the result is no longer compatible with the original versions of Microsoft BASIC.

Even though the early versions of Microsoft BASIC ran on 8-bit hardware that did not have built-in floating-point arithmetic units, the lack of expressiveness caused by the restriction to single precision and the lack of integer arithmetic immediately limited a programmer's ability to write both scientific and financial codes. Other versions of BASIC, for example CBASIC by Digital Research, addressed this limitation in expressiveness by using 15-digit BCD floating point and percent signs to specify integer types. Later, QuickBASIC introduced a reasonably complete selection of integer and floating-point data types, which allowed for the one-line change above.

It remains to be seen whether a careful use of integer data types will allow the line-numbered BASIC Fibonacci program to meet or outperform the others.

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 12:13 pm

ejolson,
There are actually two related but different parameters here: the cross over point and how much to delay the carry.
Thanks for the clarification. I obviously don't understand that delayed carry as well as I thought; I'll have to study how those bits are jumping in more detail.
Note that fuzzing with random numbers is statistically unlikely to find problems which occur...
Can I add "..occur at extreme limits and edge cases".

Quite so. That is why I created the "9's" stress test and other edge case tests first. The random fuzzing came later and was originally just to satisfy myself that our results matched those of GMP.

When it comes to edge case tests, they rather depend on what is going on internally with the algorithms in use. Ideally these tests would verify that all loops terminate in the right place, all conditions jump the right way at the right time, and so on. Change your algorithm and you have different edge cases requiring different tests.
For 64-bit integers 16 appears to work for all Fibonacci numbers and maybe in general.
In that case the mul_bn in the C version I have been using is buggy and will fail the same way my C++ incarnation did. Do you have a later C version or shall I just fix the one I have?
Memory in C++ is a leaky abstraction .

ejolson
Posts: 3584
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 1:47 pm

ejolson wrote:
Sat Dec 29, 2018 2:58 am
It remains to be seen whether a careful use of integer data types will allow the line-numbered BASIC Fibonacci program to meet or outperform the others.
With the creation of the interactive programming environment called BASIC by Kemény, Kurtz and a host of undergraduate students at Dartmouth College on one hand and the introduction of 8-bit microprocessors by Intel, Motorola, MOS Technology and Zilog on the other began the golden age of personal computing. During this period of time programmable general-purpose computers entered the home, children learned programming and technology driven innovation and prosperity flourished. For a moment the world was liberated from the servitude imposed by the owners of intellectual property. Then it ended.

The next generation of PCs which entered the home were no longer personal nor programmable by default, but instead served as a trusted platform for the delivery of digital-rights-managed content produced by large corporations. To counter the rising crime wave caused by millions of disenfranchised youth, politicians backed by intellectual think tanks made legal what was illegal so corporations could put the criminals out of business by direct competition and marketing. Then came the Raspberry Pi.

Careful forensic analysis of the 8-bit computers and other relics from the golden age indicates the systems were programmed using a language called BASIC. Although C plays the analogous role of universally-available systems programming language on Linux computers, in a parallel universe far away it had been discovered that the efficient expressivity of C disrupted the established social order distinguishing the digital peasantry from the property owners. Moreover, early prototypes could not run Linux but instead consisted of only a GPU running micro Python. The decision to promulgate the use of a slow interpreter intended for education rather than C led to the natural question, then why avoid BASIC? Why avoid BASIC is the focus of the current thread.

It is an easily established fact that a programming language isn't much good unless it can be used to compute million-digit Fibonacci numbers. Therefore, a contest was devised by which to compare BASIC to other programming languages: Code the Karatsuba algorithm for big number arithmetic and use the doubling formula to compute the 4784969th Fibonacci number. While focus has been on a quantitative measure of how efficiently the required algorithms can be expressed, the subjective aspects of readability and productivity of a programming language have also been discussed.

After 38 pages of forum discussion, complete programs were written in various dialects of BASIC, C++, JavaScript and C among other languages and attempts. Having written some of these programs I wonder, why did the golden age of personal computing end? What role did BASIC play in its demise? What language is the right choice to begin a reformed age of personal computing? What needs to be done differently so the reformed age doesn't suddenly end?

ejolson
Posts: 3584
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 1:59 pm

Heater wrote:
Sat Dec 29, 2018 12:13 pm
ejolson,
There are actually two related but different parameters here: the cross over point and how much to delay the carry.
Thanks for the clarification. I obviously don't understand that delayed carry as well as I thought; I'll have to study how those bits are jumping in more detail.
Note that fuzzing with random numbers is statistically unlikely to find problems which occur...
Can I add "..occur at extreme limits and edge cases".

Quite so. That is why I created the "9's" stress test and other edge case tests first. The random fuzzing came later and was originally just to satisfy myself that our results matched those of GMP.

When it comes to edge case tests, they rather depend on what is going on internally with the algorithms in use. Ideally these tests would verify that all loops terminate in the right place, all conditions jump the right way at the right time, and so on. Change your algorithm and you have different edge cases requiring different tests.
For 64-bit integers 16 appears to work for all Fibonacci numbers and maybe in general.
In that case the mul_bn in the C version I have been using is buggy and will fail the same way my C++ incarnation did. Do you have a later C version or shall I just fix the one I have?
To paraphrase a certain company talking about the Meltdown side channel: That's not a bug; it's working exactly the way it was designed to work.

For GitHub it is probably best to use version 4 of the C code as posted here. For the C++ code, simply replace the number 50 in the O(n^2) multiply kernel with 16. If you are feeling lucky try 40, which is the value I used for 32-bit limbs in FreeBASIC. If you try 40, I would be happy to hear how it holds up to your testing and what works best.

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 4:24 pm

Nice essay ejolson.
With the creation of the interactive programming environment called BASIC by Kemény, Kurtz... For a moment the world was liberated from the servitude imposed by the owners of intellectual property. ... Then it ended.
Well said.

I too was introduced to computing as a young teenager with BASIC, except it was via a teletype connected to a far-away mainframe in 1972. Later I used ALGOL. By the time 8-bit computers arrived in the late 70s and early 80s I was not using BASIC or buying home computers like the C64 etc. No, we programmed in assembler, often outlining our solution first in a pseudocode that looked like ALGOL.

In large part though, you are right: BASIC and cheap 8-bitters kick-started a generation of programmers and computer designers.
The next generation of PCs which entered the home were no longer personal nor programmable...Then came the Raspberry Pi.
This was very depressing to watch as it happened. With the IBM PC the diversity of machines we had dried up. New computer owners were "users", not programmers. The PC was no longer a computer but a commodity gadget that you use, like a TV or food mixer. I have referred to this as the dark ages of personal computing.

The Raspberry Pi was, as you say, exactly conceived to try and lighten up the dark times. To enable kids and others to experience the joys (and frustrations) of programming as we did back in the Golden Age.
Careful forensic analysis of the 8-bit computers and other relics from the golden age indicates the systems were programmed using a language called BASIC.
What you mean "forensic analysis" like: Turn them on and observe the BASIC prompt they immediately show:)

I think BASIC was a natural for early personal computers. Its immediate interactivity was a big help of course. But let's not forget that there were no high-level language compilers that would run on the limited resources available. The only one I knew and used was Intel's PL/M, but that required maxed-out memory and disc drives. Besides, Intel was not about to let anyone use it without paying a couple of thousand dollars. On such small machines there is not much you can do language-wise. FORTH was clearly possible but that was not about to light any fires.
...Moreover, early prototypes could not run Linux but instead consisted of only a GPU running micro Python.
I presume you mean Raspberry Pi prototypes there. In which case I think not. I have read that the original Pi idea was built from AVRs and had no GPU. MicroPython did not exist then but I believe the idea was to use Python somehow. Sorry I can't find a link to that.
The decision to promulgate the use of a slow interpreter intended for education rather than C led to the natural question, then why avoid BASIC? Why avoid BASIC is the focus of the current thread.
Ah, now we come to the crunch: Why avoid BASIC?

Hundreds of reasons have been given in this thread already, from lack of a standard to performance and so on. Recently we have been comparing language "expressiveness", whatever that means exactly.

If we look at the expression of the fast Fibonacci algorithm, which we have been using as our vehicle for comparison, in modern BASIC it looks like this:

Code: Select all

dim shared as bignum t1,t2,t3
sub fibowork(n as integer,a as bignum,b as bignum)
    if n=0 then
        a.n=0:b.n=1:b.d(1)=1
        return
    end if
    fibowork(n\2,a,b)
    if n mod 2=0 then
        rem [a,b]=[a*(2*b-a),b*(2*b-a)-(-1)^k]
        bigadd(t1,b,b):bigsub(t2,t1,a)
        bigmul(t1,a,t2):bigmul(t3,b,t2)
        if n mod 4=0 then bigdec(t3) else biginc(t3)
        a=t1:b=t3
    else
        rem [a,b]=[a*(2*a+b)+(-1)^k,b*(2*a+b)]
        bigadd(t1,a,a):bigadd(t2,t1,b)
        bigmul(t1,b,t2):bigmul(t3,a,t2)
        if n mod 4=1 then biginc(t3) else bigdec(t3)
        a=t3:b=t1
    end if
    return
end sub

sub fibo(n as integer,b as bignum)
    if n<2 then
        b.n=1:b.d(1)=n
        return
    end if
    static as bignum a
    fibowork((n-1)\2,a,b)
    if n mod 2=0 then
        rem b=b*(2*a+b)
        bigadd(t1,a,a):bigadd(t2,t1,b):bigmul(t1,b,t2)
        b=t1
    else
        rem b=b*(2*b-a)-(-1)^k
        bigadd(t1,b,b):bigsub(t2,t1,a):bigmul(t3,b,t2)
        if n mod 4=1 then bigdec(t3) else biginc(t3)
        b=t3
    end if
    return
end sub
We can compare with the expression of the same algorithm in popular Python:

Code: Select all

# fibs caches previously computed values; seed it with the base cases
fibs = {0: 0, 1: 1}

def fibo(n):
    if n in fibs:
        return fibs[n]

    k = (n + 1) // 2
    fk = fibo(k)
    fk1 = fibo(k - 1)
    if n & 1:
        result = fk ** 2 + fk1 ** 2
    else:
        result = (2 * fk1 + fk) * fk
    fibs[n] = result
    return result
Similarly clear and succinct expressions of the algorithm are here in Javascript, Haskell and Scala.

Personally I think your wonderful rendition of big fibo in BASIC clearly demonstrates why BASIC is really poor in the expressiveness department. No contest.
It is an easily established fact that a programming language isn't much good unless it can be used to compute million-digit Fibonacci numbers. Therefore, a contest was devised by which to compare BASIC to other programming languages: Code the Karatsuba algorithm for big number arithmetic and use the doubling formula to compute the 4784969th Fibonacci number.
No. The "contest" as I originally stated the challenge was not about Karatsuba algorithm or any big number math operators. Karatsuba is not even mentioned. The challenge was simply to calculate the first Fibo number with a million digits.In the language of your choice.

DavidS then suggested we don't use any non-standard libraries as that is some kind of cheating. I agreed because that prevents people from simply calling GMP's fibo from C or whatever. That is where the need for DIY maths functions came from, hence Karatsuba. They are not part of the problem statement.

I have to be honest here, I was being a bit wicked when I posted that challenge. DavidS had for years been suggesting that BASIC and ARM assembler are very simple and efficient for the programmer. A position I could not agree with. I knew that this problem, which takes 10 minutes for a Python programmer, would take a lot more work in BASIC or ARM assembler. Or ten minutes for those that use Scala, Haskell or Javascript. I was also convinced that the final solution would perform better in those languages.

So far my supposition has been confirmed true. Except, ejolson, you rather rocked my boat by producing not one but two versions in very different BASIC styles in what seemed like a very short time. I'm impressed.
After 38 pages of forum discussion, complete programs were written in various dialects of BASIC, C++, JavaScript and C among other languages and attempts. Having written some of these programs I wonder, why did the golden age of personal computing end? What role did BASIC play in its demise? What language is the right choice to begin a reformed age of personal computing? What needs to be done differently so the reformed age doesn't suddenly end?
Those are deep questions I will have to think on some more...
Memory in C++ is a leaky abstraction .

jahboater
Posts: 4690
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 4:57 pm

Heater wrote:
Sat Dec 29, 2018 4:24 pm
On such small machines there is not much you can do language wise. FORTH was clearly possible but that was not about to light any fires.
Long ago there was the Sinclair ZX81 (8-bit Z80 based cheap home computer running BASIC).
A variant of that came out called the Jupiter Ace - much the same except that it ran Forth. It was an order of magnitude faster, and much more fun to program. I purchased one.

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 5:00 pm

ejolson,
To paraphrase a certain company talking about the Meltdown side channel: That's not a bug; it's working exactly the way it was designed to work.
Touche!

Fair enough, but I'm posting this code to a public git repo. Such code, no matter how obscure, tends to be found by people and cut and pasted into whatever they are doing. So I'd like to have it as foolproof as possible.

I updated the repo with your version 4 fibonacci.c. It seems about the same speed here on my PC. About 1.7 seconds.
For the C++ code, simply replace the number 50 in the O(n^2) multiply kernel with 16. If you are feeling lucky try 40...
I was feeling lucky earlier today. I tried a few different carry delays:

30 FAIL
19 FAIL
18 PASS

I have 18 in the code now but I'm not convinced that is robust. I was intending to make my tests a bit more punishing and try again.
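Assuming the limbs are base-10^9 digits accumulated in unsigned 64-bit integers (my reading of the code rather than something I have verified line by line), a quick back-of-the-envelope check agrees with that 18/19 boundary: eighteen worst-case partial products still fit below 2^64, nineteen do not, so 16 leaves a little margin.

Code: Select all

// Rough arithmetic check of the 18-pass/19-fail boundary (illustration only).
#include <cstdint>
#include <cstdio>

int main() {
    const uint64_t max_digit = 999999999ULL;          // base 10^9 minus 1
    const uint64_t prod      = max_digit * max_digit; // worst-case partial product
    // Largest count of worst-case products that still fits in a uint64_t accumulator:
    std::printf("max safe carry delay = %llu\n",
                (unsigned long long)(UINT64_MAX / prod));   // prints 18
    return 0;
}
The leftover sub-10^9 digit already sitting in each accumulator does not change the answer.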

What I want to know now is:

Given that the C++ code uses the same multiplication algorithm and fibo algorithm as the C code (almost) how come the C++ version ends up nearly three times faster?!

I believe we have pretty much the same karatsuba dance going on. The O(n^2) mul is yours. The fibo dance is similar. So how come the huge difference?
Memory in C++ is a leaky abstraction .

jahboater
Posts: 4690
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 5:08 pm

Heater wrote:
Sat Dec 29, 2018 5:00 pm
30 FAIL
19 FAIL
18 PASS

I have 18 in the code now but I'm not convinced that is robust. I was intending to make my tests a bit more punishing and try again.
I made it 16 for a small safety margin - and of course it removes a nasty division :)
Heater wrote:
Sat Dec 29, 2018 5:00 pm
What I want to know now is:

Given that the C++ code uses the same multiplication algorithm and fibo algorithm as the C code (almost) how come the C++ version ends up nearly three times faster?!

I believe we have pretty much the same karatsuba dance going on. The O(n^2) mul is yours. The fibo dance is similar. So how come the huge difference?
Furthermore it is the same compiler.
There must be an algorithm difference somewhere.

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 5:23 pm

jahboater,
Long ago there was the Sinclair ZX81 (8-bit Z80 based cheap home computer running BASIC).
I remember it well.

I never had one. By the time that came out I was 20 something years old, in work I had Intel Development Systems (Intellec II) and all kind of other machines to play with. We were building systems with 8 bit micros from Intel, Motorola and others in that company so I didn't feel the need to get an 8 bit computer of my own.

I was somewhat wary of Clive Sinclair's toys after his Scientific calculator and the Sinclair stereo system kit I had built years before.

So, sadly, I was a bit too old for that magic time of kids in their bedrooms hacking on Sinclair's, Beebs, C64 etc.
A variant of that came out called the Jupiter Ace...
I recall reading of the Ace too. Never actually saw one for real.
... - much the same except that it ran Forth.
Yes. Like I said, that is not going to light any fires. Not many owned or ever heard of the Jupiter Ace.

Still, such machines were what made up that huge variety which I missed later as we moved into the Personal Computing Dark Ages.
Memory in C++ is a leaky abstraction .

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 5:56 pm

jahboater,
Furthermore it is the same compiler.
Yes. Same compiler same OS same machine.

By now all the important algorithms are the same. All my original attempts at creating them myself have been replaced with work derived from others here.

Looking at the output of gprof I see that almost 80% of the time is spent in the O(n^2) multiplication, which is a direct translation of ejolson's mul_bn() from C to C++.

The only "clever" parts I have added to all this are:

1) In order to reduce the number of memory allocations required I keep all big integers of 128 elements or less in width on the stack, directly within the bigint objects, rather than creating arrays on the heap with new[] and delete[].

2) In order to reduce the amount of redundant copying going on when creating new big integers from old ones that are about to be deleted I use C++ "move semantics" to just move the pointer to the data to the new big int rather than copy it.

Of course ejolson does much the same as 1) by allocating a big space at start up and working from that.
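Very roughly, the shape of those two tricks is something like this (an illustrative sketch only, not my actual bint class): short numbers live in a fixed buffer inside the object, and the move constructor steals the heap pointer instead of copying the digits.

Code: Select all

// Sketch of a small-buffer optimisation plus move semantics (names and sizes assumed).
#include <cstdint>
#include <cstring>

class SmallBig {
    static const int SMALL = 128;   // short numbers stay inside the object, no heap
    uint64_t  buf[SMALL];
    uint64_t* digits;               // points at buf or at a heap block
    int       n;                    // limbs in use
public:
    explicit SmallBig(int limbs = 0) : n(limbs) {
        digits = (n <= SMALL) ? buf : new uint64_t[n];
    }
    SmallBig(SmallBig&& other) noexcept : n(other.n) {
        if (other.digits == other.buf) {            // short number: copy the small buffer
            std::memcpy(buf, other.buf, sizeof(uint64_t) * (size_t)n);
            digits = buf;
        } else {                                    // long number: just steal the pointer
            digits = other.digits;
            other.digits = other.buf;
            other.n = 0;
        }
    }
    ~SmallBig() { if (digits != buf) delete[] digits; }
    SmallBig(const SmallBig&)            = delete;  // copies omitted to keep the sketch short
    SmallBig& operator=(const SmallBig&) = delete;
};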

I don't know. I thought I would have a go at profiling both the C and C++ versions to see if we can spot anything obvious.

I guess if we go back and look at all the posts I started with "Yay" and then announced a performance increase we might get a clue as to what is going on.
Memory in C++ is a leaky abstraction .

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 6:24 pm

I found one big contributor to the C++ performance boost over the C version.

I forgot the other "clever" thing I did:

3) Memoize the results of the fibo calculations as it recurses. Thus saving having to calculate the same fibo more than once.

Turns out that if I remove that it slows things down by 56%. That is big. We go from about 0.6 seconds to 1 second.

Still leaves another 0.7 seconds unaccounted for.
Memory in C++ is a leaky abstraction .

Paeryn
Posts: 2679
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 9:26 pm

Heater wrote:
Sat Dec 29, 2018 6:24 pm
I found one big contributor to the C++ performance boost over the C version.

I forgot the other "clever" thing I did:

3) Memoize the results of the fibo calculations as it recurses. Thus saving having to calculate the same fibo more than once.

Turns out that if I remove that it slows things down by 56%. That is big. We go from about 0.6 seconds to 1 second.

Still leaves another 0.7 seconds unaccounted for.
Using the method I did in Haskell (and I think ejolson's does the same) you don't need memoisation as each iteration calculates successively higher pairs from the previous pair.

Haskell can be nice with memoisation; it can happen automatically. As functions are pure, it knows that given the same input a function will always give the same answer. If it knows that it has already calculated F(x) for a given x before, then it can use that result again without having to recalculate it.
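Roughly, the pair-at-a-time doubling looks like this (a C++ sketch of the control flow rather than my actual Haskell; plain 64-bit arithmetic, so it is only exact up to F(93), but the big-number versions walk the bits of n the same way):

Code: Select all

// Fast doubling that carries the pair (F(k), F(k+1)) along, so nothing is
// recomputed and no memo is needed. Plain uint64_t for brevity.
#include <cstdint>

uint64_t fib_pair(uint64_t n) {
    uint64_t a = 0, b = 1;                 // (F(0), F(1))
    for (int i = 63; i >= 0; --i) {
        uint64_t c = a * (2 * b - a);      // F(2k)   = F(k) * (2*F(k+1) - F(k))
        uint64_t d = a * a + b * b;        // F(2k+1) = F(k)^2 + F(k+1)^2
        a = c; b = d;                      // the pair is now (F(2k), F(2k+1))
        if ((n >> i) & 1) {                // consume the next bit of n
            uint64_t t = a + b;            // shift the pair up by one index
            a = b; b = t;
        }
    }
    return a;                              // F(n)
}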
She who travels light — forgot something.

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 10:57 pm

Paeryn,
Using the method I did in Haskell (and I think ejolson's does the same) you don't need memoisation as each iteration calculates successively higher pairs from the previous pair.
But I am using your method, or as close as I can get in C++, and I do need memoization; it saves almost half the execution time!

This is my C++ code with your Haskell FiboFast interspersed as comments:

Code: Select all

// fiboFast :: Int -> Integer
// fiboFast n
const bint fibo (int n) {
    // | n < 2          = toInteger n
    if (memo.find(n) != memo.end()) {
        return memo[n];    
    }

    // where
    //    k = n `div` 2
    int k = (n / 2);
    //    a = fiboFast k
    const bint a = fibo(k);
    //    b = fiboFast (k - 1)
    const bint b = fibo(k - 1);

    // | even n         = a * (2 * b + a)
    if (isEven(n)) {
        return memo[n] = a * (two * b + a);
    }
    // | n `mod` 4 == 1 = (2 * a + b) * (2 * a - b) + 2
    if ((n % 4) == 1) {
        return memo[n] = (two * a + b) * (two * a - b) + two;
    }
    // | otherwise      = (2 * a + b) * (2 * a - b) - 2
    return memo[n] = (two * a + b) * (two * a - b) - two;
}
About as close as we can get I think.

My view of it is as follows:

As this fibo recurses it is creating a binary tree of smaller fibo values. On every recursion down the tree you go "left" into fibo(k) and "right" into fibo(k - 1).

My conclusion is that as we descend this tree, at some point the actual fibo values visited in the left branch are the same as fibo values visited in the right branch, or vice versa. For example, fibo(10) recurses into fibo(5) and fibo(4), and both of those recurse into fibo(2) and fibo(1), so without the memo fibo(2) would be computed twice.

This is obviously true for the base cases fibo(0) and fibo(1). No matter if you go left or right from the top you must always come down to those same base cases 0 and 1.

Presumably the left and right branches of these trees pass through the same values higher up as well. If the tree originating from the right branch has already calculated a value somewhere then the tree originating from the left branch need not do it again.

If this were not the case then my adding the memo would make no difference to the run time.

If Haskell is automating this memoization behind the scenes so that the programmer need not state it explicitly then I'm very impressed.

Or... Have I misunderstood something here?
Memory in C++ is a leaky abstraction .

ejolson
Posts: 3584
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 11:39 pm

Heater wrote:
Sat Dec 29, 2018 6:24 pm
I found one big contributor to the C++ performance boost over the C version.
I tried to summarise the optimisations missing from the C version in this previous post.

ejolson
Posts: 3584
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sat Dec 29, 2018 11:54 pm

Heater wrote:
Sat Dec 29, 2018 5:56 pm
Looking at the output of gprof I see that almost 80% of the time is spent in the O(n^2) multiplication. Which is a direct translation of ejolson's mul_bn() in C to C++.
This 80 percent explains why the assembler-optimized arithmetic kernels in GMP are so effective even though they constitute only a small fraction of the source. If someone writes an assembler-optimized NEON-based O(n^2) multiplication routine for base-10^p big numbers, I wonder if the resulting program would be faster than GMP.
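Purely as speculation about what such a kernel might look like (this is only a sketch, not GMP and not tested), the accumulating inner loop could use the NEON widening multiply-accumulate, with 32-bit base-10^9 limbs summed into 64-bit lanes and the carries deferred exactly as in the delayed-carry discussion above. It would only compile and run on ARM hardware with NEON.

Code: Select all

// Speculative NEON sketch: one row of the O(n^2) multiply, two limbs per iteration.
// Assumes 32-bit limbs with values below 10^9 and 64-bit accumulators.
#include <arm_neon.h>
#include <cstdint>

void row_mla(uint64_t* r, uint32_t a_i, const uint32_t* b, int nb) {
    uint32x2_t av = vdup_n_u32(a_i);               // broadcast one limb of a
    int j = 0;
    for (; j + 2 <= nb; j += 2) {
        uint64x2_t acc = vld1q_u64(r + j);         // two 64-bit accumulators
        uint32x2_t bv  = vld1_u32(b + j);          // two 32-bit limbs of b
        acc = vmlal_u32(acc, av, bv);              // acc += a_i * b[j..j+1] (widening)
        vst1q_u64(r + j, acc);
    }
    for (; j < nb; ++j)                            // scalar tail
        r[j] += (uint64_t)a_i * b[j];
}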

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 30, 2018 12:21 am

ejolson,

Ah yes.
The C code is still using the original doubling formulas which result in 1.5 times more work per doubling. The C code also computes the final value of a which essentially doubles the work in the last return....
Sounds reasonable.

I have your C doubling formula cast into C++ here. It looks like this:

Code: Select all

// This Fibonacci version derived from ejolson's doubling formula C example.
bint a;
bint b;
static void fiboEjOlson(const int n) {
    if (n == 0) {
        a = zero;
        b = one;
        return;
    }
    fiboEjOlson(n / 2);
    bint ta = a;
    bint tb = b;
    if (isEven(n)) {
        b = (a * a) + (b * b);
        a = ta * ((tb + tb) - ta);
    } else {
        a = (a * a) + (b * b);
        b = tb * ((ta + ta) + tb);
    }
}
You are right in your performance estimates. That runs in 1.43 seconds on my PC vs the 0.62 seconds of the Haskell-derived fibo that I'm using. A factor of 2.3 difference.

That makes up most of the performance difference I'm seeing between C and C++ versions I was wondering about.

Presumably when you have implemented that in the C version you will be giving C++ a run for its money again!
Memory in C++ is a leaky abstraction .

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 30, 2018 12:53 am

ejolson,
This 80 percent explains why the assembler optimized arithmetic kernels in GMP are so effective even though they constitute only a small fraction of the source.
In the face of the evidence I have to concede the point on that.

With the proviso that one can actually write more efficient assembler for those small parts than the compiler can generate. That is quite likely true though, particularly if one's assembler can make use of machine-specific instructions, like vector operations, that compilers may not use or may use inefficiently.

It does confirm my belief that one should not turn to assembler, except for those small critical parts, and only if one really needs to.
If someone writes an assembler-optimized NEON-based O(n^2) multiplication routine for base-10^p big numbers, I wonder if the resulting program would be faster than GMP.
I think that is an interesting question. I'm afraid that someone might have to be you :)
My assembler days are over, unless it's for the Parallax Propeller MCU or the boot code for my RISC V project.
Memory in C++ is a leaky abstraction .

Paeryn
Posts: 2679
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Why Avoid BASIC on RPi?

Sun Dec 30, 2018 12:58 am

Heater wrote:
Sat Dec 29, 2018 10:57 pm
Paeryn,
Using the method I did in Haskell (and I think ejolson's does the same) you don't need memoisation as each iteration calculates successively higher pairs from the previous pair.
But I am using your method, or as close as I can get in C++, and I do need memoization; it saves almost half the execution time!

This is my C++ code with your Haskell FiboFast interspersed as comments:

Code: Select all

// fiboFast :: Int -> Integer
// fiboFast n
const bint fibo (int n) {
    // | n < 2          = toInteger n
    if (memo.find(n) != memo.end()) {
        return memo[n];    
    }

    // where
    //    k = n `div` 2
    int k = (n / 2);
    //    a = fiboFast k
    const bint a = fibo(k);
    //    b = fiboFast (k - 1)
    const bint b = fibo(k - 1);

    // | even n         = a * (2 * b + a)
    if (isEven(n)) {
        return memo[n] = a * (two * b + a);
    }
    // | n `mod` 4 == 1 = (2 * a + b) * (2 * a - b) + 2
    if ((n % 4) == 1) {
        return memo[n] = (two * a + b) * (two * a - b) + two;
    }
    // | otherwise      = (2 * a + b) * (2 * a - b) - 2
    return memo[n] = (two * a + b) * (two * a - b) - two;
}
About as close as we can get I think.
...
Or... Have I misunderstood something here?
Yes, fiboFast needs memoisation; fiboFaster doesn't, which is what I was referring to.
She who travels light — forgot something.

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 30, 2018 1:04 am

Paeryn,

Ah, OK, thanks.

I don't yet understand what fiboFaster is doing...
Memory in C++ is a leaky abstraction .

ejolson
Posts: 3584
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sun Dec 30, 2018 1:21 am

Heater wrote:
Sun Dec 30, 2018 12:21 am
It looks like this

Code: Select all

// This Fibonacci version derived from ejolson's doubling formula C example.
bint a;
bint b;
static void fiboEjOlson(const int n) {
    if (n == 0) {
        a = zero;
        b = one;
        return;
    }
    fiboEjOlson(n / 2);
    bint ta = a;
    bint tb = b;
    if (isEven(n)) {
        b = (a * a) + (b * b);
        a = ta * ((tb + tb) - ta);
    } else {
        a = (a * a) + (b * b);
        b = tb * ((ta + ta) + tb);
    }
}
Those operator overloaded object oriented big number routines make for readily readable code. The fact that the efficiency is good is also very impressive.

When my amusement with line-numbered BASIC wears off, I'll improve the C code with a better implementation of the doubling formulas if someone has not done so already. It should be a straightforward conversion of the corresponding routines in the FreeBASIC code.

From an expressivity point of view, it would be interesting to know whether the inner loop in the O(n^2) multiplication routine is really being vectorized. Another consideration is that all expressions of big number arithmetic and the doubling formulas so far only use 1/4 of the available hardware on a quad-core Pi. In a world where quad-core is low end and 64-core is marketed as consumer hardware, a programming language that can only express serial algorithms is not very expressive. In particular, the task of efficiently expressing a computation to run on modern hardware immediately leads to parallel algorithms.
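As a rough illustration of the parallel direction (not code from any of the programs in this thread; Big and mul are minimal stand-ins rather than anyone's bignum class), the two squarings in the doubling step are independent of each other, so one of them could be handed to std::async while the calling thread computes the other.

Code: Select all

// Illustration only: run the two independent squarings of the doubling step on two cores.
#include <cstdint>
#include <future>
#include <utility>
#include <vector>

using Big = std::vector<uint64_t>;               // assumed little-endian base-10^9 limbs
constexpr uint64_t BASE = 1000000000ULL;

Big mul(const Big& a, const Big& b) {            // plain schoolbook multiply as a stand-in
    Big r(a.size() + b.size(), 0);
    for (size_t i = 0; i < a.size(); ++i) {
        uint64_t carry = 0;
        for (size_t j = 0; j < b.size(); ++j) {
            uint64_t v = r[i + j] + a[i] * b[j] + carry;
            r[i + j] = v % BASE;
            carry    = v / BASE;
        }
        for (size_t k = i + b.size(); carry; ++k) {  // push the row's final carry up
            uint64_t v = r[k] + carry;
            r[k]  = v % BASE;
            carry = v / BASE;
        }
    }
    return r;
}

std::pair<Big, Big> two_squares(const Big& a, const Big& b) {
    auto fa = std::async(std::launch::async,        // a*a on a second core
                         [&a] { return mul(a, a); });
    Big bb = mul(b, b);                             // b*b on the calling thread
    return { fa.get(), std::move(bb) };
}
Whether the thread start-up cost pays off for operands of this size would of course need measuring.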

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 30, 2018 2:29 am

ejolson,
Those operator overloaded object oriented big number routines make for readily readable code. The fact that the efficiency is good is also very impressive.
Why thank you. It's been a long road getting there.

I was pondering how we judge the expressiveness of C++...

If we look only at that top level fibo() then it's as clear and concise as Python, Javascript, Haskell, Scala... What we have there is as clear as any pseudo code one might write when explaining the algorithm to anyone. Certainly a lot less obscure than C or any BASIC version.

My aim when I started out was to not have any pointers in the solution. They are an implementation detail I don't want to see.

On the other hand there is the small matter of the 500 lines of code I have written to implement those operators and do it efficiently. There we are buried in C++ specifics like classes, methods, pointers, references (&), rvalue references (&&), move semantics etc, etc, etc.

On the third hand, if you happen to have such a big integer class available already, then that fibo() can use it without change. If GMP were allowed here I would not have had to create that 500 lines of code!

On the fourth hand, so much C++ code is bogged down with the "line noise" that is the C++ syntactic baggage that it makes me crazy. Especially if templates and namespaces are involved.

The efficiency is good only because of the "move semantics" introduced in C++11 or whatever recently. The best thing to come out of this exercise is that I learned what that is all about!
...it would be interesting to know whether the inner loop in the O(n^2) multiplication routine is really being vectorized.
Looking at the assembler generated by GCC on the PC for this I see this:

Code: Select all

.L77:
	movdqu	(%r11,%rax), %xmm2
	addl	$1, %r8d
	movdqa	%xmm2, %xmm1
	movdqa	%xmm2, %xmm0
	pmuludq	%xmm4, %xmm2
	pmuludq	%xmm3, %xmm1
	psrlq	$32, %xmm0
	pmuludq	%xmm3, %xmm0
	paddq	%xmm2, %xmm0
	psllq	$32, %xmm0
	paddq	%xmm1, %xmm0
	paddq	(%rbx,%rax), %xmm0
	movaps	%xmm0, (%rsi,%rax)
	addq	$16, %rax
	cmpl	%r10d, %r8d
	jb	.L77
Those p...udq instructions are operating on packed unsigned double word integers. It's not clear to me how many it is operating on at a time. I suspect that on my machine it is only two.
...a programming language that can only express serial algorithms is not very expressive
Oh boy. Parallel programming... You are right there.
Memory in C++ is a leaky abstraction .

jahboater
Posts: 4690
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 30, 2018 8:26 am

Heater wrote:
Sun Dec 30, 2018 2:29 am
Those p...udq instructions are operating on packed unsigned double word integers. It's not clear to me how many it is operating on at a time. I suspect that on my machine it is only two.
I think two 64-bit lanes is right.
xmm registers are 128 bits, ymm are 256 bits, and zmm are 512 bits.
(Intel obviously didn't think of future 1024+ bit registers:)

Heater
Posts: 13371
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 30, 2018 9:17 am

jahboater,
Did I see you say "Must be a fairly old PC"
I'll have you know this is an Intel Core i7-2600K. Circa 2011 - 2013.

As such it's probably the fastest, newest PC I have ever owned in my life :)

I don't buy PCs. There always seems to be a ready supply of them for free from whatever workplace.

But you are right, I compiled fibo with some auto-vectorization options like "-mavx512f -mavx512cd". It compiles but won't run on this machine:

Code: Select all

$ g++ -Wall -Wextra -std=c++17 -O3 -mavx512f -mavx512cd  -fopt-info-vec-all  -o fibo_karatsuba fibo_karatsuba.cpp &> report.txt
$ ./fibo_karatsuba
Illegal instruction (core dumped)
$
By the way, that -fopt-info-vec-all option causes GCC to output a huge report on what it is up to as it analyses your code for vectorization. In this case it vectorizes exactly one loop, a loop inside ejolson's mul_bn().

Code: Select all

Analyzing loop at bint.h:448
bint.h:448:35: note: ===== analyze_loop_nest =====
bint.h:448:35: note: === vect_analyze_loop_form ===
bint.h:448:35: note: === get_loop_niters ===
bint.h:448:35: note: Symbolic number of iterations is (unsigned int) _196
...
... Hundreds of lines of gibberish...
...
loop at bint.h:449: if (ivtmp_3343 >= bnd.290_3401)

bint.h:448:35: note: LOOP VECTORIZED
Memory in C++ is a leaky abstraction .

jahboater
Posts: 4690
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Sun Dec 30, 2018 9:55 am

Heater wrote:
Sun Dec 30, 2018 9:17 am
Did I see you say "Must be a fairly old PC"

I'll have you know this is an Intel Core i7-2600K. Circa 2011 - 2013.
Sorry, I thought the comment might be considered impolite, so removed it :)

I used to get all sorts of old servers discarded by the test labs, rarely modern PC's though.
Now I am retired, I have to buy hardware :(

I like fanless PC's with no moving parts (other than the on/off switch), so the forthcoming 10nm Intel series (with AVX512) are looking interesting.

Return to “Off topic discussion”