DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA
Contact: Website

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 2:29 am

ejolson wrote:
Sat Dec 01, 2018 2:08 am
DavidS wrote:
Sat Dec 01, 2018 1:11 am
Remember also that the A53 in the RPi 3B/3B+/3A+ is already a lot faster per single core than the ARM1176JZF-S of the RPi A/B/A+/B+/Zero/ZeroW at the same clock (a single A53 core in the RPi 3B at 500MHz will run circles around the ARM1176JZF-S of the RPi 1B+ at 700MHz running the exact same binary).
Did you ever get the Pi pie chart program to run? It would be interesting to see the results of pichart-serial on a Pi 3B underclocked to 500MHz.

How hard do you think it would be to create a version of FreeBASIC that leveraged the LLVM code generator to produce executable binaries for the Raspberry Pi?
Unfortunately I have not gotten it to run. Honestly, I have been concentrating more on getting some tutorial programs cleaned up for teaching noobs to code for RISC OS in BBC BASIC, ARM Assembly (ObjAsm syntax), and C.

I will have to take another look at it to see if I can figure out why it is not working; my last attempt crashed the system. 500MHz is not my normal running speed, though I can clock it down to that (and have clocked it lower for testing).

BTW clocking a 1B+ at 400MHz for arm_freq is a good way to test performance of code, if it runs well with that setting (and sdram_freq=200, gpu_freq=200) then I consider it fast enough to be usable on an ARMv6 BCM2835 based Pi at 1000MHz, and at least default speed on everything else.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

Heater
Posts: 16549
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 4:33 am

jahboater,
You may not care if your Pi slows down by 30%, but if you are benchmarking fibo() runs, your results will be meaningless because you have no idea what the clock speeds were ...
Ah yes, benchmarking. I guess I don't do enough of that to think about it so hard.

I'm more likely to worry about stable clock speeds when trying to get timing determinism in real-time interactions. Then fast is perhaps not a consideration but stable is.

I'll bear that in mind when I get to benchmarking our fibo(4784969).
You may be OK in Finland where the ambient temperature is presumably on the low side
Not always. This has been the longest, hottest, driest summer for a hundred years. 30C in the shade for weeks on end. Easily exceeding 40C on my balcony every day.

This may have contributed to why my Surface Pro 4 exploded in the summer: https://forums.parallax.com/discussion/ ... nt_1444509
Memory in C++ is a leaky abstraction .

jahboater
Posts: 6114
Joined: Wed Feb 04, 2015 6:38 pm
Location: West Dorset

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 9:54 am

DavidS wrote:
Sat Dec 01, 2018 2:29 am
BTW clocking a 1B+ at 400MHz for arm_freq is a good way to test performance of code, if it runs well with that setting (and sdram_freq=200, gpu_freq=200) then I consider it fast enough to be usable on an ARMv6 BCM2835 based Pi at 1000MHz, and at least default speed on everything else.
+1
Yes! I down clock a Pi Zero to 200MHz for this exact reason.
If my program is usable on that and has no obvious "sticky" times, then it will be very fast indeed on a Pi3+

Another approach is to use the valgrind tool. This takes a compiled executable and "interprets" it instruction by instruction.
It can then check all the memory references, function arguments and so on. As you can imagine, that's quite slow! It both tests the code and checks it is fast enough at the same time.

DavidS

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 12:49 pm

jahboater wrote:
Sat Dec 01, 2018 9:54 am
DavidS wrote:
Sat Dec 01, 2018 2:29 am
BTW clocking a 1B+ at 400MHz for arm_freq is a good way to test performance of code, if it runs well with that setting (and sdram_freq=200, gpu_freq=200) then I consider it fast enough to be usable on an ARMv6 BCM2835 based Pi at 1000MHz, and at least default speed on everything else.
+1
Yes! I down clock a Pi Zero to 200MHz for this exact reason.
If my program is usable on that and has no obvious "sticky" times, then it will be very fast indeed on a Pi3+

Another approach is to use the valgrind tool. This takes a compiled executable and "interprets" it instruction by instruction.
It can then check all the memory references, function arguments and so on. As you can imagine, that's quite slow! It both tests the code and checks it is fast enough at the same time.
Sounds like an ARM emulator with built-in analysis, interesting. I do sometimes use ARM emulation to test the speed of things that should ideally be able to run all the way down to 4MIPS, though not quite to that level.

DavidS

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 12:52 pm

Though on the issue of BASIC and huge numbers, I am still looking at possibilities.

I do not know why I got hooked on this enough to spend as much time on it, though I have. Now I am not getting nearly as much done as I would like on other projects. Still, this is good for learning/relearning (I remember when we had to do 32-bit integer math on a 6502 or 6809; that was interesting and usually easy).

DavidS

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 12:54 pm

@Heater:
Is it acceptable to use a few lines of ARM assembly to implement the large add? I ask for the purpose of the slow version in BASIC, as it would simplify things for the math.

Heater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 4:05 pm

DavidS,
I do not know why I got hooked on this enough to spend as much time on it, ...
It's because you opened this thread as "Why Avoid BASIC on RPi?". So you have to defend your position as to why not to.

I'm glad you are taking up the challenge. I'm interested in the outcome. Besides, it's all good practice for our programming chops, as I am finding with my C version.
Is it acceptable to use a few lines of ARM assembly to implement the large add?
Personally I think that is cheating. If we are having a discussion and subsequent coding challenge starting from "Why Avoid BASIC..." then presenting a solution in some other language does not feel quite right.

Besides, if one is performing a bunch of additions with carries over an array of numbers I don't see how doing it in assembler makes it any clearer than doing in a high level language.

It might be a bit faster but personally I'm not too fussed about ultimate performance. The fast fibo algorithm we have here is orders of magnitude faster for big numbers than doing it the simple way, so a speed difference between languages of 2 or 3 times is neither here nor there.

What do others think?

Meanwhile, I'm progressing slowly with my solution. I'm suffering "code blindness". Recently I have been hacking C, C++, Scala, Verilog, Javascript and even Python. That has scrambled my brain. All code now looks as meaningless as the stuff you get when the baud rate is set wrong on your modem. Perhaps I need a little break...

jahboater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 4:32 pm

Heater wrote:
Sat Dec 01, 2018 4:05 pm
It might be a bit faster but personally I'm not too fussed about ultimate performance. The fast fibo algorithm we have here is orders of magnitude faster for big numbers than doing it the simple way, so a speed difference between languages of 2 or 3 times is neither here nor there.
Yes, it will never compete with the fast algorithm.
But it is very tempting to use the hardware carry flag to do the full "64-bits at a time" long arithmetic. But as you say, assembler is not C or Basic.
What about intrinsics in C? Not portable but very fast.
This function works for any integer type; it returns true on overflow, or on carry if the arguments are unsigned.

__builtin_add_overflow( a, b, &res )

This is nearly as fast as assembler. But with assembler you can use the add-with-carry adc instruction to follow on.
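A minimal sketch of how the builtin could be chained over an array of 64-bit limbs, assuming GCC or Clang. The name add_limbs and the least-significant-limb-first layout are assumptions for illustration, not code from this thread:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds the n-limb number b into a (least significant limb first).
   Returns the carry out of the top limb (0 or 1).
   add_limbs is a hypothetical helper name. */
static unsigned add_limbs(uint64_t *a, const uint64_t *b, size_t n)
{
    unsigned carry = 0;
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++) {
        uint64_t t;
        /* Add the incoming carry first, then the other limb. At most one
           of the two steps can wrap, so OR-ing the two flags is enough. */
        unsigned c1 = __builtin_add_overflow(a[i], (uint64_t)carry, &t);
        unsigned c2 = __builtin_add_overflow(t, b[i], &a[i]);
        carry = c1 | c2;
    }
    return carry;
}
```

Whether the compiler turns this into an adds/adcs chain depends on the compiler version and target, so it is worth checking the generated assembly.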

DavidS

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 4:47 pm

Ok then I will stick to BASIC.

And the temptation was to use the hardware carry to do 4 million bit addition (just as easy as using it to do 64bit :) ).

jahboater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 4:52 pm

DavidS wrote:
Sat Dec 01, 2018 4:47 pm
And the temptation was to use the hardware carry to do 4 million bit addition (just as easy as using it to do 64bit :) ).
Of course. adds; adc
Just half the speed :(
Surprisingly a 64-bit add takes exactly the same time as a 32-bit add.
Multiply is fractionally slower and division is a lot slower.

DavidS

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 4:59 pm

jahboater wrote:
Sat Dec 01, 2018 4:52 pm
DavidS wrote:
Sat Dec 01, 2018 4:47 pm
And the temptation was to use the hardware carry to do 4 million bit addition (just as easy as using it to do 64bit :) ).
Of course. adds; adc
Just half the speed :(
Or ADDS, ADCS, ADCS, etc.; you just have to store and load after every fourth operation. It is more than 2 times slower, because we are using dependent values in consecutive instructions, so the CPU cannot issue as many instructions in the same clock (remember we have a multiple-issue pipeline).

Heater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 5:38 pm

jahboater,
But it is very tempting to use the hardware carry flag to do the full "64-bits at a time" long arithmetic.
Yes. And I have been wondering about what that "add with carry" thing actually buys you.

If we have the luxury of working in the base of our choosing, which we do, then we can borrow from PeterO's ALGOL 60 example.

The biggest power of ten we can fit in a 64 bit unsigned integer is 10000000000000000000 or 10 to the power 19.

OK, so if we work with an array of "digits" to the base 10^19, each stored in an unsigned 64 bit integer, we can detect carry by the fact that an addition of two digits comes out at 10^19 or more.

That may well be wasting a few bits of our uint64_t, and it is for sure slower than letting a 64 bit addition roll over and checking the carry flag.

BUT...

It makes for very clear code. Anyone who understands decimal arithmetic can follow it.

Plus it allows for very simple and fast printing of the final result. As we have seen in the Python and Javascript solutions using big integers, producing the output takes an order of magnitude or more time than actually calculating the result.

As a teaser here is the guts of my big integer sum function:

Code:

    for (i = 0; i < a->width; i ++)
    {
        if (i < b->width)
        {
            sum = a->value[i] + b->value[i] + carry;
        }
        else
        {
            sum = a->value[i] + carry;
        }

        if (sum > LIMIT)
        {
            sum = sum - LIMIT - 1;
            carry = 1;
        }
        else
        {
            carry = 0;
        }
        result->value[i] = sum;
    }
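Where LIMIT can be 9 or 99 or 9999999999999999999. Note that the last value only works while sum cannot wrap: if sum is a uint64_t, two such digits plus a carry can reach 19999999999999999999, which exceeds 18446744073709551615, so 999999999999999999 (eighteen nines) is the largest LIMIT that is safe in general.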

Turns out that when using that sum algorithm in a recursive fibo() it makes no noticeable difference in speed if the "digits" are small, 0-9, or huge, 0-9999999999999999999. The recursive fibo() has so much overhead in function calling.
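For illustration, here is a self-contained sketch of the same decimal-limb idea together with the fast printing it allows. It assumes base 10^18 so that two limbs plus a carry always fit in a uint64_t; BASE, dec_add and dec_format are hypothetical names, not the code from the post above:

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BASE UINT64_C(1000000000000000000)   /* 10^18, an assumed choice */

/* r = a + b, each n limbs wide, least significant limb first.
   Returns the carry out of the top limb (0 or 1). */
static unsigned dec_add(uint64_t *r, const uint64_t *a, const uint64_t *b, size_t n)
{
    unsigned carry = 0;
    for (size_t i = 0; i < n; i++) {
        assert(a[i] < BASE && b[i] < BASE);
        uint64_t sum = a[i] + b[i] + carry;   /* below 2 * 10^18, cannot wrap */
        carry = sum >= BASE;
        if (carry)
            sum -= BASE;
        r[i] = sum;
    }
    return carry;
}

/* Writes the number as decimal text into buf and returns its length.
   The caller must supply a buffer large enough for 18 * n digits plus NUL. */
static int dec_format(char *buf, size_t size, const uint64_t *v, size_t n)
{
    assert(n > 0);
    size_t top = n - 1;
    while (top > 0 && v[top] == 0)
        top--;                                /* skip leading zero limbs */
    int len = snprintf(buf, size, "%" PRIu64, v[top]);
    while (top-- > 0) {                       /* lower limbs, zero padded */
        assert((size_t)len < size);
        len += snprintf(buf + len, size - (size_t)len, "%018" PRIu64, v[top]);
    }
    return len;
}
```

Printing needs no division at all: each limb is already a block of 18 decimal digits, which is where the output speed comes from.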

ejolson
Posts: 5809
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 5:47 pm

Heater wrote:
Sat Dec 01, 2018 4:05 pm
What do others think?
Not being able to write code which effectively maps onto add-with-carry operations in hardware is a lack of expressivity shared by almost all high-level programming languages. While

__builtin_add_overflow( a, b, &res )

attempts to remedy this in C, one would hope for a solution without so many underscores that used the notation of a plus sign instead. This built-in function also copies the carry flag back and forth between an integer register, so still lacks the expressivity of assembly from an efficiency point of view. It is also not portable to standards compliant C compilers which do not include such things.

What high-level programming languages allow the automatic detection of overflow on additions involving primitive integer data types?

It looks like we are stuck using fewer bits than available or testing for possible overflow before the addition. Maybe we should just code this in COBOL.
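For unsigned types there is a third option in portable C: do the addition, then compare the result with an operand, since unsigned wrap-around is well defined. A small sketch, with add_carry_u64 as a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Computes a + b + carry_in, stores the low 64 bits in *sum and
   returns the carry out (0 or 1). Uses only standard C. */
static unsigned add_carry_u64(uint64_t a, uint64_t b, unsigned carry_in, uint64_t *sum)
{
    assert(carry_in <= 1);
    uint64_t t = a + b;        /* unsigned wrap is well defined */
    unsigned c = t < a;        /* wrapped exactly when the result is smaller than an operand */
    uint64_t s = t + carry_in;
    c |= s < t;                /* adding the carry can wrap only if the first add did not */
    *sum = s;
    return c;
}
```

It still costs a compare per step rather than reading the flag directly, which is the expressivity gap described above.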
Last edited by ejolson on Sat Dec 01, 2018 5:50 pm, edited 1 time in total.

Heater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 5:49 pm

jahboater,
Surprisingly a 64-bit add takes exactly the same time as a 32-bit add.
That is surprising at first glance.

The slow thing about addition is propagating the carry from one bit to the next. The wider the words the longer it might take. At least in a "ripple carry" adder circuit.

With a ripple carry adder the time taken to perform an addition is linearly dependent on the number of bits.

Of course that is not the circuit they use. For example there are "Carry-LookAhead Adders": http://www.ece.ubc.ca/~stevew/515/handouts/arith.pdf
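To make the ripple-carry dependency concrete, here is a software model of a chain of full adders. It is only an illustration of the logic, not of any real CPU's adder, and ripple_add is a hypothetical name:

```c
#include <assert.h>
#include <stdint.h>

/* Adds the low `width` bits of a and b one bit at a time, the way a
   ripple-carry circuit would. Each stage has to wait for the carry of
   the stage before it, hence the linear delay in hardware. */
static uint64_t ripple_add(uint64_t a, uint64_t b, unsigned width, unsigned *carry_out)
{
    assert(width >= 1 && width <= 64);
    uint64_t sum = 0;
    unsigned carry = 0;
    for (unsigned i = 0; i < width; i++) {
        unsigned x = (unsigned)((a >> i) & 1);
        unsigned y = (unsigned)((b >> i) & 1);
        unsigned s = x ^ y ^ carry;               /* full-adder sum bit */
        carry = (x & y) | (carry & (x ^ y));      /* full-adder carry out */
        sum |= (uint64_t)s << i;
    }
    *carry_out = carry;
    return sum;
}
```

The loop runs once per bit, mirroring the gate delay that a carry-lookahead design avoids by computing the carries in parallel.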

jahboater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 5:51 pm

I did all this some time ago when implementing a very precise calculator.

strtod() happily accepts 2.2222222222222222222222222222222222222222222222222222222222222222222222222222222
without comment, even though many digits will have been discarded and the result is wrong (I wish it would at least raise FE_INEXACT but it doesn't, it just throws away the excess precision).

I wrote an arbitrary precision multiply and add sequence to detect if the user's value would overflow the mantissa.
I eventually didn't use it because it was too slow :(

jahboater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 6:03 pm

ejolson wrote:
Sat Dec 01, 2018 5:47 pm

__builtin_add_overflow( a, b, &res )

attempts to remedy this in C, one would hope for a solution without so many underscores that used the notation of a plus sign instead.
That's easy enough in C++.
In C I just use:

#define add(a,b) __builtin_add_overflow( a, b, &a )

which works well enough and is fairly readable
ejolson wrote:
Sat Dec 01, 2018 5:47 pm
This built-in function also copies the carry flag back and forth between an integer register, so still lacks the expressivity of assembly from an efficiency point of view.
No, it uses the carry or overflow flag directly
For example:
int a, b;
if( add( a, b ) ) printf( "error overflow\n" );

emits something like:
add eax, ebx
jno l1
call printf
l1:

You can see it's used the overflow flag (jump if not overflow) directly and is just as fast as assembler.
The same applies for unsigned arithmetic where it uses the "jnc" equivalent.

What I suspect it can't do is use an ADC instruction afterwards, but I may be unfair to the compiler as I have never tried it.
GCC never ceases to surprise me on things like this.

Most of the big compilers such as clang, icc and probably armcc (since it's based on llvm) copy gcc on things like this.
It still isn't truly portable or standards compliant of course (yet).

jahboater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 6:26 pm

Heater wrote:
Sat Dec 01, 2018 5:49 pm
jahboater,
Surprisingly a 64-bit add takes exactly the same time as a 32-bit add.
Take look at this site:
http://www.uops.info/table.html
You can choose your CPU at the top.

And of course the famous Agner Fog site:
https://www.agner.org/optimize/instruction_tables.pdf

Heater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 6:29 pm

ejolson,
Not being able to write code which effectively maps onto add-with-carry operations in hardware is a lack of expressivity shared by almost all high-level programming languages.
Oh boy. That is a can of worms.

That has bugged me for years as well. Where is my carry? Where is my signed overflow? That and the lack of rotate operators. Being weaned on assembler as I was. How could the language designers omit such a simple, obvious, useful thing?

I cannot speak for other languages much but it turns out that the designers of C were correct in omitting them. Not all machines use two's complement arithmetic; therefore, as C is an abstraction over commonly available architectural features, the idea of carry as we know it might not be available everywhere. Similarly for any rotate operator.

Today we find the new RISC V architecture has no carry or any other flags defined in the instruction set. If the C language had some notion of carry in the language it would have to be synthesized by the compiler with multiple instructions anyway.
...one would hope for a solution without so many underscores that used the notation of a plus sign instead.
I agree with the loathing of underscores. Do you have a suggestion for a high level language syntax that would handle carry?

Currently we have the addition operator "+". As in:

Code:

a = b + c;
How do you suggest we express getting the carry of that addition out? Or carrying it in in the first place?
What high-level programming languages allow the automatic detection of overflow on additions involving primitive integer data types?
Ada. At least when building your program with range checking switched on.

Personally I think the problem is with our CPU hardware. It's absurd that we can make additions that overflow the hardware registers, produce an incorrect result and silently continue. No, it should cause a trap, put its hand up and say "Sorry I can't do that". Just like doing a divide by zero.

Heater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 6:34 pm

jahboater,

Thanks, interesting execution time numbers.

I can't help thinking it's mostly irrelevant today. The big deal now is memory access time. Keeping stuff in cache, working set size. Get that wrong and you have a ten times slow down!

jahboater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 6:44 pm

Heater wrote:
Sat Dec 01, 2018 6:29 pm
That and the lack of rotate operators.
res = n >> count | n << (64 - count)
Will produce a single rotate instruction.
The little calculator I mentioned above has a rotate operator called <>
It masks the count with the wordsize-1, and passing a negative count rotates the other way!
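A sketch of such a rotate as a C function, with the count masked as described. The name rotr64 is hypothetical. Masking also covers a count of 0, where the plain expression would shift by 64, which is undefined in C:

```c
#include <assert.h>
#include <stdint.h>

/* Rotates n right by count bits. The count is masked with 63, so a
   negative count ends up rotating left by the same amount. */
static uint64_t rotr64(uint64_t n, int count)
{
    unsigned c = (unsigned)count & 63u;
    /* The second mask keeps the left shift at 0 (not 64) when c is 0. */
    return (n >> c) | (n << ((64u - c) & 63u));
}
```

For example, rotr64(1, 1) gives 0x8000000000000000 and rotr64(1, -1) gives 2. Current GCC and Clang usually recognise this pattern and emit a single rotate instruction, though that is worth confirming for a given target.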
Heater wrote:
Sat Dec 01, 2018 6:29 pm
Not all machines use two's complement arithmetic,
They do now. C++14 and I expect C2x have declared that signed integer overflow is a "defined" operation (like unsigned wrap always has been). That is because they think there is no longer any integer hardware that is not twos-complement.
Heater wrote:
Sat Dec 01, 2018 6:29 pm
Ada. At least when building your program with range checking switched on.
and C if you turn range checking on with -ftrapv

jahboater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 7:00 pm

Heater,
I did this for the multi precision add loop.
It does 32 bits at a time with a 64-bit intermediate sum. This means that the carry bit is available by shifting it down and the 32-bit sum is masked off. No assembler or builtins. It adds the "addend" to the "mantissa" arrays.

Code:

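    typedef uint32_t dword;                  // assumed typedefs (need <stdint.h>);
    typedef uint64_t qword;                  // the post does not show them

    dword mantissa[MAX], addend[MAX];
    dword *m = mantissa;
    const dword *a = addend;
    const dword *last = addend + MAX - 1;    // assumed: last limb of the addend

    qword sum = 0;
    do
    {
      sum += (qword)(*m) + (qword)(*a);      // at most 2^33 - 1, fits easily
      *m++ = (dword)sum;                     // low 32 bits are the result limb
      sum >>= 32;  // carry
    }
    while( ++a <= last );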
Last edited by jahboater on Sat Dec 01, 2018 7:06 pm, edited 1 time in total.

Heater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 7:04 pm

Interesting.

That "defined" behavior is for C++. Did C do that also?

Also, silently wrapping around, twos complement style, and continuing is still often a wrong result and causes program failure.

-ftrapv sounds great. I guess that is not part of the C or C++ language specification but an option a particular compiler may or may not have.

Ada is different. You can define your own types with whatever range and it will not allow values to go out those ranges.

jahboater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 7:13 pm

Heater wrote:
Sat Dec 01, 2018 7:04 pm
Interesting.

That "defined" behavior is for C++. Did C do that also?
Not yet. C is more conservative. I read somewhere that the next C standard, likely C2x, will follow.
For GCC and probably other compilers, -fwrapv introduces it now:-

Code:

-fwrapv
    This option instructs the compiler to assume that signed arithmetic overflow of
    addition, subtraction and multiplication wraps around using twos-complement
    representation.  This flag enables some optimizations and disables others. 
Tests like "if( n + 1 < n )" will no longer be removed by the compiler!!
Also, silently wrapping around, twos complement style, and continuing is still often a wrong result and causes program failure.
Yes of course. My point was just that twos-complement hardware is universal now.
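Without -fwrapv, signed overflow is undefined in ISO C, so a test that does not depend on any compiler flag has to check before adding. A minimal sketch, with add_would_overflow and checked_add as hypothetical names:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Returns 1 if a + b would not fit in an int, without doing the addition. */
static int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    if (b < 0)
        return a < INT_MIN - b;
    return 0;
}

/* Adds b to *a only if the result fits. Returns 1 on success, 0 on overflow. */
static int checked_add(int *a, int b)
{
    assert(a != NULL);
    if (add_would_overflow(*a, b))
        return 0;
    *a += b;
    return 1;
}
```

This is the "test before the addition" style: more verbose than "n + 1 < n", but the compiler has no licence to remove it.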

Heater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 7:23 pm

jahboater,

Your array addition, 32 bits at a time, is nice.

Of course on a 64 bit machine one would want to do additions in 64 bits. But that requires a 128 bit intermediate result. Which C does not support. Then what?

As always choice of algorithm depends on what one is trying to do. For example....

Say you have two files containing a billion decimal digits each. The task is to add those numbers together and produce a file with a billion digit (plus one maybe) result.

One could spend all day converting the numbers to binary, doing a fast addition, and then spending all day converting the binary back to decimal.

Or, just skip through it in decimal anyway.
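As a rough sketch of the "stay in decimal" route, here is schoolbook addition on two digit strings held in memory. add_decimal is a hypothetical name; a version for billion-digit files would have to read the inputs from the end in chunks, which is not shown here:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Adds two non-negative decimal strings (ASCII digits only).
   out must have room for max(strlen(a), strlen(b)) + 2 bytes. */
static void add_decimal(const char *a, const char *b, char *out)
{
    size_t la = strlen(a), lb = strlen(b);
    size_t n = (la > lb ? la : lb) + 1;       /* one extra digit for a final carry */
    int carry = 0;
    assert(la > 0 && lb > 0);
    out[n] = '\0';
    for (size_t i = 0; i < n; i++) {          /* i counts digits from the right */
        int da = i < la ? a[la - 1 - i] - '0' : 0;
        int db = i < lb ? b[lb - 1 - i] - '0' : 0;
        int s = da + db + carry;
        carry = s >= 10;
        out[n - 1 - i] = (char)('0' + (carry ? s - 10 : s));
    }
    /* Drop the leading zero if no carry reached the extra digit. */
    if (out[0] == '0' && n > 1)
        memmove(out, out + 1, n);             /* n bytes includes the terminator */
}
```

No binary conversion happens at either end, so the cost is one pass over the digits.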

jahboater

Re: Why Avoid BASIC on RPi?

Sat Dec 01, 2018 7:32 pm

Heater wrote:
Sat Dec 01, 2018 7:23 pm
Of course on a 64 bit machine one would want to do additions in 64 bits. But that requires a 128 bit intermediate result. Which C does not support. Then what?
Well yes, that is a problem. Non-portable "__int128" or nothing. It's not ideal. But 64-bit addition on a 32-bit machine is reasonably fast ("adds; adc") on the Pi.
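A sketch of the 64-bit-limb variant using that extension. It assumes GCC or Clang on a 64-bit target, where unsigned __int128 is available; u128 and add_limbs128 are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* unsigned __int128 is a GCC/Clang extension, not ISO C. */
typedef unsigned __int128 u128;

/* Adds the n-limb number a into m, 64 bits at a time, least significant
   limb first. Returns the carry out of the top limb (0 or 1). */
static unsigned add_limbs128(uint64_t *m, const uint64_t *a, size_t n)
{
    u128 sum = 0;
    assert(m != NULL && a != NULL);
    for (size_t i = 0; i < n; i++) {
        sum += (u128)m[i] + a[i];    /* 128-bit intermediate keeps the carry */
        m[i] = (uint64_t)sum;        /* low 64 bits are the result limb */
        sum >>= 64;                  /* carry into the next limb */
    }
    return (unsigned)sum;
}
```

The structure is the same as the 32-bit loop, only with wider types, so it inherits the same portability caveat.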
Heater wrote:
Sat Dec 01, 2018 7:23 pm
As always choice of algorithm depends on what one is trying to do. For example....
Yes indeed. I started off doing it with individual decimal digits like a human would, but it was very very slow.
It was crap at multiplication - it just did a repeated add (for very simple code). So addition had to be fast.
