jahboater
Posts: 5026
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 8:53 am

Heater wrote:
Sun Jan 20, 2019 8:38 am
Anyway -march=sandybridge makes no noticeable difference.
GCC's CPU detection on Intel has worked for a long time (unlike on ARM).
Probably because there is a decent CPUID instruction which gives exact and exhaustive details of every modern processor.
According to "man gcc" it even goes as far ahead as icelake - two generations away.

-march=sandybridge should give an identical executable to -march=native (compare them with "cmp" rather than benchmarks).

I like the Intel codenames.
I think all the names and places actually exist.

Heater
Posts: 14429
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 9:47 am

Intel's names are OK.

It's just that there are so many of them. It has taken me two or three weeks now to find out what this chip is!
Memory in C++ is a leaky abstraction .

ejolson
Posts: 4247
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 10:55 am

Heater wrote:
Sun Jan 20, 2019 9:47 am
Intel's names are OK.

It's just that there are so many of them. It has taken me two or three weeks now to find out what this chip is!
With the -mtune and -march flags set appropriately does -O3 now produce faster executables than the -O2 setting?

Heater
Posts: 14429
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 11:27 am

Possibly, maybe...

Code: Select all

$ g++ -O2 -march=native -mtune=native -DUSE_OMP -fopenmp -std=c++17 -o fibo_karatomp  fibo_4784969/c++/fibo_karatsuba.cpp fibo_4784969/c++/fibo.cpp
$ time ./fibo_karatomp > /dev/null

real    0m0.244s
user    0m1.219s
sys     0m0.109s
$ g++ -O3 -march=native -mtune=native -DUSE_OMP -fopenmp -std=c++17 -o fibo_karatomp  fibo_4784969/c++/fibo_karatsuba.cpp fibo_4784969/c++/fibo.cpp
$ time ./fibo_karatomp > /dev/null

real    0m0.245s
user    0m1.250s
sys     0m0.094s
At this point the reported times are jittering around all over. Hard to tell the signal from the noise.

I would have to make repeated timings and get an average. That is what I was supposed to be doing with Google benchmark...

I just noticed that letting the output dump to the screen rather than sending it to the bit bucket doubles the run time. At which point I wonder if there is any point in getting the computation to run faster?

Of course I'm using the C++ iostream to output the result with "<<". That is notoriously slow, something else to optimize...
Memory in C++ is a leaky abstraction .

jahboater
Posts: 5026
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 2:34 pm

Heater wrote:
Sun Jan 20, 2019 9:47 am
It has taken me two or three weeks now to find out what this chip is!
Try:

Code: Select all

gcc -march=native -Q --help=target | grep arch
also

Code: Select all

gcc -march=native -S -fverbose-asm fibo.c -o fibo.s
grep arch fibo.s
Last edited by jahboater on Sun Jan 20, 2019 2:44 pm, edited 1 time in total.

ejolson
Posts: 4247
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 2:39 pm

Heater wrote:
Sun Jan 20, 2019 11:27 am
I just noticed that letting the output dump to the screen rather than sending it to the bit bucket doubles the run time. At which point I wonder if there is any point in getting the computation to run faster?
Scrolling a million digits through a GUI is likely to take some time. Sending the output to a file should take about the same time as to /dev/null. For me piping the output through head -c32 seems faster than tail -c32 by a little bit.

The timings in the graphs were obtained by first setting the CPU governor to performance and the minimum frequency the same as the maximum. Then, the best of 50 runs with output going to /dev/null were taken for each measurement. Each program was tested using from 1 to 8 processors, which means each program was run 400 times. For the single-core runs there was little variation between timings. For the multi-core runs variation was as much as 30 percent.

The rationale behind taking the minimum time is that it is robust against the program being preempted by the operating system doing strange things. At the same time, the server was relatively quiet, having just come up from an unexpected power outage. There is additional nondeterminism present in the OpenMP algorithms used to schedule the worker threads. This variability comes not from the operating system doing unrelated things but is intrinsic to the computation itself. For this reason, it might also make sense to average the multi-core times and report an estimate of the variance. I didn't take averages because removing the variability due to the operating system doing unrelated tasks seemed more important in the current situation.

It sounds like Windows Subsystem for Linux introduces even more variability into program run times than just Linux. Among other things the default CPU governor on a laptop is likely to be sluggish about increasing clock speed in order to save power. I don't know if the Linux /sys filesystem settings can be used to change the behaviour of the Windows scheduler or whether a mouse and something like control panel is needed. After collecting the timings don't forget to turn power saving back on.
Last edited by ejolson on Tue Jan 22, 2019 9:02 am, edited 1 time in total.

Heater
Posts: 14429
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 3:08 pm

ejolson,
Scrolling a million digits through a GUI is likely to take some time.
That is what I surmise. I guess I'm a little bit surprised that it does not dump the entire output into buffers, including the final output of "time", and then scroll the buffers to the screen at a more leisurely pace. Similar to the buffering that goes on when writing files.

I have just this moment taken delivery of a very similar PC. I'll stick Debian on there and see how that compares with WSL.
Memory in C++ is a leaky abstraction .

Heater
Posts: 14429
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 3:17 pm

ejolson,
I don't think async futures are any better for the kind of recursive parallelism being exploited in the Karatsuba algorithm...
TL;DR The following is my rambling on HPX parallel processing as I try to understand what they are up to with it....

I think I have realized what the HPX guys are talking about with their async, futures, and ultra-lightweight threads and so on... It is a bit conceptually sideways and inside out compared to the way we (well, me) normally think about these things.

Firstly, let's consider the way we have parallelized the Karatsuba and/or fibo code, which I gather is typical of such parallelization, not that I know much about such things:

We have a problem, call it "P". We have an algorithm to compute a solution to that problem, call it "A". So we just want to run that like so:

Code: Select all

result = A(P)
Our algorithm is recursive, it chops the problem P into parts, say two parts call them P' and P'', it calls itself to work on those problem subsets, it then has to combine the results of those sub-problems to arrive at the final result. Which in a hand waving, pseudo code, kind of way, ignoring all details of how things are actually split, recombined or even calculated at all, looks like:

Code: Select all

A (P) {
    Split P into parts P' and P''.
    result' = A(P')
    result'' = A(P'')
    result = combine (result' and result'')
    return result
}
This is nice; if we have the right recursive algorithm this can already be used to optimize all kinds of things over the naive iterative approaches. Even on a single core machine. From Karatsuba to Fibo to Fourier transform etc.

Now we notice we have two calls to A inside A and it looks like we can do those independent things in parallel on their own processors to speed things up a bit. So we introduce some kind of threading system to distribute those sub-problems around processors. Borrowing the PAR notation from Occam to indicate things that can be done in parallel we have:

Code: Select all

A (P) {
    Split P into parts P' and P''.
    PAR
        result' = A(P')
        result'' = A(P'')
    result = combine result' and result''
    return result
}
But, oops, this fails badly: we need to ensure that both our parallel parts have finished and have produced a result before we can combine the sub-results. We need to wait for those results and get back to sequential operation. Like so:

Code: Select all

A (P) {
    Split P into parts P' and P''.
    PAR
        result' = A(P')
        result'' = A(P'')
    SEQ
        result = combine result' and result''
        return result
}
Brilliant, as ejolson has shown using OpenMP this can speed things up nicely when many cores are available. With OpenMP the PAR is "#pragma omp parallel.." and the SEQ is "#pragma omp taskwait". This is typical fork/join parallelism as used in OpenMP, Cilk, MPI, etc, I believe.

This has some problems though. It's not nice C or C++ syntax for a start. More seriously if result' can be computed quicker than result'' we are wasting valuable CPU time as we wait for result'' in the sequential phase.

Enter the HPX guys. They make use of standard C++ syntax to get their threading done. C++ has "async" to make parallel threads of execution and "futures" to get the result back and synchronize our sequential operation again. So it looks kind of like this:

Code: Select all

A (P) {
    Split P into parts P' and P''.
    future1 = async (A(P'))
    future2 = async (A(P''))
    result = combine future1.get() and future2.get()
    return result
}
That is nice. In C++ we get to use standard C++ syntax. But basically it gains us nothing in efficiency. It's essentially doing the same "fork/join" that we did before. Basically: start a couple of threads, wait for them to complete, do the rest of the work.

There is still the issue of that wasted CPU time as different sub-results complete at different rates and we have to wait for both of them. There is also the issue of the overheads of creating threads and task switching between them, this is very slow when using POSIX or Linux threads. Thread creation has to be minimized. Don't forget this is recursive so that wastage occurs at every level down our recursion tree.

But here is the HPX big idea...

1) Let's imagine that async() does not necessarily start any thread running at the moment it is called. It simply schedules a thread to be run at some time. Perhaps putting it on a queue to be run eventually by some core or other.

2) Let's change our code a tiny bit. Instead of having to wait on a couple of futures to get some actual results to combine, let's have the combine operation produce a future of its own. Let's have our algorithm return that future instead of an actual result. It will look something like:

Code: Select all

A (P) {
    Split P into parts P' and P''.
    future1 = async (A(P'))
    future2 = async (A(P''))
    resFuture = combine (future1 and future2)
    return resFuture
}
Oh wow, what do we have now?

That function A() is no longer actually computing anything. It simply creates some threads to do the work, which will be scheduled to run sometime. It returns a future which will eventually be used to get the result by the caller.

What we have done is to create a whole tree structure of work to do, possibly with millions of threads in it!

That's never going to work, threads are really slow... well except if one can reduce the cost of thread creation to something very small. Which the HPX guys claim to do. The threads are all user space packets of work to do, perhaps millions of them, except they are distributed over as many actual cores and "real" threads as you have.

Eventually all those little packets of work do get run, in whatever order, it does not matter at what level of recursion they were created at or where in that recursive tree they come from.

In this way, so they claim, all that time we wasted with our old fork/join at every level is no longer wasted, if a processor becomes idle work for it to do can be found.
...Note, however it is important to have a proper implementation of fork-join that upon sync only blocks for tasks spawned in the current scope.
Those HPX guys say otherwise. Looked at in the way I try to describe above, scheduling tiny threads with async and returning results that don't exist yet in futures, the actual synchronization required to resolve those futures need not be in the current scope at all. It could be far away back up the recursive call stack. You are returning a need for synchronization that your caller has to deal with, or the caller's caller, etc....

Can any of this help with Karatsuba or Fibo? I have no idea. Does it work on a Raspberry Pi? But it looks like I will be playing with it, far into the future. Watch this space.

My off the cuff fibo() challenge has been leading me to all kind of things I never knew or imagined before...
Memory in C++ is a leaky abstraction .

ejolson
Posts: 4247
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 6:46 pm

Heater wrote:
Sun Jan 20, 2019 3:17 pm
In this way, so they claim, all that time we wasted with our old fork/join at every level is no longer wasted, if a processor becomes idle work for it to do can be found.
The way task spawn and sync are currently implemented in both Cilk and OpenMP behind the scenes using work stealing also keeps all threads in the worker pool busy. There is no need to create the entire directed graph first. Instead, each worker has a separate queue of work. Then, only if needed, new work is taken from the queue of a different worker. This results in an algorithm that greedily minimises context switches while executing the same bits of parallel work that would have been represented by the edges of the graph.

Creating the whole graph ahead of time might allow a graph-theoretic optimal scheduling of the work, provided the edge weights were known ahead of time. The full graph might also allow for easier cancellation of part of a calculation based on already computed results. For the Karatsuba algorithm I don't see how it makes any difference. I'm not even sure it prevents cactus stacks from sprouting up all over the place. Speaking of which, the hypothetical problem of flooding electrical tunnels in the desert seems to be over and the graphs are back.
Last edited by ejolson on Sun Jan 20, 2019 7:31 pm, edited 1 time in total.

Heater
Posts: 14429
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 7:23 pm

Ah, but, I don't believe that HPX creates the entire directed graph of execution first, before it actually does any work. Even if I may have implied as much above. Rather it is spawning threads, growing the graph, and executing threads, which "prunes" the graph, all dynamically as it runs.

Either way, surely the observation is true, things like OpenMP require that all the forking and joining have to match up in any given scope, as you pointed out. Which is not the case for HPX. Which feels like it should be eliminating delays somewhere.

What would be a super simple algorithm that we could test this claim with and see how well they compare?

Mandelbrot comes to mind...

If you are in California, I hear you won't have any electricity soon, in tunnels or otherwise!
Memory in C++ is a leaky abstraction .

Heater
Posts: 14429
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 7:35 pm

There is one thing I think I can be sure of:

I wrote an FFT for the Parallax Propeller multi-core micro-controller. To my amazement I could get a usable speed up when using OpenMP to spread that over two or four cores. Not anything like linear but useful.

Given the Propeller only has 32K Bytes of RAM that is impressive. I don't believe that will be possible with HPX.
Last edited by Heater on Sun Jan 20, 2019 8:26 pm, edited 1 time in total.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 4247
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Sun Jan 20, 2019 7:52 pm

Heater wrote:
Sun Jan 20, 2019 7:23 pm
Ah, but, I don't believe that HPX does create the entire directed graph of execution first, before it actually does any work. Even if I may have implied as such above. Rather it is spawning threads, growing graph, executing threads, which "prunes" the graph, all dynamically as it runs.

Either way, surely the observation is true, things like OpenMP require that all the forking and joining have to match up in any given scope, as you pointed out. Which is not the case for HPX. Which feels like it should be eliminating delays somewhere.

What would be a super simple algorithm that we could test this claim with and see how well they compare?

Mandelbrot comes to mind...

If you are in California, I hear you won't have any electricity soon, in tunnels or otherwise!
There is a conquer and divide algorithm for computing Mandelbrot sets based on connectivity which can be parallelized the same way as Karatsuba. It would be nice to see a natural parallel processing example where async futures are required to cross function boundaries.

I'm not sure what's going on in California and a web search didn't help me.

Heater
Posts: 14429
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Mon Jan 21, 2019 4:51 am

ejolson,
I'm not sure what's going on in California and a web search didn't help me.
I'm not a great one for following the news, but as far as I can tell large parts of California were on fire all last summer. Everyone is blaming the power company, Pacific Gas and Electric, which has declared bankruptcy in the face of billions of dollars of claims for damages.
https://www.washingtonpost.com/technolo ... 87fb20e532
Memory in C++ is a leaky abstraction .

Heater
Posts: 14429
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Mon Jan 21, 2019 6:15 am

ejolson,

Oh boy, Mariani-Silver algorithm...

When we young guys came across the Mandelbrot set and all things fractal in the early 1980s, productivity of the whole software project team dropped to zero for a month as we got absorbed in writing our own code for it. Fractals were all the rage for some time.

I'm afraid to start looking at Mandelbrot again...
Memory in C++ is a leaky abstraction .

ejolson
Posts: 4247
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Tue Jan 22, 2019 1:16 am

ejolson wrote:
Fri Jan 18, 2019 8:48 am
ejolson wrote:
Tue Jan 15, 2019 8:06 pm
Heater wrote:
Tue Jan 15, 2019 7:56 pm
This BASIC thing has really gone to your head!
I'm just trying to stay off topic in this off-topic thread. Along those lines, it would be nice if a RISC OS BBC BASIC code (with or without inline assembler) were to appear.
I have finished writing visual.bas, a version of the Fibonacci code in Visual Basic. This program is based on the FreeBASIC fibo.bas program with syntax modifications so it will compile and the addition of the missing Karatsuba algorithm.
I converted the Visual Basic code visual.bas back to FreeBASIC to obtain a version in FreeBASIC with the Karatsuba algorithm. In a way this worked; however, only for small values of n. When n=4784969 the stack overflows with a resulting segmentation fault. I've tried the -t compiler option to increase stack size, a direct call to setrlimit, and both together. None of these allowed computation of the 4784969th Fibonacci number. In addition to the broken gcc back end in which non-volatile variables get clobbered, this appears to be another reason to avoid FreeBASIC.

I'm still hoping that someone would run visual.bas using the Microsoft tools for comparison. Trying mono under x86 would also be interesting.

From an expressivity point of view, it is worth mentioning that the TPL Task Parallel Library allows a programmer to express a recursive parallel Karatsuba multiply using Visual BASIC in much the same way as with OpenMP. When I have time, I'll try to make the modifications to see how efficiently the resulting code conveys the parallel program to multi-core CPUs.
Last edited by ejolson on Tue Jan 22, 2019 5:04 am, edited 1 time in total.

ejolson
Posts: 4247
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Tue Jan 22, 2019 1:27 am

Heater wrote:
Sat Jan 19, 2019 8:45 pm
ejolson ,
Before doing anything in user space you need to employ some operating system feature to actually start the threads. It further appears the pool of worker threads is blocking instead of spinning when there is no work to do. This also requires interaction with the kernel.
Thinking about it, I suspect you are right and start to wonder what the guy was talking about.

Looking forward to your updated results.
The results obtained from running parallel.c on a Xeon server have been updated to include the micro-optimization of over provisioning when multiple cores are available to perform the calculation. Over provisioning results in a 4% performance increase when 7 and 8 cores are used, essentially no change when running on 4 to 6 cores and a negative impact when using 2 or 3 cores. The graphs have been updated to reflect the best possible timings in each case.

ejolson
Posts: 4247
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Wed Jan 23, 2019 5:45 am

ejolson wrote:
Tue Jan 22, 2019 1:27 am
Heater wrote:
Sat Jan 19, 2019 8:45 pm
ejolson ,
Before doing anything in user space you need to employ some operating system feature to actually start the threads. It further appears the pool of worker threads is blocking instead of spinning when there is no work to do. This also requires interaction with the kernel.
Thinking about it, I suspect you are right and start to wonder what the guy was talking about.

Looking forward to your updated results.
The results obtained from running parallel.c on a Xeon server have been updated to include the micro-optimization of over provisioning when multiple cores are available to perform the calculation. Over provisioning results in a 4% performance increase when 7 and 8 cores are used, essentially no change when running on 4 to 6 cores and a negative impact when using 2 or 3 cores. The graphs have been updated to reflect the best possible timings in each case.
Here are results which compare the scaling and overall performance of the parallel C and C++ codes when running on an eight-core Cortex-A53 SBC, namely the NanoPC-T3. The programs were compiled using the commands

Code: Select all

/usr/local/gcc-6.5/bin/gcc -O3 -march=native -mtune=native -fopenmp -o parallel fibo_4784969/c/parallel.c -lm
/usr/local/gcc-6.5/bin/g++ -O3 -march=native -mtune=native -DUSE_OMP -fopenmp -std=c++17 -o fibo_karatomp \
    fibo_4784969/c++/fibo_karatsuba.cpp fibo_4784969/c++/fibo.cpp -lm
/usr/local/gcc-6.5/bin/g++ -O3 -march=native -mtune=native -DUSE_ASYNC -std=c++17 -o fibo_karatasync \
    fibo_4784969/c++/fibo_karatsuba.cpp fibo_4784969/c++/fibo.cpp -lpthread -lm
and measurements taken the same way as for the Xeon server. The graph showing how the computation scales with number of cores is

Image

and the overall speed of the respective programs is given by

Image

Note that parallel.c is uniformly faster than fibo_karatsuba.cpp for all measurements. This is notably different compared to running the programs on the Xeon server where the fibo_karatsuba.cpp code was faster. The reason for this difference in relative performance between the C and C++ codes has been conjectured to result from the out-of-order speculative execution optimizations in the Intel architecture which allow loops with if statements to run nearly as fast as vectorizable arithmetic kernels.

A Cortex-A53 based SOC is also used in the Pi 3B and 3B+. Therefore, these models of Pi when running in 64-bit mode should produce results similar to the first half of the above graphs. In further light of the Pi Zero results, parallel.c is expected to be faster than fibo_karatsuba.cpp on all Pi models.

I wonder whether there is any reason to avoid a parallel version written in Visual Basic?

ejolson
Posts: 4247
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Wed Jan 23, 2019 10:52 pm

Here is one more comparison of the C and C++ code, this time using an eight-core Ryzen processor. The compilation commands were

Code: Select all

/usr/local/gcc-8.2/bin/gcc -O3 -march=native -mtune=native -fopenmp -o parallel fibo_4784969/c/parallel.c -lm
/usr/local/gcc-8.2/bin/g++ -O3 -march=native -mtune=native -DUSE_OMP -fopenmp -std=c++17 -o fibo_karatomp \
    fibo_4784969/c++/fibo_karatsuba.cpp fibo_4784969/c++/fibo.cpp -lm
/usr/local/gcc-8.2/bin/g++ -O3 -march=native -mtune=native -DUSE_ASYNC -std=c++17 -o fibo_karatasync \
    fibo_4784969/c++/fibo_karatsuba.cpp fibo_4784969/c++/fibo.cpp -lpthread -lm
and the resulting graphs are

Image

and

Image

These results show that fibo_karatsuba.cpp is faster than parallel.c on the Ryzen processor. This is similar to what was obtained with Xeon and opposite to what happens with Raspberry Pi.

While BASIC appears slower for computing Fibonacci numbers, one should remember that although the A stands for all-purpose, the B in BASIC stands for beginner. To put things in perspective, the alternative to learning BASIC in primary school appears to be Scratch. Some ten-year-olds might be interested in high-performance computing and million-digit Fibonacci numbers; however, it is likely others are interested in writing games. Since BBC BASIC and Scratch both consist of self-contained programming environments that include built-in graphics and sound capabilities, the question why avoid BASIC surely means something different than how efficiently can the language be used to convey complicated algorithms to the CPU. In context, then, the question now becomes, if Scratch is the alternative, why avoid BASIC on the Raspberry Pi?

Maybe it is time to begin the challenge of which programming language is best suited for creating a graphical networked version of the classic Star Trader game.

User avatar
Gavinmc42
Posts: 4292
Joined: Wed Aug 28, 2013 3:31 am

Re: Why Avoid BASIC on RPi?

Thu Jan 24, 2019 10:24 am

I did a search for compilers, Intel and AMD make them too.
Would they be better optimized using insider tricks?
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

jahboater
Posts: 5026
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Thu Jan 24, 2019 11:04 am

Gavinmc42 wrote:
Thu Jan 24, 2019 10:24 am
I did a search for compilers, Intel and AMD make them too.
Would they be better optimized using insider tricks?
I think armcc is based on llvm.

ICC is highly regarded, but beware poor performance on AMD CPUs.

In general, GCC is a good choice, IMHO.

You can easily try, and compare, different compilers with:
https://godbolt.org/
The code produced for the opening example (count_t_letters) is amusing!

User avatar
Gavinmc42
Posts: 4292
Joined: Wed Aug 28, 2013 3:31 am

Re: Why Avoid BASIC on RPi?

Thu Jan 24, 2019 11:26 am

You can easily try, and compare, different compilers with:
https://godbolt.org/
Mostly GCC or Clang versions, no Intel, AMD. But there is Arm/Arm64 gcc; is that the same as ARM's own?
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

jahboater
Posts: 5026
Joined: Wed Feb 04, 2015 6:38 pm

Re: Why Avoid BASIC on RPi?

Thu Jan 24, 2019 11:44 am

Gavinmc42 wrote:
Thu Jan 24, 2019 11:26 am
no Intel,
Yes there is! It's called ICC.
Gavinmc42 wrote:
Thu Jan 24, 2019 11:26 am
there is Arm/Arm64 gcc, is that the same as ARM's own?
No, I believe armcc is based on llvm, so I presume it's much the same as clang.

I see there is MSVC now.

Heater
Posts: 14429
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Thu Jan 24, 2019 12:21 pm

I was watching some YouTube vid the other day, a CppCon presentation or some such, and the Intel compiler was not doing very well up against GCC and Clang for the stuff they were throwing at it. Sadly I don't recall which vid it was or what exactly they were doing. It was all about writing optimal C++.

Anyway, being closed source and Intel only I'm not inclined to start looking into it.
Memory in C++ is a leaky abstraction .

Heater
Posts: 14429
Joined: Tue Jul 17, 2012 3:02 pm

Re: Why Avoid BASIC on RPi?

Thu Jan 24, 2019 6:45 pm

Whilst I was busy avoiding BASIC I accidentally installed Win10 on that new (old) PC I mentioned earlier. Well, it has a Win 7 Pro license so why not make use of the upgrade.

It's not quite the same as this one, a bit older i5 at 3.3GHz and only 4 cores, 4 hyperthreads. (This is an i7, 3.4GHz, 4 cores, 8 threads.) Interestingly, fibo_karatsuba runs at almost exactly the same speed with one core. It runs at almost exactly the same speed with 2 cores as this machine with 4 using OpenMP.

At some point I'll install Debian on it as well. It's a bit of an odd feeling contemplating dual booting again. I have not done that since I deleted Windows 98 from my machines last century.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 4247
Joined: Tue Mar 18, 2014 11:47 am

Re: Why Avoid BASIC on RPi?

Fri Jan 25, 2019 1:07 am

Heater wrote:
Thu Jan 24, 2019 6:45 pm
Whilst I was busy avoiding BASIC I accidentally installed Win10 on that new (old) PC I mentioned earlier. Well, it has a Win 7 Pro license so why not make use of the upgrade.

It's not quite the same as this one, a bit older i5 at 3.3GHz and only 4 cores, 4 hyper threads. (This is an i7, 3.4Hz, 4 core, 8 thread). Interesting fibo_karatsuba runs at almost exactly the same speed with one core. It runs at almost exactly the same speed with 2 cores as this machine with 4 using OpenMP.

At some point I'll install Debian on it as well. It's a bit of an odd feeling contemplating dual booting again. I have not done that since I deleted Windows 98 from my machines last century.
While avoiding BASIC may be good, I think the reason most people avoid the Intel C compiler is because gcc is good. Another reason to avoid icc may be the warning
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
which appears on many pages of the documentation. In particular, the Intel C compiler has been observed to select suspiciously non-optimal code paths when running on non-Intel hardware. Price may also be a concern for some people.

It is my recollection that Microsoft at one time licensed compiler-optimization technology from Intel to incorporate into Visual C++. As a result, it would be interesting to compare the performance of fibo_karatsuba.cpp compiled with the Microsoft tools against a gcc-compiled version. For comparison purposes, both should be run as native applications to avoid any emulation layers such as Wine or the Windows Subsystem for Linux.

Return to “Off topic discussion”