Gerbertus
Posts: 3
Joined: Tue Apr 17, 2018 3:39 pm

3B+ BLAS performance etc

Tue Apr 17, 2018 4:47 pm

Short version: How is linear algebra performance on the Pi 3 B+? Would someone please post the full output of a Mathematica benchmark?

(To do that, just open a terminal and run:

$ wolfram
In[1]:= Needs["Benchmarking`"]
In[2]:= Benchmark[]

Note that it will need your full CPU for around 7 minutes.)

Long version: I got a Pi 2, largely because hey, free copy of Mathematica, and have used it occasionally. When the Pi 3 came out, though many benchmarks showed a large performance increase over the Pi 2, other use cases didn't exhibit that, and I didn't end up upgrading. For instance, the Mathematica benchmark showed most of the tests only slightly faster and the linear algebra tests noticeably slower. Some other linear algebra tests showed slim improvements. (A couple tests with NEON showed larger gains.)

Some of the slim performance increases - or, in the case of Mathematica, the decrease - could perhaps be because the Pi 3's CPU really isn't much better for those workloads. But much of it may be because of a poorly tuned BLAS. If it's the latter, it may have been fixed since then. I'd like to see whether there's any improvement running the latest versions on the 3B+.

Gerbertus
Posts: 3
Joined: Tue Apr 17, 2018 3:39 pm

Re: 3B+ BLAS performance etc

Tue Apr 17, 2018 8:29 pm

I'll note that I posted this to the general forum but a moderator moved it here. My question is largely about linear algebra performance in general. There is a vast variety of programs, not just Mathematica, that would benefit from a well-tuned BLAS. And though I am asking for the Mathematica benchmark as a primary example, it's something anyone with a 3B+ could easily run; no knowledge of Mathematica whatsoever is needed to run the benchmark.

jardino
Posts: 114
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 3B+ BLAS performance etc

Wed Apr 18, 2018 2:54 pm

By coincidence, I've been benchmarking three of my RPis that were to hand - including my 3B+. Attached are my results.
Alan.
[Attachment: Benchmarks.jpg (34.02 KiB)]
IT Background: Honeywell H2000 ... CA Naked Mini ... Sinclair QL ... WinTel ... Linux ... Raspberry Pi.

ejolson
Posts: 1889
Joined: Tue Mar 18, 2014 11:47 am

Re: 3B+ BLAS performance etc

Wed Apr 18, 2018 5:23 pm

jardino wrote:
Wed Apr 18, 2018 2:54 pm
By coincidence, I've been benchmarking three of my RPis that were to hand - including my 3B+.
Those are interesting results. I wonder why the 3B+ is slower than the 3B in test 5. Maybe the 3B+ started to throttle sooner. Did you check whether the 3B+ SoC experienced under-voltage or overheating during the test?

Performance while throttled is interesting due to the large number of people who have not installed extra cooling or are using mobile phone chargers rather than the official power supply. However, people concerned with Mathematica performance should typically solve their throttling issues before proceeding.
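One way to check for this after a benchmark run is to read the firmware's throttling flags. The sketch below assumes the `vcgencmd` tool that ships with Raspbian is on the path; the function names are made up for illustration, and the bit meanings follow the documented `get_throttled` bitmask.

```python
import subprocess

# Bit positions in the value reported by `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def decode_throttled(output):
    """Turn a line such as 'throttled=0x50005' into a list of flag descriptions."""
    value = int(output.strip().split("=")[1], 16)
    return [text for bit, text in sorted(THROTTLE_FLAGS.items()) if value & (1 << bit)]


def read_throttled():
    """Ask the firmware for the flags; only works on a Pi with vcgencmd installed."""
    result = subprocess.run(
        ["vcgencmd", "get_throttled"], capture_output=True, text=True, check=True
    )
    return decode_throttled(result.stdout)


if __name__ == "__main__":
    try:
        flags = read_throttled()
    except (FileNotFoundError, subprocess.CalledProcessError):
        print("vcgencmd is not available on this machine")
    else:
        print("\n".join(flags) if flags else "no throttling flags set")
```

Running it right after Benchmark[] finishes shows whether under-voltage or throttling happened at any point since boot, which is usually enough to tell whether a slow result needs a rerun with better cooling or power.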

As far as general performance goes, the BLAS libraries included with Raspbian used to be compiled for ARMv6 compatibility and consequently ran about 10 times slower than a properly optimized ARMv7 NEON library. I'm not sure if this is still the case, but it may not matter because Mathematica likely uses its own BLAS libraries rather than the system ones. It would be interesting to know whether Mathematica ships separate versions for ARMv6 and for ARMv7 NEON. If not, I wonder whether it is possible to get Mathematica to dynamically link with a third-party ARMv7 NEON optimized library.

jardino
Posts: 114
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 3B+ BLAS performance etc

Wed Apr 18, 2018 6:07 pm

I used a good official power supply for the tests, but did not check the machine's temperature.
Alan.

Gerbertus
Posts: 3
Joined: Tue Apr 17, 2018 3:39 pm

Re: 3B+ BLAS performance etc

Wed Apr 18, 2018 8:53 pm

jardino- Thanks! Do you have the raw data available? (That's the output of Benchmark[], with '"BenchmarkResult" -> n1, "TotalTime" -> n2, "Results" -> {{"Data Fitting", n3}, {"Digits of Pi", n4}, {"Discrete Fourier Transform", n5}' and so on with times for each.) Eyeballing the graph is made somewhat harder by the way the Pi Zero outliers shrink the relevant portion.
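With the raw output from two boards, a per-test comparison takes only a few lines. This is a rough sketch that assumes the result text has the shape quoted above; the helper names are invented, the regular expression only handles plain decimal numbers, and the timings in the example are placeholders, not measurements.

```python
import re

# Matches pairs such as {"Data Fitting", 1.23} inside the "Results" list.
PAIR = re.compile(r'\{"([^"]+)",\s*([0-9.]+)\}')


def parse_results(text):
    """Return {test name: time} from the text of a Benchmark[] result."""
    return {name: float(value) for name, value in PAIR.findall(text)}


def compare(base, other):
    """Ratio other/base for each shared test; above 1 means `other` took longer."""
    return {name: other[name] / base[name] for name in base if name in other}


# Placeholder timings for illustration only.
pi3b = parse_results('"Results" -> {{"Data Fitting", 2.0}, {"Digits of Pi", 4.0}}')
pi3bplus = parse_results('"Results" -> {{"Data Fitting", 1.0}, {"Digits of Pi", 5.0}}')

for name, ratio in compare(pi3b, pi3bplus).items():
    print(f"{name}: {ratio:.2f}x the 3B time")
```

Ratios avoid the scaling problem in the graph, since each test is compared only against itself.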

So the general matrix multiply is slower on the Pi 3B+ than on the Pi 3, which in turn was slower than the Pi 2. One thing this tells me is that the BLAS is very badly tuned. It should be possible to get a huge performance increase in Mathematica - and in just about any other math/sci software - by getting an appropriate BLAS in there.

I found a post saying Mathematica for the Pi uses a reference CBLAS with no optimizations whatsoever. I read something else saying that's the normal way NumPy gets installed on the Pi too, as well as Octave. In all three cases, as ejolson says, a decent NEON BLAS should be ~10x as fast, but even a well optimized VFP version could be at least half as fast as the NEON one and possibly up to 3/4 as fast, i.e. roughly 5x to 7.5x as fast as now.
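The arithmetic behind that estimate, taking the ~10x NEON figure and the half to three-quarters range at face value (both are rough guesses from this thread, not measurements):

```python
def vfp_speedup(neon_speedup, fraction_of_neon):
    """Speedup over the current BLAS if VFP code reaches a fraction of NEON speed."""
    return neon_speedup * fraction_of_neon


neon = 10.0                      # assumed NEON speedup over the reference BLAS
low = vfp_speedup(neon, 0.5)     # VFP at half the NEON speed
high = vfp_speedup(neon, 0.75)   # VFP at three quarters of the NEON speed

print(f"VFP estimate: {low}x to {high}x the current speed")
```

This prints a range of 5.0x to 7.5x, so even the pessimistic end would be a large gain over the reference library.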

This can be the difference between scientific and numerical computing on the Pi being merely a toy and it being fast enough for real student use. And since I know there's well optimized code out there already, if the right people were involved this could be relatively simple to fix compared to other Pi performance problems.

jardino
Posts: 114
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 3B+ BLAS performance etc

Thu Apr 19, 2018 8:20 am

The raw data is in a spreadsheet. I'll get it to you later.
Alan.

jardino
Posts: 114
Joined: Wed Aug 08, 2012 9:03 am
Location: Aberdeenshire, Scotland

Re: 3B+ BLAS performance etc

Thu Apr 19, 2018 3:42 pm

Sorry, this system won't allow me to attach my simple spreadsheet file. I'll need to find a workaround.
In the meantime, here is the graph with the Pi Zero results removed, to allow better resolution of the other two devices.

Alan.
[Attachment: Benchmarks2.jpg (45.36 KiB)]

ejolson
Posts: 1889
Joined: Tue Mar 18, 2014 11:47 am

Re: 3B+ BLAS performance etc

Tue Apr 24, 2018 12:39 am

Gerbertus wrote:
Wed Apr 18, 2018 8:53 pm
This can be the difference between scientific and numerical computing on the Pi being merely a toy and it being fast enough for real student use.
I just ran the High-Performance Linpack (HPL) benchmark on the Pi 3B+ and found a 4.76 percent improvement compared to the original Pi 3B. Note that I pointed a hairdryer on the cold setting at the board during the benchmark to ensure thermal throttling did not occur. While it is possible that more tuning would show further speed increases, I wouldn't expect performance to ever decrease when moving from the 3B to the 3B+.
