jojopi
Posts: 3085
Joined: Tue Oct 11, 2011 8:38 pm

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 2:03 pm

gmp-chudnovsky takes under 21s for a million digits on a Pi, plus another 10s to print the results.
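
For reference, the series gmp-chudnovsky evaluates is Chudnovsky's, each term of which contributes roughly 14 further correct digits:

\[
\frac{1}{\pi} = 12 \sum_{k=0}^{\infty} \frac{(-1)^k\,(6k)!\,(545140134k + 13591409)}{(3k)!\,(k!)^3\,640320^{3k+3/2}}
\]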

y-cruncher, which Bakul Shah also mentioned previously, uses the Chudnovsky formula as well, with an implementation optimized for record-setting. I think it provides the best illustration we have of how challenging it would be to use the Pi's GPU to approximate pi, even with a suitable API.

Because y-cruncher holds the world record for pi approximation, at ten trillion digits, and required only the CPUs of a single workstation to achieve it. There is little benefit from throwing a supercomputer or GPU or cluster at the problem. All of the known algorithms for approximating pi are fundamentally serial or iterative.

The modest parallelism that y-cruncher achieves is solely in its smart implementation. Furthermore, the program uses only 30% floating point, and its performance at large sizes becomes limited by memory bandwidth and even disk throughput.

We have already seen in this thread that at a million digits, the choice of algorithm makes the difference between hours or seconds of work. So you cannot choose an algorithm just because it is parallelizable (even if any of them were). And if you did, you would then have N times the requirement for memory bandwidth to contend with as well. Even the simplest algorithms require the best part of a megabyte of local storage for a million digits, which they write to constantly.

Perhaps someone will attempt general computation on the Pi's GPU, and that would be cool. But I advise them to pick a problem other than approximating pi, because putting in so much effort just to be easily out-performed by the ARM would be unfortunate.

pygmy_giant
Posts: 1562
Joined: Sun Mar 04, 2012 12:49 am

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 2:21 pm

So when Jim Manley wrote:
This once again points up how most people just don't get how they should be using the Pi - it's the GPU, stoopid!
he was ... wrong?

:o

That's a relief - I thought I was 'stoopid'.

Heater
Posts: 13360
Joined: Tue Jul 17, 2012 3:02 pm

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 2:27 pm

He was wrong.

What that says about us being 'stoopid' is not so clear though. :)

P.S. Wrong, that is, until he or someone else proves otherwise by investing an enormous amount of time and effort into making use of the GPU to calculate pi. Even then I don't see any major (order of magnitude) gains to be made.
Memory in C++ is a leaky abstraction .

pygmy_giant
Posts: 1562
Joined: Sun Mar 04, 2012 12:49 am

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 2:35 pm

I guess there must be no sure fire way for me to tell if I am stoopid then :shock:

I thought Jim might be right with regards to using the GPU for other non-graphics tasks, such as in combination with an Inertial Measurement Unit where Euler angle manipulation is required - but I agree that the only way to tell for sure would be to try it....

Heater
Posts: 13360
Joined: Tue Jul 17, 2012 3:02 pm

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 3:17 pm

I suspect there must be uses for the GPU outside of getting stuff onto the screen. If I understand correctly they should be useful for matrix multiplication, for example.

There is still a question mark over how fast you can get data in / results out though, and whether that cancels any gains to be had from the parallel maths engine itself.
Memory in C++ is a leaky abstraction .

Jim Manley
Posts: 1600
Joined: Thu Feb 23, 2012 8:41 pm
Location: SillyCon Valley, California, and Powell, Wyoming, USA, plus The Universe
Contact: Website

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 5:12 pm

OtherCrashOverride wrote:
In this game, calculating pi, more work on the maths is likely to give better results than more hardware and clever programming.
The VideoCore 4 (aka GPU) is clocked at 250 MHz. Unless a vectorized algorithm can be devised that meets the special conditions that yield the theoretical 24 GFLOPS, the GPU (whether integer or floating point) is only going to slow you down. This has long been known on other armv6+gpu platforms, which is why it was not uncommon to do the opposite of the trend: move *away* from GPU computations back to CPU.
The clock speed of the GPU is way more than offset by the multiple pipelines on a strictly number-of-cycles-per-time basis. The issue with the Pi (the constant) digit calculation process appears to be that it can't benefit from parallelization (on anything, not just the Pi) as you have to calculate the more significant digits in order to calculate the less significant digits. The post stating "required only the CPUs of a single workstation" would seem to be misleading as the problem is fundamentally serial bound. It's not clear what "modest parallelization" is - it's kinda like pregnancy - either it is or it ain't. It seems we all need to be careful about making things plural where they can't be.

I'm puzzled by the "theoretical 24 GFLOPS" comment as I've never seen that figure challenged. Is it a problem of sustainability, e.g., it can only be performed for one or two clock cycles and then stalls due to delays getting results pushed out? I would think something that problematic would have been amply discounted within an hour of it first being stated. Another BCM2835 VideoCore IV spec is that it can generate 40 million shaded polygons per second, but that shouldn't be misinterpreted to mean one can generate 40 million polygons in a given frame that takes a second of computation to produce. That kind of thinking leads to the tired old clock speed comparisons of days gone by where no other parameter was considered (speed of bus, memory, I/O, caches, number and types of registers, word size, etc.).

I still stand by my statement that most people aren't even thinking about how the Pi's GPU can be used to advantage for tasks beyond graphics, but since others have been allowed to refine their positions, I'm going to do that too, and emphasize "where appropriate to the task". I've been focused on widening awareness of the possibilities of parallelism and wasn't thinking specifically about the problem of calculating the digits of Pi - all one sees are comparisons of this CPU and that CPU, as if that were the only criterion and resource available. I was thinking more about techniques involving things like Taylor series where terms are independent and can be readily calculated and added/subtracted in parallel. Parallelism is even more appropriate for asymmetric problems that are compute-bound and don't require as much I/O outside the GPU or similar tightly-coupled processing elements (push in some non-massive amount of data, do a lot of processing, and push out succinct results). DoD did a study on the ability of software developers to be effective using parallel processing and found that only about one in three could make the mental leap necessary to do so, and that would explain the myopia and prejudice displayed here by some.

Calculating the digits of Pi itself doesn't readily translate to solving most other computing problems and there are many more problems that don't lend themselves to such simple representations (a string of digits - I do wonder how these digits are verified as about half of all peer-reviewed, published mathematical proofs have later been shown to have fatal flaws). Since most physical phenomena can't be consistently measured beyond about five digits of accuracy without severely declining payback for the increased effort, I have no personal need for the digits beyond 3.14159 ... I'm not saying the pursuit is worthless and that others shouldn't make the effort, it's just not something that consumes much of my brainwidth. There are other problems that I need to solve that can benefit from parallelism in the Pi at a low cost since I'm one of the few that seems to be trying to use the Pi, warts and all, to improve K-12 STEM education as it was developed to do.

I don't think anyone will disagree that my posts have stimulated some useful discussion that brought to light things most of us didn't know. If the price of that is clear hatred and jealousy by pygmies (make up your mind, you can't also be a giant, which we note comes second in your handle for reasons that are now obvious) and others who hide behind anonymous, unimaginative handles, I will gladly bear that mantle in lieu of others having to do so. I have always been one of the few who raises their hand and asks the questions others really want to ask, but are too afraid of being ridiculed by those with smaller minds. From what some have posted, you would think they were personally and solely responsible for the advances in Pi digit calculation. Go ahead, let the hate consume you, it only makes the rest of us stronger.

Regarding Wayland, I'm going to refine that position and say that it has the potential to support development of more responsive and attractive GUI elements that could resemble the current OS X GUI elements and their behaviors due to GPU acceleration. I'm not sure why the Wayland screenshots show what appear to be fairly blocky, low-res elements and typefaces (a font family only describes the basic outlines and features of a typeface, not what is ultimately displayed/printed, which includes typeface size, line weight/thickness, italics, bold, etc.). I don't know whether existing freeware elements can be readily imported/converted for use in Wayland if needed (e.g., X typefaces) , but it would seem to be a very helpful possibility.
The best things in life aren't things ... but, a Pi comes pretty darned close! :D
"Education is not the filling of a pail, but the lighting of a fire." -- W.B. Yeats
In theory, theory & practice are the same - in practice, they aren't!!!

Heater
Posts: 13360
Joined: Tue Jul 17, 2012 3:02 pm

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 7:21 pm

Jim,

For an ex-military guy who is used to the rough and tumble of the barracks you are a very touchy, sensitive, if not paranoid guy. That is to say there is no "hate" here only a discussion about possibilities.

When you say, "I still stand by my statement that most people aren't even thinking about how the Pi's GPU can be used to advantage for tasks beyond graphics," I suspect you are correct. I also suspect that there are good reasons for that, and not simply that people are to 'stoopid' to consider it:

a) The Pi's GPU is a closed, undocumented chunk of hardware that is probably unique to the Broadcom range of ARM SoCs. As such it does not encourage anyone to tackle using it as a super parallel maths engine. The results of their efforts would not be generally useful.

b) It has been suggested that advantage can be taken of the GPU even without knowing about its internals. That is by using the OpenGL ES API. This may well be true, but then in my quick Googling around I have not seen anyone anywhere doing that for any platform with OpenGL, ES or otherwise. I would suspect that if it made sense it would have been attempted already.

c) As we have seen argued it is not clear that the benefits of all those parallel multipliers and accumulators actually outweigh the overheads of getting data in and out of the thing.

d) It's in the nature of many problems that breaking them down into parallel operations is hard. If you have ever tried to recast your serial Fast Fourier Transform algorithm into something that can be run in parallel chunks by OpenMP, for example, then you know what a chore it is.
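
To be clear about the scale of that chore: the easy cases in OpenMP really are a single pragma, as in this toy C fragment where every iteration is independent. It is the data flow of the FFT butterflies, not the annotation syntax, that makes the real restructuring hard.

Code:

#include <math.h>
#include <stdio.h>

#define N 1000000

int main(void)
{
    static double x[N], y[N];

    for (long i = 0; i < N; i++)
        x[i] = (double)i / N;

    /* Independent iterations: OpenMP can split this loop across cores
       with a single pragma (compile with -fopenmp).                   */
    #pragma omp parallel for
    for (long i = 0; i < N; i++)
        y[i] = sin(x[i]) * cos(x[i]);

    printf("%f\n", y[N - 1]);
    return 0;
}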

Bottom line: document that GPU. Let the hard-core geeks at it. Perhaps they will come up with surprises like they did back in the early days of 8 and 16 bit machines when demos were churned out that were "clearly impossible".
Memory in C++ is a leaky abstraction .

pygmy_giant
Posts: 1562
Joined: Sun Mar 04, 2012 12:49 am

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 8:01 pm

... in my quick Googling around I have not seen anyone anywhere doing that for any platform with OpenGL
In this thread http://www.raspberrypi.org/phpBB3/viewt ... a&start=75 mikerr wrote:
Apple Mac people can play with this sample code that does an N-body simulation using OpenCL, running on the CPU, GPU, or both.
You can switch between them in real time, and the FPS count is displayed onscreen.
http://developer.apple.com/library/mac/ ... Intro.html
But I think his link is broken.

His example uses OpenCL to access the GPU, which provides parallel computing using task-based and data-based parallelism, rather than OpenGL, which is specifically for graphics.

I hope someone cleverer than me cracks the GPU nut. Some hardware, like the InvenSense MPU-6050, has been hacked by reverse engineering the API, but as that is just a microcontroller I suspect doing this with the VideoCore would be several orders of magnitude harder, or impossible. It seems a pity that neither the Foundation nor Broadcom feels able to help.

jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 8:56 pm

Just for a laugh, I tried Java using its BigDecimal class. The good news: it was very easy to program and it got the answer right. The bad news: it was incredibly slow.

Code:

Digits    C              Java
1000      1s             1.728s
10^4      2s             9.7m     x336
10^5      128s           9.8d*    x1454
10^6      3.2h*  x90     77.5y*   x2888
C is my original program running in 64 bit mode on my 2.5GHz Windows / Intel laptop.
Java is the new program running on the same machine.

In case you can't guess, s is seconds, m is minutes, h is hours, d is days, and y is years. * indicates that the program did not complete and this is the last estimate. I know that for my program, the estimates are accurate. I don't know whether this applies to the Java. I was simply scaling up the times so far to the required number of terms. For the duration of the partial runs, this estimate was reasonably consistent.

So, my decades-old C program using a nearly 40-year-old schoolboy algorithm needs 3.2 hours, but the Java needs 77.5 years. To be fair, the estimate was dropping slightly, but it was never going to catch the C program. What the heck is going on; why is it so bad? BigDecimal is much more complete and general purpose than my code, which does what my pi algorithm needs and no more, but these benefits should not cost so much.

The x figures indicate how much the program has slowed compared to the previous case. Each case is 10 times as many digits as the previous one. I would expect mine to be quadratic and hence slow by a factor of 100 for each step. It has done a bit better than that. On the other hand, the Java is slowing down by a much larger factor.

The Java should run on the Pi with no problem but considering these times, I am not going to bother trying.

jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 9:38 pm

Jim Manley wrote: <snip>
The issue with the Pi (the constant) digit calculation process appears to be that it can't benefit from parallelization (on anything, not just the Pi) as you have to calculate the more significant digits in order to calculate the less significant digits. The post stating "required only the CPUs of a single workstation" would seem to be misleading as the problem is fundamentally serial bound. It's not clear what "modest parallelization" is - it's kinda like pregnancy - either it is or it ain't. It seems we all need to be careful about making things plural where they can't be.
There are degrees of parallelism. When I first got my hands on machines which could really run multiple threads at once, I considered whether they would help my pi program. I spent some time thinking about it. I decided that I could probably overlap some operations and get a slight benefit, but I probably would not have kept even two CPUs busy, so I decided that it was not worth the effort.

This problem applies to most but not all pi algorithms. Have a look at Spigot algorithms here: http://en.wikipedia.org/wiki/Pi. These incredible algorithms can calculate an arbitrary hexadecimal digit without relying on the calculations of all previous ones. Now, this would allow massive parallelism. You could throw any number of machines at the job doing just one hex digit each. Of course, at the end you have pi in hex but the game is pi in decimal. No problem, each machine can convert its own hex digit to decimal. As they complete, they can start to pair up and add their digits and bubble up the final result. This last phase is a mere logarithmic algorithm.
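
For anyone who wants to play with the idea, here is a minimal C sketch of the Bailey-Borwein-Plouffe digit-extraction scheme that the Wikipedia spigot section describes. It uses plain doubles, so it is only trustworthy for fairly modest positions (serious verification runs use more careful arithmetic), but it shows how each hex digit can be computed independently of the earlier ones:

Code:

#include <math.h>
#include <stdio.h>

/* 16^e mod m by binary exponentiation, in doubles (m stays small). */
static double powmod16(long e, double m)
{
    double r = 1.0, b = fmod(16.0, m);
    while (e > 0) {
        if (e & 1) r = fmod(r * b, m);
        b = fmod(b * b, m);
        e >>= 1;
    }
    return r;
}

/* Fractional part of sum over k of 16^(d-k) / (8k+j). */
static double series(int j, long d)
{
    double s = 0.0, t;
    long k;
    for (k = 0; k <= d; k++) {            /* big exponents: reduce mod (8k+j) */
        double m = 8.0 * k + j;
        s += powmod16(d - k, m) / m;
        s -= floor(s);                    /* keep only the fractional part    */
    }
    for (k = d + 1, t = 1.0 / 16.0; k <= d + 8; k++, t /= 16.0)
        s += t / (8.0 * k + j);           /* a few tiny tail terms            */
    return s - floor(s);
}

/* Hexadecimal digit of pi at position d+1 after the point. */
static int pi_hex_digit(long d)
{
    double x = 4.0 * series(1, d) - 2.0 * series(4, d)
                   - series(5, d) -       series(6, d);
    x -= floor(x);
    return (int)(16.0 * x);
}

int main(void)
{
    long d;
    for (d = 0; d < 8; d++)               /* pi = 3.243F6A88... in hex */
        printf("%X", pi_hex_digit(d));
    printf("\n");
    return 0;
}
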
Jim Manley wrote: <snip>
Calculating the digits of Pi itself doesn't readily translate to solving most other computing problems and there are many more problems that don't lend themselves to such simple representations (a string of digits - I do wonder how these digits are verified as about half of all peer-reviewed, published mathematical proofs have later been shown to have fatal flaws). Since most physical phenomena can't be consistently measured beyond about five digits of accuracy without severely declining payback for the increased effort, I have no personal need for the digits beyond 3.14159 ... I'm not saying the pursuit is worthless and that others shouldn't make the effort, it's just not something that consumes much of my brainwidth.
That Wikipedia article on spigot algorithms also addresses verifying large pi calculations. Convert the claimed value to hex. This is a substantial task but small compared to actually calculating it and the conversion is amenable to parallelization. After that, you can verify hex digits at various points of the claimed value.

Of course, there is little use for the value of pi beyond a handful of places. It isn't actually a special hobby of mine (since school days anyway) but I can understand why some like to do it. I was originally a pure mathematician. Problems like this are interesting in their own right to pure mathematicians. Asking a pure mathematician what is the use of their work is almost insulting. Along the lines of: "Hey, Leonardo, a pretty picture but what use is it?". However, it is remarkably common that maths that was originally regarded as totally useless becomes useful much later. A good example is complex numbers. Crazy abstract nonsense for a long time but now essential to physics. I don't think that high accuracy calculations of pi have got to that stage yet but it could happen. Have a read of Carl Sagan's Contact, if you haven't already.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 23688
Joined: Sat Jul 30, 2011 7:41 pm

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 9:47 pm

OtherCrashOverride wrote:
In this game, calculating pi, more work on the maths is likely to give better results than more hardware and clever programming.
*Spoiler Alert*
The VideoCore 4 (aka GPU) is clocked at 250 MHz. Unless a vectorized algorithm can be devised that meets the special conditions that yield the theoretical 24 GFLOPS, the GPU (whether integer or floating point) is only going to slow you down. This has long been known on other armv6+gpu platforms, which is why it was not uncommon to do the opposite of the trend: move *away* from GPU computations back to CPU.
There are two 16-way SIMD processors in the GPU. So each has an integer performance of approximately 16*250 = 4 GHz, but at a much lower power requirement than an equivalent 4 GHz ARM or Intel device. There are also some Quad processors used in the 3D core. I think there might be 12 of those. They are very unpleasant to program and I know very little about them. Ask Eben - he designed them, I believe!

But you cannot use any of them officially (there is now a third-party disassembler and assembler for the VPUs, I believe).
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed. Here's an example...
"My grief counseller just died, luckily, he was so good, I didn't care."

Bakul Shah
Posts: 320
Joined: Sun Sep 25, 2011 1:25 am

Re: Something the Pi is not good at - calculating Pi

Mon May 20, 2013 10:38 pm

There are several intermingled threads.
  • We have already established the title of this discussion thread is false. Probably a good thing in hindsight :-)
  • There are several algorithms that are much faster at computing pi (AGM, Chudnovsky etc.) than the old "schoolboy method".
  • An integral component of the faster algorithms is Schönhage-Strassen's FFT-based multiplication algorithm, with time complexity O(n log n log log n). Long multiplication is O(n^2); Karatsuba's algorithm is O(n^1.585); etc. (a toy illustration of Karatsuba's split is sketched just after this list). See the Wikipedia page and this paper on the GMP implementation of SSA
  • Jim says (without any proof but with lots of enthusiasm) that the GPU is the way to go. But it is not at all clear this is the case for computing pi. On an old thread Dom said the CPU->GPU bandwidth is about 100MB/s. That is just too puny a pipe to keep 24 GFLOPS hardware busy -- except for some very special cases. Even Nvidia says that GPUs are better when there is high arithmetic intensity (GPU operations/words fetched). On another thread someone reported reading 640x480 pixels @ 12 fps (which amounts to a little under 15 MB/s) using glReadPixels(). This is decent but not great. Papers on GPGPU indicate good speedups for small datasets. The moment a dataset doesn't fit in the GPU cache, performance drops through the floor.
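
To make the Karatsuba point concrete, here is a toy C sketch of the split applied to a single 32-bit word. It is only an illustration of the identity; a real bignum multiply applies the same three-multiplications-instead-of-four trick recursively to half-length digit arrays, which is where the O(n^1.585) = O(n^log2(3)) exponent comes from.

Code:

#include <stdint.h>
#include <stdio.h>

/* Karatsuba's identity on one machine word, purely as an illustration:
 * x*y is formed from THREE half-width multiplications instead of four.
 * A bignum implementation applies the same split recursively to arrays. */
static uint64_t mul32_karatsuba(uint32_t x, uint32_t y)
{
    uint64_t x1 = x >> 16, x0 = x & 0xFFFF;   /* split into 16-bit halves */
    uint64_t y1 = y >> 16, y0 = y & 0xFFFF;

    uint64_t z2 = x1 * y1;                            /* high * high  */
    uint64_t z0 = x0 * y0;                            /* low  * low   */
    uint64_t z1 = (x1 + x0) * (y1 + y0) - z2 - z0;    /* cross terms  */

    return (z2 << 32) + (z1 << 16) + z0;
}

int main(void)
{
    uint32_t a = 123456789u, b = 987654321u;
    printf("%llu\n", (unsigned long long)mul32_karatsuba(a, b));
    printf("%llu\n", (unsigned long long)((uint64_t)a * b));  /* same value */
    return 0;
}
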
Basically Jim is comparing vaporware with an existing algorithm, and in such cases vaporware always wins! So I pose a challenge to Jim or anyone else: implement the Schönhage-Strassen multiplication algorithm on the GPU for large numbers (>64Kbits) and show it is twice as fast as the CPU. This is a generally useful operation so you can't whine about a worthless pursuit. The code must be open sourced -- help from Broadcom would be nice but you can always use a GLSL shading program and retrieve the results using glReadPixels()! You have one year. Your real prize is serious bragging rights but if you want I will donate a token gift of $100 (or 3 Raspi-B boards) to whomever you (the first winner) choose! So Jim, the onus of proof on "It's the GPU, stoopid!" is on you!

jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 8:38 am

Bakul,

A nice summary but I am slightly puzzled by one point.
Bakul Shah wrote: We have already established the title of this discussion thread is false. Probably a good thing in hindsight :-)
Do you mean that the Pi is a good platform for calculating pi? I would say that the Pi is an acceptable platform for the job if you have a decent algorithm. However, if you have something else that is faster and also easy to use (e.g. a typical laptop) then you may as well use that. I wouldn't bother with super clever programming, e.g. attempting to use the GPU, since, in my case at least, the same effort applied to the maths would give better results.

Heater
Posts: 13360
Joined: Tue Jul 17, 2012 3:02 pm

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 8:51 am

What are the criteria for a "good" platform for calculating pi?

My PC can get a million digits in 2 seconds. The Pi perhaps in 20 seconds. You might say the PC is better on the grounds of sheer speed.

BUT:

1) What about power consumption? I leave it as an exercise for the reader to calculate the relative power consumption of the PC and Pi for completing this task (a rough worked version is sketched after this list).

2) What about size? Again I leave it to the reader to compare the volumes of a big box PC or laptop vs the diminutive Pi. It might be I need to calculate Pi in a small space somewhere.

3) What about weight? Yet another reader exercise. It might be I need a million digits of Pi in a weather balloon some time.

4) What about noise? My PC whines and rattles like hell. My laptop fan fires up like a jet engine if given that kind of job to do over time. The Pi...I can hear the crickets whilst it's working.

5) What about cost? If I had no computer and no way to download a million digits of pi, then the Pi is an excellently cost-effective way to get the result I want.
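
Taking up exercise 1) myself, with some round-number assumptions (call the PC roughly 100 W under load and the Pi roughly 3.5 W):

Energy = power x time: PC ~ 100 W x 2 s = 200 J, Pi ~ 3.5 W x 20 s = 70 J.

So even though the PC finishes ten times sooner, the Pi spends roughly a third of the energy on the same million digits.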

Sounds like the Pi is an excellent Pi engine.

In fact looking at those criteria we might conclude that the Pi is a superior compute engine in every way for pretty much anything compared to a PC. Provided it has sufficient speed for the job at hand.
Memory in C++ is a leaky abstraction .

jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 9:11 am

Heater,

I agree with pretty much all of that. For many tasks, the low cost, low noise, low heat, small size and (most of all) low power consumption make the Pi very attractive.

In my previous post, I called it acceptable for calculating pi. As you say, does it really matter if the Pi takes 20s rather than the 2s of a PC? But if we want pi to a billion places, or some other big calculation, then the picture could change. jamesh and DeeJay mentioned the Pi supercomputers earlier. Cost-effective as teaching platforms, but not for huge number crunching.

It all depends on the exact rules of the game: sometimes the Pi will win and sometimes it won't.

If I was in a pi calculation challenge and the rules were that we must use a particular algorithm then I would want the fastest machine that I could get my hands on. However, if the algorithm was not specified then I would take my chances on selecting a better algorithm and worry less about the hardware. If the job had to be done on a limited budget then I would be happy with the Pi.

Jim Manley
Posts: 1600
Joined: Thu Feb 23, 2012 8:41 pm
Location: SillyCon Valley, California, and Powell, Wyoming, USA, plus The Universe
Contact: Website

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 9:27 am

jwlawler wrote:Do you mean that the Pi is a good platform for calculating pi? I would say that the Pi is an acceptable platform for the job if you have a decent algorithm. However, if you have something else that is faster and also easy to use (e.g. a typical laptop) then you may as well use that. I wouldn't bother with super clever programming, e.g. attempting to use the GPU, since, in my case at least, the same effort applied to the maths would give better results.
How do you know that if an alternative hasn't even been specified, much less analyzed or tested? We've already established that about 30% of the computation using the best algorithm can be accomplished most efficiently in FP hardware, and roughly half of the pipelines in the GPU provide that, with the other half providing a non-trivial amount of integer-handling power. No one has even looked at how it might be possible to arrange the problem where GPU hardware can be ganged to provide wide-word processing (12 x 53 bits of FP mantissa and 12 x 32 bits of integer, although jamesh's numbers may modify those somewhat).

Unless you mean upgrading your code to reflect the world record algorithm, how can you compare the effort needed to improve the math with what a GPU design and implementation would require when you don't even know what the effort for either will be? We haven't even started talking about the benefits the GPU may provide if the aforementioned disassembler and related assembler (IIRC) for the VideoCore IV can provide any improvements over the existing APIs provided by Broadcom. My original comment and continuing theme have been to at least look at the possibilities. I never said that such a solution had to exist, just that it should at least be considered and tested realistically because the GPU does have serious strengths over the CPU (especially if you split out the FPU).

There is a historical parallel to this thread. For many, many decades, all of the general AI experts swore up and down that they could find a Holy Grail in the form of a relatively small set of rules (think of them as the Maxwell's equations or Grand Unification Theory of AI) that, when implemented, would allow them to take over the world. We're now in 2013 and the best hope for general AI today is IBM's Watson, and if its hardware and software prowess isn't the antithesis of the "we can out-think this problem into submission", nothing ever will be. Some problems are just very difficult and outstrip our ability to comprehend them, much less solve them at all, let alone in the most efficient manner possible. Sometimes you just have to break some eggs to make omelettes, and you won't know the answers to some questions without asking them. The Pi digit problem may not be amenable to a GPU solution on the Pi board compared with the ARM CPU and FPU (that people keep conveniently forgetting about), and while I may not have time to determine it, there are people who can, and I'm going to enlist their help. More about that in the next post replying to Bakul's x prize Challenge (lowercase "x prize" fully intended since he's so cheap he's only offering a few Pi boards - I'm not sure what $100 in cash is any more - can you convert that to Sky Miles or some similar kind of representation?).

I just saw heater's latest post and I agree with him 100% - nice thinking so far outside the box that you need the Hubble Space Telescope to look back and see the box! I also thank you for your willingness to at least consider some thought experiments that support my suggestions, and that's all they've ever been, suggestions (frustrated and strongly-worded, perhaps ... who, me? ;) ).
The best things in life aren't things ... but, a Pi comes pretty darned close! :D
"Education is not the filling of a pail, but the lighting of a fire." -- W.B. Yeats
In theory, theory & practice are the same - in practice, they aren't!!!

Bakul Shah
Posts: 320
Joined: Sun Sep 25, 2011 1:25 am

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 9:52 am

jwlawler wrote:Bakul,

A nice summary but I am slightly puzzled by one point.
Bakul Shah wrote: We have already established the title of this discussion thread is false. Probably a good thing in hindsight :-)
Do you mean that the Pi is a good platform for calculating pi? I would say that the Pi is an acceptable platform for the job if you have a decent algorithm. However, if you have something else that is faster and also easy to use (e.g. a typical laptop) then you may as well use that.
We do have a couple of more than decent pi computing algorithms. Given those we can compute a million digits in a few seconds. In my eyes that makes the "(Ras)Pi good at calculating pi". Not the best, not even great, but good. Sure, a modern laptop will be an order of magnitude or more faster, but that is true for pretty much everything that can be done on both a Raspi and a laptop. This is already known (or should be by anyone contemplating using a Raspi).

Let me say this another way. Compare a modern laptop to a Mercedes and a Raspi to a VW bug. In this analogy your title would be akin to "A VW bug is not good at taking you from Los Angeles to San Francisco", when what you really mean is that your expectations are those of a Mercedes and a Bug is just not as good as a Mercedes. But when you say "good" what you really mean is "fast", ignoring all other aspects! Not only that, you are refusing to soup up your Bug using all the innovations of the past 40 years! With those modern innovations you may make the trip in 15 minutes instead of hours, but that is still much slower than what you can do with a Merc. Finally, you don't even need to go from Los Angeles to San Francisco! That is just a test you use to evaluate a new car!

In general, if you want "ease of use" and comfort, stay with the Mercedes of a laptop but if you want an adventure, use the lower powered Raspi "VW bug"! Having to overcome all the challenges posed by it will make you more creative and feel more alive :-) But if you expect all the creature comforts of a Mercedes from a Bug, you are bound to be very unhappy.

jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 10:02 am

Jim Manley wrote:
jwlawler wrote:Do you mean that the Pi is a good platform for calculating pi? <snip> I wouldn't bother with super clever programming, e.g. attempting to use the GPU, since, in my case at least, the same effort applied to the maths would give better results.
How do you know that if an alternative hasn't even been specified, much less analyzed or tested? <snip>
Sorry, I thought that my statements were rather mild and uncontroversial.

I said: " the Pi is an acceptable platform for the job". I don't think that I need to know the available alternatives to say that. We have calculated pi on the CPU of the Pi. I think that by itself proves the statement.

On the maths versus clever programming comment, I did qualify it with "in my case", I am not claiming it as a universal truth. I meant, that if I wanted to make my current program faster, I would achieve more by improving the maths than by learning how to program on the GPU. This is me judging my own skills, I don't think that it can be reasonably contested.

Go back to my comparison of the two simple algorithms that we have mentioned in this thread. The famous, simple arctan(1) algorithm and my only slightly more complicated arcsin(0.5) one. The performance improvement is massive. If you stayed with the simple algorithm, a 30% performance boost won't let you catch up; even a million-fold improvement won't let you catch up. I could build a Babbage Analytical Engine (admittedly a very big one) and run my algorithm on that before you got whatever hardware you liked to run the simpler one.
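
To put rough numbers on that (assuming both are used as plain Maclaurin series): the arctan(1), i.e. Gregory-Leibniz, series has an error of about 1/(2N) after N terms, so d correct digits need on the order of 10^d terms, whereas the arcsin(1/2) series shrinks by roughly a factor of 4 per term, so d digits need only about d/log10(4), roughly 1.66d, terms:

\[
\frac{\pi}{4} = \arctan(1) = \sum_{k=0}^{\infty} \frac{(-1)^k}{2k+1},
\qquad
\frac{\pi}{6} = \arcsin\!\left(\tfrac{1}{2}\right) = \sum_{k=0}^{\infty} \frac{(2k)!}{4^k\,(k!)^2\,(2k+1)}\left(\frac{1}{2}\right)^{2k+1}
\]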

I have also admitted that when the algorithm is fixed, or no better one is known, then clever programming and full exploitation of the available hardware is appropriate.

I am interested in the possibility of exploiting the power of the GPU for general purpose calculation but I don't have the hobby time to look at it seriously unless someone shows the way.

I am also disappointed at the progress of AI. I saw the movie 2001 as a child and, for a long time, expected that we would get computers as good as HAL.

Bakul Shah
Posts: 320
Joined: Sun Sep 25, 2011 1:25 am

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 10:08 am

Jim Manley wrote:The Pi digit problem may not be amenable to a GPU solution on the Pi board compared with the ARM CPU and FPU (that people keep conveniently forgetting about), and while I may not have time to determine it, there are people who can, and I'm going to enlist their help. More about that in the next post replying to Bakul's x prize Challenge (lowercase "x prize" fully intended since he's so cheap he's only offering a few Pi boards - I'm not sure what $100 in cash is any more - can you convert that to Sky Miles or some similar kind of representation?).
Jim, of course I am cheap! I am hoping my fellow doubters would also chip in. But the real prize is being considered a wizard by your peers. Sort of like winning the IOCCC (the international Obfuscated C contest) or the ICFP programming contest! And fame and papers in international journals.

All this jousting is in good fun. Please don't be frustrated!

Jim Manley
Posts: 1600
Joined: Thu Feb 23, 2012 8:41 pm
Location: SillyCon Valley, California, and Powell, Wyoming, USA, plus The Universe
Contact: Website

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 10:21 am

Heater wrote:Jim, For an ex-military guy who is used to the rough and tumble of the barracks you are a very touchy, sensitive, if not paranoid guy. That is to say there is no "hate" here only a discussion about possibilities.
You may not be aware of some of the vitriol that some here have perpetrated in other threads - there are posters who exploit every opportunity to take potshots at the Foundation and anyone who supports their focus on STEM education, particularly if others are in on it, too. Their basic problem is that they want the Pi to do everything for everyone all of the time and cost even less, and anyone who disagrees with that is a target for ridicule. As Liz says, some people demand the Pi come with a sandwich ... and wheels.

I don't really expect the life-long civilians and non-combatant military people to understand what life is like beyond the wire, but I do expect some respect. My Dad spent two years practicing for Operation Overlord in the South of England and then put it to use on June 4, 1944, dropping in near Ste. Mere Eglise with 30,000 of his closest friends in the 101st Airborne Division, then proceeded through all of the hell-holes between Normandy, Remagen, Arnheim, Eindhoven, Bastogne, and the Battle of the Bulge. That's where he was shot through both legs with a 13 mm Panzer machine gun, taken POW, and lost 60 pounds in three months, thanks to Montgomery taking Patton's tanks' gas to "tidy up the front" in the undisputed Southeast of France at that point. So, Patton's forces couldn't relieve the 101st, which had jumped behind enemy lines (what they're supposed to do), and the tanks ran out of gas a few miles short, where they were slaughtered.

He's still alive at 91 and remembers every second of the war, but can't tell you what he just had for breakfast. I know this first-hand as his primary caregiver on top of an 80-plus-hour work week, volunteering at the Computer History Museum and Monterey Bay Aquarium, teaching STEM to kids after normal classes, organizing two Jams a month 100 miles apart, volunteering entire weekends for Maker Faires, etc. If you've ever been actually shot at, particularly over an extended period of time, you'd be paranoid too. Former Intel CEO Andy Grove's autobiography is titled, "Only the Paranoid Survive" (he's a cancer survivor, which features large in his views). Plus, I'm sure you're familiar with that old saying, "Just because you're paranoid doesn't mean they're not out to get you." ;)
Heater wrote:When you say, "I still stand by my statement that most people aren't even thinking about how the Pi's GPU can be used to advantage for tasks beyond graphics," I suspect you are correct. I also suspect that there are good reasons for that, and not simply that people are too 'stoopid' to consider it:

a) The Pis GPU is a closed, undocumented, chunk of hardware that is probably unique to the Broadcom range of ARM Socs. As such it does not encourage anyone to tackle using it as a super parallel maths engine. The results of their efforts would not be generally useful.
There are very good reasons why it's closed, as is every other SoC used in products similar to those in which the BCM2835 is used. Go ahead, ask Apple, Samsung, Qualcomm, Motorola, etc., for their toolchains and docs and see what kind of response you get ... I'll wait [sound of whistling and humming for an indeterminate amount of time]. See also Andy Grove's book mentioned above.
Heater wrote:b) It has been suggested that advantage can be taken of the GPU even without knowing about its internals. That is by using the OpenGL ES API. This may well be true, but then in my quick Googling around I have not seen anyone anywhere doing that for any platform with OpenGL, ES or otherwise. I would suspect that if it made sense it would have been attempted already.
I strongly suspect not very many people with interests beyond graphics know anything about OGL(ES) and especially the GL Shader Language (GLSL) that would be used to try to actually make this happen. I could attempt it, but I have a couple of things on my plate already, as you may have noticed above.
Heater wrote:c) As we have seen argued it is not clear that the benefits of all those parallel multipliers and accumulators actually outweigh the overheads of getting data in and out of the thing.
This may be the tallest pole in the tent, but it may be possible to put the entire problem in the GPU, whereupon you only need to dump the results from the buffer either when you've completed the task to the desired number of digits, or you've run out of GPU memory. In the latter case, this could be done in multiple passes, with results accumulated in non-volatile storage.
Heater wrote:d) It's in the nature of many problems that breaking them down into parallel operations is hard. If you have ever tried to recast your serial Fast Fourier Transform algorithm into something that can be run in parallel chunks by OpenMP, for example, then you know what a chore it is.
That's one of the bevy of tools used in the aforementioned Richard W. Hamming Memorial High-Performance Computing Center where I've spent many a fond month (thinking it was a day each time, of course ;) ). As one of my co-conspirators in crime likes to say, "FORTRAN - 60-plus years of computing tradition, unfettered by progress." The tools for doing parallel computing development are abysmal by any measure, primarily because there's no money in it as the customer base is so small. We wouldn't even have OpenMP, Rocks, or any of the other tools we do have if it weren't for the government paying salaries of research associates (one of my previous jobs) to create them. Microsloth pretty much dried up whatever interesting new development tools and technologies were in the commercial pipeline by their illegal corporate shenanigans perpetrated continually against perceived competitors. Go ahead and trudge up and down Sand Hill Road in Palo Alto and see how many non-responses you get for business plans seeking venture capital for creating powerful parallel computing software development tools ... I'll wait [more whistling and humming].
Heater wrote:Bottom line: document that GPU. Let the hard-core geeks at it. Perhaps they will come up with surprises like they did back in the early days of 8 and 16 bit machines when demos were churned out that were "clearly impossible".
I have a dream ... and a plan. I'll discuss it in my response to Bakul's x prize Challenge.
The best things in life aren't things ... but, a Pi comes pretty darned close! :D
"Education is not the filling of a pail, but the lighting of a fire." -- W.B. Yeats
In theory, theory & practice are the same - in practice, they aren't!!!

Jim Manley
Posts: 1600
Joined: Thu Feb 23, 2012 8:41 pm
Location: SillyCon Valley, California, and Powell, Wyoming, USA, plus The Universe
Contact: Website

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 10:50 am

Bakul Shah wrote:Jim, of course I am cheap! I am hoping my fellow doubters would also chip in. But the real prize is being considered a wizard by your peers. Sort of like winning the IOCCC (the international Obfuscated C contest) or the ICFP programming contest! And fame and papers in international journals.
All this jousting is in good fun. Please don't be frustrated!
Yeah, the problem with the Internet is that no one knows I really am a dog ... a vicious, rabid, attack ... Chihuahua! Military humor is pretty dark for obvious reasons, and my acerbic tone most likely is making it difficult for anyone here to sense that I'm laughing all the way to the data bank (see, a "joke"! :lol: - I have to :lol: at my own jokes because no one else even recognizes them as such). I'm thoroughly enjoying the jousting - in fact, Hal Heydt and I were discussing jousting, chain mail, and suits of armor with Eben, Liz, Hexxeh, Pimoroni, and a cast of thousands after the Faire Saturday and into dinner. Hal happened to have with him a sample of the stainless steel chain mail he had made as part of his full-torso shirt, which Eben couldn't resist draping over various limbs and admiring his Manley demeanor. Oh, wait, that was my Manley demeanor (emphasis on the "mean" :)).

BTW, I worked with a winner of the IOCCC - she's a fellow(ette?) pilot - birds of a feather fly together! My problem with the IOCCC is that I produce obfuscated code all the time - I just don't win because it's so obfuscated not even I know what it's supposedly doing!

Now, about that x prize. I plan to tell all of the 8th grade STEM students in the US about this challenge and I'm going to tell them that not only is it impossible to do, but they are forbidden from working on it. If you want to see some really motivated kids, tell them something is impossible and that they're not allowed to do it. Even better, get the government to declare it illegal. I figure it will be about a week before my mailbox is overflowing with solutions, some good, some not so much, and probably a few absolute stunners. Computers are now so easy to use that any 8th grader can operate them, so that's who I ask when I can't get something to work. One 13 year-old kid taught himself C, C++, Objective C, and Java and develops iOS and Android apps.
The best things in life aren't things ... but, a Pi comes pretty darned close! :D
"Education is not the filling of a pail, but the lighting of a fire." -- W.B. Yeats
In theory, theory & practice are the same - in practice, they aren't!!!

DaveDriesen
Posts: 113
Joined: Sun Mar 31, 2013 8:28 pm
Location: Top of the food chain
Contact: Website

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 11:00 am

One 13 year-old kid taught himself C, C++, Objective C, and Java and develops iOS and Android apps
I hear you... What a noob. We were doing ASM at that age and reading machine code, writing trainers and cracking copy protection.

And as they say, back then, there WAS no old skool.

Dave Driesen
Linux dev and oldskool elite

Bakul Shah
Posts: 320
Joined: Sun Sep 25, 2011 1:25 am

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 11:14 am

Heater wrote:It has been suggested that advantage can be taken of the GPU even without knowing about its internals. That is by using the OpenGL ES API. This may well be true, but then in my quick Googling around I have not seen anyone anywhere doing that for any platform with OpenGL, ES or otherwise. I would suspect that if it made sense it would have been attempted already.
See for example http://www2.compute.dtu.dk/pubdb/views/ ... mm5771.zip
With OpenCL available for desktop GPUs, these are harder to find these days.

OtherCrashOverride
Posts: 582
Joined: Sat Feb 02, 2013 3:25 am

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 11:38 am

We haven't even started talking about the benefits the GPU may provide if the aforementioned disassembler and related assembler (IIRC) for the VideoCore IV can provide any improvements over the existing APIs provided by Broadcom
The toolset is here: https://github.com/hermanhermitage/videocoreiv

Heater
Posts: 13360
Joined: Tue Jul 17, 2012 3:02 pm

Re: Something the Pi is not good at - calculating Pi

Tue May 21, 2013 12:32 pm

Jim,
...nice thinking so far outside the box that you need the Hubble Space Telescope to look back and see the box!
You have a brash, overbearing, offensive, rude, elitist, politically incorrect, arrogant ...(should I go on?)...style.

However, that statement above has to be the nicest thing anyone has said about me for many a year. Thank you.

Not to worry, we quite like to read what you have to say, much in the way we like the abrasive styles of Gordon Ramsay or Jeremy Clarkson. As you know, the British sense of humour can be quite dark as well.
Memory in C++ is a leaky abstraction .
