jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Thu May 16, 2013 4:57 pm

jojopi wrote:By the way, 32bit ARM and 64bit AMD each have an integer multiply instruction that takes two full values and produces a double precision result, spread across two registers. You do not get that in FP.
Some chips also have a divide that takes a double length value spread across two registers. Those instructions were very tempting in this program and I did consider using them. Of course, it would be at the expense of portability. They could double the speed of my program. Sticking to standard C types, I have to store my long number in an array of elements whose width is half that of the longest available int. This is so I can multiply two of them without overflow. With the instruction that you mention, I could multiply two maximum length integers with no overflow. I would also need that long divide instruction.
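
To illustrate with made-up names (my real code is organised a little differently), the portable scheme amounts to this:

#include <stdint.h>

/* Each element of the long number is half the width of the widest
   integer the compiler gives us, so the product of two elements
   always fits without overflow. */
typedef uint16_t elem_t;   /* one element of the long number           */
typedef uint32_t work_t;   /* widest type the arithmetic is done in    */

work_t elem_product(elem_t a, elem_t b)
{
    return (work_t)a * b;  /* 16 x 16 -> at most 32 bits: no overflow  */
}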

Heater
Posts: 13344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Something the Pi is not good at - calculating Pi

Thu May 16, 2013 7:09 pm

Using the GPU sounds like a great idea, if you really need to go fast on this particular platform and if it's even practically possible to do. From what I gather, you first have to build or create the tool chain before you can even start to think about it.

Meanwhile, Raspbian is based on open source and easily portable code. 99 percent of it is not suitable to run on the GPU. If it were, we would ditch the ARM processor and bask in the glorious speed of the GPU, right?

The code presented here may or may not be the best way to calculate pi. And it may not be the best way to calculate pi on the Pi. But it has at least survived over many years and many architectures. As such it might be a good little benchmark of various architectures.

Certainly better than Dhrystone, which was very much string based.

Jim Manley
Posts: 1600
Joined: Thu Feb 23, 2012 8:41 pm
Location: SillyCon Valley, California, and Powell, Wyoming, USA, plus The Universe
Contact: Website

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 2:18 am

First of all, those who don't know my background should be aware that I'm one of the few people on the planet who routinely operates a 30 decimal place (not bit) Babbage Difference Engine, which is a monument to integer math to which none of the olde-schoole hardware that folks have been referring to here can hold a candle for nostalgia value. There is no feeling more amazing than cranking the beast and becoming part of the calculator (one becomes both the clock, based on cranking speed, and the power supply!).

The Difference Engines aren't computers as they don't contain any logic that can change the execution path depending on the outcome of a computation. Babbage's Analytical Engine, if it had been built, would have been able to do that and thus would have been the first actual automatic computer. It would have handled 50 decimal digits, 100 digits in double-precision mode, and the AE would also have had FP capability as Babbage discovered that integers were a losing proposition for problems that were literally astronomical in size (the DEs were originally developed to automatically fabricate printing plates for astronomical and navigation tables).

So, I understand exactly where the OP is coming from and I share in the delight of fooling around with musty old stuff - algorithms, hardware, software, or otherwise. My first fun on computers was playing board games on a GE 260 series mainframe via Model 28 Teletype yellow-pulpy-paper-spewing terminals, some of which were located in the basement of two of eight wings of our dorm building. I had 32KB of available-anytime interactive drum storage of my very own, could get another 32KB just for the asking, but needed academic adviser approval for more. Even so, my high school classmates at other universities were positively green with envy as they had to submit punched cards for overnight batch processing - interactive processing was still a dream in most places back then.

Anyway, that's not why we came here. Anyone who thinks that FP calculations produce unpredictable rounding errors not only doesn't understand FP hardware, they don't understand FP math either. The way that you perform cascaded, extended FP calculations to any arbitrary number of bits/digits is to simply account for and avoid the very well-understood rounding error associated with each specific calculation type (multiplication, division, transcendental functions, Bessel functions, and a bunch of other stuff people who aren't engineers have never heard of). In other words, you only use the bits that don't get rounded one way or another in each stage of the calculation.

Heater "got it" when he noted that you not only have one 112-bit mantissa to play with, there are multiples of them in the GPU (at least four, maybe sixteen or even more - I'd need to look up what's in the architecture diagrams that are publicly available). Jojopi missed it in saying:
jojopi wrote:By the way, 32bit ARM and 64bit AMD each have an integer multiply instruction that takes two full values and produces a double precision result, spread across two registers. You do not get that in FP.
Oh yes, you certainly do - it's called double-precision and not only is it specified in the IEEE standards (there have been three since 1965), but so is quadruple-precision, and standards-compliant hardware such as a floating-point unit (FPU) provides sufficient 80-bit registers (mantissa and exponent) to hold calculation inputs and outputs through quadruple-precision.
jojopi wrote:Are you saying that we should not calculate pi on a Pi, and we should just make pretty graphics instead?
Here's where actually knowing what you're talking about comes in very handy. A GPU is not limited to "just make pretty graphics" - in fact, a GPU has no idea that it's manipulating mathematical models that may just happen to represent 3-D virtual objects. A GPU is simply a very special-purpose hardware device optimized for performing LOTS of FP matrix calculations in the shortest possible amount of time. You're confusing the HDMI and composite video generators that are tacked onto the back end of the Pi's GPU with the GPU itself. GPUs don't generate video, digital/analog video interfaces do, which almost universally today just happen to have a GPU as a major component. The video generators on the BCM2835 are part of the SoC, not the GPU.
jojopi wrote:Because, regardless of theoretical FLOPS, calculating pi on the VC4 would be extremely challenging with only an OpenGL ES API. If you are brave enough to attempt a GPU-based pi calculation without licensing proprietary tools, then I will provide an ARM-based comparison.
Extremely challenging? Once again, it helps to know what Open GL (Original Flavor or ES) compliant hardware actually does, which is independent of the actual implementation, BTW. So, no knowledge of the underlying technology nor use of proprietary tools is needed. The GPU spends most of its time and abilities performing matrix manipulations (granted, this is not common knowledge to those not at least passingly familiar with how 3-D graphics are actually computed, but that's not a problem for me). One of the fundamental 3-D graphics matrix computations performed is scaling, which is simply multiplying a bunch of FP numbers (in parallel, as it turns out, as each dimension is completely independent of the others for this function). You simply provide the vertices (points in 3-D space represented as arrays of three FP numbers for each vertex) and FP scaling factors for each dimension (x, y, and z), call the Open GL (ES) function for performing the scaling, and you obtain the resulting new 3-D space values for all of the vertices.

There is nothing to prevent one from stuffing any old arbitrary FP numbers upon which they want to perform FP multiplies into the vertex and scaling factor data structures and, voila! Instant parallel FP multiplication automagically completed for you. This is a perfect example of folks not understanding how to think outside the box (or, more accurately, board, I suppose) with the Pi - it's a whole lot more than just a spindly little 98-pound weakling ARM CPU that's really just a traffic cop between the Ethernet port and USB bus, and the SoC.
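
For the shader-literate, the core of it is only a few lines of GLSL ES. Here is a rough, untested sketch (the attribute and uniform names are just placeholders), leaving aside the separate question of how you haul the results back off the GPU:

static const char *scale_vertex_shader =
    "attribute vec3 a_value;\n"            /* arbitrary FP inputs, three per 'vertex'  */
    "uniform vec3 u_scale;\n"              /* three FP multipliers shared by all       */
    "varying vec3 v_result;\n"
    "void main() {\n"
    "    v_result = a_value * u_scale;\n"  /* component-wise multiply, done in parallel */
    "    gl_Position = vec4(0.0);\n"
    "}\n";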
jojopi wrote:Am I really "stoopid"?
I'm going to take the high road and let everyone come to their own conclusions ;) I won't hold it against you if you weren't aware of how the GPU actually does its thing and how the graphics APIs can be used for purposes they were never intended, but that's an aspect of computing education about which I'm very passionate. I should point out that the largest (by far) Association of Computing Machinery (ACM - the international professional society for computing) special interest group (SIG) is SIGGRAPH, the Special Interest Group for Graphics, and the corresponding IEEE standards group is proportionally large. There are only eight cities in the U.S. with large enough facilities to host the annual SIGGRAPH conference at the end of July - and about the same number of facilities around the rest of the world. It's a very popular SIG ... mostly because it's just so much fun!

I'm in discussions with various Foundation/Broadcom folks about some really screwball ideas for how we can take much better advantage of the Pi's strengths, starting with implementing a browser natively in OpenVG (the 2-D graphics API supported by the GPU), and even a 3-D web browser natively in Open GL ES (instant 3-D WebGL and HTML5, anyone?). If that works out, the next logical steps would be to implement 2-D and 3-D GUIs so that we can just take X and shove it into the dustbin of history where it belongs. There are multiple Masters thesis and PhD dissertation opportunities just in these ideas, and many more when considering peripheral components that could be developed as part of this new paradigm, as any OVG/OGL(ES) compliant platform could take advantage of this work. Once I've got these off the ground, I'll provide the FP Pi calculation code using OGLES, but I've provided enough information that anyone with more uncommitted time than I have available should be able to easily accomplish it. I can provide more pointers on how to git 'er done if any such person should find themselves stuck.
The best things in life aren't things ... but, a Pi comes pretty darned close! :D
"Education is not the filling of a pail, but the lighting of a fire." -- W.B. Yeats
In theory, theory & practice are the same - in practice, they aren't!!!

W. H. Heydt
Posts: 10889
Joined: Fri Mar 09, 2012 7:36 pm
Location: Vallejo, CA (US)

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 3:54 am

Jim Manley wrote: It reminds me of the joke about who was more intelligent, a mathematician or an engineer. They were placed at one end in a room, with a ravishing naked, um, "technician" in a reclined position on a couch at the other end of the room, and a clock on the wall over the "tech". They were instructed that they could advance half the distance to the "tech" as each minute passed on the clock. When the first minute expired, the engineer advanced halfway across the room, but the mathematician remained in his original place against the wall. Another minute went by and the engineer advanced to three-quarters of the way across the room, but the mathematician remained fixed in place. The organizers stopped the contest to ask the mathematician if he understood the rules, to which he smugly replied, "Well, anyone with a wit's worth of intelligence knows that you could advance toward the 'tech' for an infinite amount of time and still never actually get there." The organizers said to the engineer, "Well, it seems the mathematician has bested you, old boy.", to which the engineer quickly shot back, "Oh, yeah? Well give me ten more minutes and I'll be close enough for engineering approximations!"
In a non-identical, but similar joke I first heard over 40 years ago, the punch line was "...close enough for all practical purposes."

Bakul Shah
Posts: 320
Joined: Sun Sep 25, 2011 1:25 am

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 6:51 am

Takuya Ooura's pi_css5 program on the Raspi computes the first million digits of Pi in under 79 seconds (compared to under 3s on a 3.6GHz AMD FX). So I'd say the Raspi is plenty fast!

And Jim, AFAIK programs computing first N digits of π don't benefit by using a GPU!

jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 7:03 am

Jim,

Thanks a lot for that.

I know enough maths, it was my university subject, to have an appreciation of what the GPU is doing but I have never had the time or energy to look deeply into it or to consider how to exploit it for other purposes. Clearly, it is a powerful resource that could be used for many other fun things including number crunching that is nothing to do with graphics.

My prejudice about FP is excessive and dates from long ago. My first experience with computers predated IEEE standardisation so you would have to study a particular architecture carefully to know exactly how it would behave. Also, you often had ints available with more bits than the mantissa of the FP registers. So, there was not a lot of point in considering using FP for exact calculations.

Another factor in my prejudice is that I often see programmers using FP for business calculations and being surprised when adding a hundred pennies does not give them a pound.

In this particular case, porting this program to run on the GPU of the Pi would be amusing but is never going to get to the top of my priority list. Running this old program is just a bit of fun and nostalgia and a personal calculation benchmark. If, for some reason, I had a need to calculate Pi to a large number of decimal places, I would probably get better results by doing more maths than more programming. Although the algorithm that I deduced in school was way better than the simple famous one, there are others that are way better than mine. There are even simple tweaks that would make mine a lot better. Switching to one of them and staying on the main CPU would beat porting my current program to the GPU. There is one very good algorithm that I would like to play with. My current program would be a good basis but it would need a little work first. It can add and subtract two high precision numbers and it can multiply and divide a high precision number by a regular int. As it stands, it cannot multiply or divide one high precision number by another. This is not because I couldn't, but only because I didn't need to for the current algorithm.
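
For the curious, the per-element loop for that multiply looks roughly like this (a simplified sketch with invented names, not my actual code):

#include <stdint.h>

/* Multiply an n-element number (base 10000, least significant element
   first) by a small integer, propagating the carry along the array. */
void mul_by_small(uint16_t *x, int n, uint16_t m)
{
    uint32_t carry = 0;
    for (int i = 0; i < n; i++) {
        uint32_t t = (uint32_t)x[i] * m + carry;  /* fits: the work type is twice as wide */
        x[i]  = (uint16_t)(t % 10000u);           /* what stays in this element           */
        carry = t / 10000u;                       /* what moves up to the next one        */
    }
    /* a final non-zero carry would need extra elements; ignored in this sketch */
}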

I also have an interest in old computers. The Babbage display in the London Science Museum is a favourite as is Bletchley Park where Turing worked on code during the war.

jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 7:09 am

Bakul Shah wrote:Takuya Ooura's pi_css5 program on the Raspi computes the first million digits of Pi in under 79 seconds (compared to under 3s on a 3.6GHz AMD FX). So I'd say the Raspi is plenty fast!

And Jim, AFAIK programs computing first N digits of π don't benefit by using a GPU!
Interesting, I'll have to have a look. It is presumably using a way better algorithm than mine (not at all surprising). It fits nicely with my post of a few minutes ago. In this casgamee, doing more maths will give better benefits than clever programming. It is interesting that the difference between the Pi and the AMD is comparable to the speed difference that I found.

jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 7:10 am

jwlawler wrote:In this casgamee
A weird mistype there. I decided to change "case" to "game" and somehow ended up with a mix of the two.

OtherCrashOverride
Posts: 582
Joined: Sat Feb 02, 2013 3:25 am

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 7:46 am

The PI uses OpenGL ES 2.0 with few extensions. The impact of this is that there is no way to get data out of the vertex shader stage directly since transform feedback would be required (part of GL ES 3). The PI also lacks floating point texture support so that means input to the vertex shader is limited to sequential vertex data (no random access from floating point textures) and output from the pixel shader stage is limited to 4 values each with 8 bits. Additionally, since GL ES 2 does not support Pixel Buffer Objects (PBO) there is no clearly defined way to get the computational results back to the CPU. In short, GPGPU on the PI is a one way trip and of limited practical use.

The largest payoff for GPU hardware acceleration in a HTML browser is going to be for HTML 5 video that is encoded in a GPU supported format. WebGL on the PI would be of little benefit as there is little content produced for that standard and the content that does exist assumes PC class GPU hardware (large texture memory and geometry requirements) in addition to OpenGL extensions that are not present on the PI. It makes little sense to invest the time, money and effort into it, as OpenVG/OpenGL is not going to make JavaScript run any faster.

Heater
Posts: 13344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 10:23 am

OtherCrashOverride,
WebGL on the PI would be of little benefit as there is little content produced for that standard
Wait a minute. WebGL is a quite recent introduction to the web technology stack, which has only shown up in browsers over the last year or so and is still not in IE. So yes, there is little content using it.

But I love it and would love to see it on the Pi.

Even if you are not using 3D it can be an enormous help in speeding up 2D animations, which is what I use it for in some production apps. Other options like canvas and SVG are terribly slow.

Check out this library for an example of a very simple to use and fast 2D API. http://lib.ivank.net/?p=demos

Heater
Posts: 13344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 10:29 am

Jim,

Whilst a lot of what you say is true, your posts do carry a very high-and-mighty, condescending tone.

For many geeks, code speaks louder than words, so if you can post some code that demonstrates to us mere mortals how to do multi-million digit arithmetic on the GPU, we will start to be more in tune with your ideas. If you can restrict it to standard OpenGL ES or OpenCL so that it is portable, that would be a bonus.

pygmy_giant
Posts: 1562
Joined: Sun Mar 04, 2012 12:49 am

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 10:50 am

'Armhf' pops up as a package classification in relation to Raspbian to denote that it takes advantage of / is compatible with the Pi's hardfloat capability. What is this and where is it, and how can one access it via C(++) compiled using GCC? Does GCC automatically optimise for this on the Pi or do you somehow have to request that calculations use the hardfloat capability?

Anyone?

I don't mind being condescended to - just be succinct, accurate, thorough and precise.

Ravenous
Posts: 1956
Joined: Fri Feb 24, 2012 1:01 pm
Location: UK

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 11:00 am

Heater wrote:if you can post some code that demonstrates to us mere mortals how to do multi-million digit arithmetic on the GPU ...
Jim knows his stuff (as well as most of ours probably), he just suffers from a "deplorable excess of personality" :)

But seriously, I haven't been following (or understanding) the "coding on the GPU" threads (for a start, I gathered it was impossible without the toolkit), but if anyone has posted any basic working code that demonstrates even simple maths I'd be interested to see it...

OtherCrashOverride
Posts: 582
Joined: Sat Feb 02, 2013 3:25 am

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 11:14 am

Over-simplification follows:
"armhf" aka "hard float" refers to the ABI (application binary interface) that defines hows arguments are passed between functions. This standard defined by ARM (the cpu designer) allows programs and libraries from different vendors/compilers to inter-operate in a well defined manner.

"hard float" means that floating point arguments to functions are to be passed to other functions using the hardware floating point registers where possible. "soft float" means that floating point arguments will be passed to other functions using the integer registers of the processor where possible. The combination of integer and floating point registers in "hard float" means that more arguments can be passed while avoiding the slower use of the program stack and that conversion to and from integer is not required.

The use of "hard float" mandates there is a floating point co-processor available; however, the use of "soft float" does not imply a floating point co-processor is absent. "hard float" only works with a co-processor. "soft float" works in the presence or absence of a co-processor.

The choice of ABI can be passed as a parameter to GCC at compile time. The default depends on how the copy of GCC in use was configured. For the Raspbian distribution, the default of "hard float" is used unless otherwise specified.

The use of a floating point co-processor can also be passed as a parameter to GCC. The default is similarly dependent on how the copy of GCC in use was configured. For the Raspbian distribution, the default of VFP2 is used unless otherwise specified.
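
To make that concrete, the relevant GCC switches look something like the following (illustrative only; check how your own copy of GCC was configured, e.g. with gcc -v):

gcc -O2 -mfloat-abi=hard   -mfpu=vfp prog.c -o prog    (hard float ABI, VFPv2 co-processor: the Raspbian default)
gcc -O2 -mfloat-abi=softfp -mfpu=vfp prog.c -o prog    (soft float ABI, but the maths still uses the VFP)
gcc -O2 -mfloat-abi=soft             prog.c -o prog    (soft float ABI, FP done entirely in software)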

pygmy_giant
Posts: 1562
Joined: Sun Mar 04, 2012 12:49 am

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 11:32 am

Many thanks OtherCrashOverride - in summary (of your 'over-simplified' summary) - it's automatic.

Re: programming on the GPU, Ravenous said "I gathered it was impossible without the toolkit" - I say UNLEASH THE BEAST!!!!!

Am I right in thinking that that is dependent on Broadcom / the Foundation releasing currently secret information from their UFO hangar in Area 51, or can we just take the Manley approach of re-purposing OpenGL ES 2.0 using currently available libraries / APIs?

Maybe I should stop typing and start reading: http://www.raspberrypi.org/phpBB3/viewforum.php?f=68

plugwash
Forum Moderator
Posts: 3455
Joined: Wed Dec 28, 2011 11:45 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 11:49 am

Heater wrote: Rather, make use of the fact that a quad sized floating point number has 112 bits of mantissa, as Jim said.
Of course we don't actually have quad precision floating point, the highest we have is double precision.

pygmy_giant
Posts: 1562
Joined: Sun Mar 04, 2012 12:49 am

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 11:54 am

Image

"To be precise..."

Heater
Posts: 13344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 1:32 pm

plugwash,

Double precision float gives us 53 bits of mantissa which can be used for integer calculations performed by the FPU, which is what JavaScript does. That munches more bits per instruction for us. I have no idea how the timing works out, though.
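
A quick illustration of that 53 bit limit (plain C, nothing Pi specific):

#include <stdio.h>

int main(void)
{
    double big = 9007199254740992.0;   /* 2^53 */
    printf("%.0f\n", big - 1.0);       /* 9007199254740991: still exact           */
    printf("%.0f\n", big + 1.0);       /* rounds back down to 9007199254740992    */
    return 0;
}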

Jim Manley
Posts: 1600
Joined: Thu Feb 23, 2012 8:41 pm
Location: SillyCon Valley, California, and Powell, Wyoming, USA, plus The Universe
Contact: Website

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 2:14 pm

Bakul Shah wrote:Takuya Ooura's pi_css5 program on the Raspi computes the first million digits of Pi in under 79 seconds (compared to under 3s on a 3.6GHz AMD FX). So I'd say the Raspi is plenty fast!
Bakul - thank you for coming to the aid of a fellow engineer. If anyone who can read actually goes and looks at the honorable Ooura-san's source (http://left404.com/misc/files/pi_css5/pi_css5_src.tgz), they will see references to FFT, which a few folks here should recognize as the Fast Fourier Transform. In particular, there is a curious note about "Number of Floating Point Operations", and further reading will show that log base 2 and square root functions are called. If I'm not mistaken, those can be performed on FP hardware a LOT faster than any integer hardware of the same class (e.g., a class would be a typical microprocessor, an SoC, a massively-parallel supercomputer, etc.). There are other notes about "DBL_ERROR_MARGIN" and "FFT+machine+compiler's tolerance", which correspond precisely to the avoidance of rounding errors by only using unrounded bits as I posted about earlier. I am listening for the church keys opening some nice, fresh cans of crow to be consumed in large quantities.
Bakul Shah wrote:And Jim, AFAIK programs computing first N digits of π don't benefit by using a GPU!
I'll show you when I see you at the Maker Faire this weekend, assuming you're going again. Maybe between you, Eben, a few other geeks, and I scribbling on napkins at lunch, we can produce the implementation in much less time than it would take for any of us to do it alone. I don't expect the uninformed here to be of much help.
OtherCrashOverride wrote:The PI uses OpenGL ES 2.0 with few extensions. The impact of this is that there is no way to get data out of the vertex shader stage directly since transform feedback would be required (part of GL ES 3). The PI also lacks floating point texture support so that means input to the vertex shader is limited to sequential vertex data (no random access from floating point textures) and output from the pixel shader stage is limited to 4 values each with 8 bits. Additionally, since GL ES 2 does not support Pixel Buffer Objects (PBO) there is no clearly defined way to get the computational results back to the CPU. In short, GPGPU on the PI is a one way trip and of limited practical use.
Well, I disagree completely and if we can establish the confab noted in the previous paragraph, I believe we will have proof that this is quite feasible. I haven't looked closely at the OGLES 3 spec yet, but is there anything in there that the Pi GPU won't be able to support? I can't imagine that being the case since the A series SoCs in Apple's iOS products and Exynos/Tegra/whatever SoCs in Android devices have essentially the same functional properties as the VideoCore IV architecture used in the Pi's GPU.

As for extensions, well, they're not exactly standard now, are they? However, that does suggest another approach if there really is no access to computational results - it should be relatively simple for someone with access to the Broadcom toolset to break out intermediate results, assuming the pipeline hardware isn't so tightly-wound that the successive stages can only feed each other from start to finish. We only need a few functions such as scaling to be able to do some quite useful computational things.
OtherCrashOverride wrote:The largest payoff for GPU hardware acceleration in a HTML browser is going to be for HTML 5 video that is encoded in a GPU supported format. WebGL on the PI would be of little benefit as there is little content produced for that standard and the content that does exist assumes PC class GPU hardware (large texture memory and geometry requirements) in addition to OpenGL extensions that are not present on the PI. It makes little sense to invest the time, money and effort into it, as OpenVG/OpenGL is not going to make JavaScript run any faster.
Again, I heartily disagree - WebGL is a chicken-and-egg situation precisely because so few people really understand what *GL* (in all its various forms, hence the stars) is about. I cut my teeth on the Original Flavor GL on SGI hardware when it was first using a 16 ~ 33MHz MC68020 CPU, so I understand what you're trying to say. However, there's a prime example of a relatively weak CPU acting as a traffic cop for a kick-butt (for 1986) GPU in the form of the original Geometry Pipeline architecture - 30,000 Gouraud-shaded polygons per second, no waiting! ;)

As for JavaScript performance, not all interpreters are created equal and many are quite awful, but there are some very reasonable ones available now and IIRC, Google has been supporting an open-source effort to provide continuous improvement in that area. With any software, you have to be intelligent about what you're doing and performing expensive, not-well-thought-out things inside loops that are executed many times is probably the single worst offense committed by people more prone to cobbling than actual software engineering.

The Pi isn't alone when it comes to a lack of a WebGL-compatible browser - Safari on iOS and even Google's own Chrome browser on Android also suffer this fate, but it's not because it can't be done, the resources just haven't been applied. Even Microsloth hasn't bothered implementing WebGL in Internet Exploiter but, then again, they're so far behind in useful browser features that this is not exactly news. If a lack of content were really a barrier, then commercial radio and TV should never have gotten off the ground, should they? Why should Gutenberg have invented the movable type printing press since monks had produced plenty of religious texts by hand for millennia up until that point? I mean, really? However, I shouldn't be surprised since it took Edison about 20 years to wean people off of gas lamps, despite all of the demonstrable advantages of electric lights - oh, like the elimination of an open flame in heavily-draped rooms and around excessively-draped people!

As for content that has "large texture memory and geometry requirements", we can easily find web content that will bring any browser to its knees on any typical consumer computing platform because of all of the dynamic trash that's thoughtlessly attempted to be crammed through the Internet. There are plenty of WebGL-based interactive animations with quite reasonable 3-D resource needs that would be wonderful to be able to show and interact with in educational environments, and if we can do it on a Pi, then we will have arrived at that intersection between Heaven, Nirvana, Shangri-La, and other forms of Paradise. Schools block lots of content exactly because they require more bandwidth than their networks and even many of their computers can handle (ever heard of YouBoob, I mean YouTube? It's not available in many K-12 schools, so we teachers have to capture it and bring it in on removable media).

As for the personality thing, I come from a military background (and aviation, at that), and we like to horse around in competitive jousts to relieve the stress, so I apologize to the more sensitive ladies in the room if I have offended them (oh, darn, they're going to send me right back into the touchy-feely sessions, aren't they? :lol:). I really get my dander up when people always assume that you have to have all of the bells and whistles on everything all of the time or it's not worthwhile pursuing something. That's why the economy wound up in such a big pinch - gluttony begot poverty for many because they really didn't deserve a five-bedroom house filled with six-foot diagonal HDTVs and a three-car garage stuffed with suburban assault vehicles.

The entire philosophy of the Pi is all about making reasonable trade-offs between the best available this week and what was quite acceptable just a couple of years ago that now costs a pittance. That's why I get so animated and vociferous about the poppycock being bandied about not just in this thread, but in many others. The nerdocracy needs to back off and comprehend what the Pi was really designed for, stop trying to make the poor little ARM CPU do things it's just not capable of, and consider how the GPU can be put to better use, as it clearly does have advantages that haven't been exploited much at all, much less fully.

Great things have never happened because a committee of like-minded intellectuals sat around a big table and pontificated upon the lint in their navels. They happened because some lunatics burst into the room, set the drapes on fire while excitedly waving sparklers around, and knocked over the punch bowl. This quite accidentally put out said drapes ala flambé, which coincidentally allowed sunbeams to enter the room through the newly-bared windows and cast the problems under consideration in literally an entirely new light. I don't even consider myself one of those lunatics - I've worked with some brilliant ones though, and if you think I'm over-the-top, you really don't want to leave your comfortable surroundings and venture out into their world. It's filled with all sorts of scary things like glaring electric lights, supposedly cancer-inducing radio waves, intertubes filled with all manner of strange visual and audible over-stimulation, etc.

Don't take things so personally. I was never attacking the OP's efforts from a nostalgic perspective, but he cited performance numbers and that's a sure-fire way to attract unwanted attention. I'm just trying to point out to his responders that the Pi's ARM CPU "... is not the droid you're looking for." Here's some words of wisdom from Edna St. Vincent Millay: “My candle burns at both ends; it will not last the night; but ah, my foes, and oh, my friends -- it casts a lovely light.” Or, you can take Shakespeare at his words: "There are more things in Heaven and Earth, Horatio, than are dreamt of in your philosophy." Then again, Dan Aykroyd kinda summed up the joys of witty, intentionally-provocative banter on an early "Saturday Night Live" episode with: "Jane, you ignorant slut ... " :lol:
Last edited by Jim Manley on Fri May 17, 2013 2:39 pm, edited 2 times in total.
The best things in life aren't things ... but, a Pi comes pretty darned close! :D
"Education is not the filling of a pail, but the lighting of a fire." -- W.B. Yeats
In theory, theory & practice are the same - in practice, they aren't!!!

jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 2:20 pm

Heater wrote:plugwash,

Double precision float gives us 53 bits of mantissa which can be used for integer calculations performed by the FPU, which is what JavaScript does. That munches more bits per instruction for us. I have no idea how the timing works out, though.
My program's performance depends on two important types. One I call the Work type, in which I perform calculations. Experience suggests that it performs best when this is the largest that is supported in hardware. So, my 64 bit laptop is fastest when I set this type to 64 bits, but the Pi is fastest when I set it to 32. It can also be set to 16, but that is historical. The second important type is called Elem and is the type of the large arrays holding the values. The Work type must be at least twice the width of the Elem type so that I don't get overflows on the calculations. In the decimal version of the program (there is also a binary version, not yet described), I store 2 digits per element when it is 8 bits, 4 when 16 bits, and 8 when 32 bits. So, the program loops half as often for each step up the scale and hence goes about twice as fast. This is why I could probably double the speed if I accessed the double register instructions that we mentioned earlier.

Back to floating point, if I exploit a double as a 53 bit integer then I would have 26 bits available for my Elem type and I could store 7 digits per element (nearly 8, but it does not quite fit). The number of loops per high precision calculation would be almost as small as if I used a 64 bit integer. So, provided that the FP arithmetic was a similar speed to integer, I would almost double program speed (more precisely, I might improve it by a factor of 7/4). Unlike some of the other suggestions in the thread, this should be just a matter of playing with the header files and recompiling, so I may give it a go.
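
In header terms it would be a change along these lines (the names here are only indicative of how my header is laid out):

#include <stdint.h>

#ifdef WORK_IN_DOUBLE                /* the floating point experiment            */
typedef uint32_t Elem;               /* each element kept below 10^7 (7 digits)  */
typedef double   Work;               /* exact for integers up to 2^53            */
#else                                /* current integer build on the Pi          */
typedef uint16_t Elem;               /* 4 decimal digits per element             */
typedef uint32_t Work;               /* product of two Elems always fits         */
#endif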

simplesi
Posts: 2327
Joined: Fri Feb 24, 2012 6:19 pm
Location: Euxton, Lancashire, UK
Contact: Website

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 2:29 pm

This thread needs to be required reading on any new Computing syllabus that gets introduced :)

Great stuff :)

Simon
Last edited by simplesi on Fri May 17, 2013 6:27 pm, edited 1 time in total.
Seeking help with Scratch and I/O stuff for Primary age children
http://cymplecy.wordpress.com/ @cymplecy on twitter

jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 2:39 pm

Jim Manley wrote:<snip>
Don't take things so personally. I was never attacking the OP's efforts from a nostalgic perspective, but he cited performance numbers and that's a sure-fire way to attract unwanted attention. I'm just trying to point out to his responders that the Pi's ARM CPU "... is not the droid you're looking for."
<snip>
No offence taken here, but I offer my apologies for kicking off a firestorm. My post was never intended as a serious evaluation of the Pi or a serious way to calculate pi, but just a description of a game that I play with this ancient piece of code of mine. I had hoped that this was obvious, but if it wasn't, let me say so now. I hoped it might prompt some interesting and useful discussion, and it seems that it has. As I mentioned, the original program was written in Fortran. I said "nearly 40 years ago"; more precisely, I think that it was 1974 or 75. I suspect that many of you were not even born when this program was written.

Of the ideas proposed so far: using the GPU sounds very interesting, but I don't expect that I will ever get the time to try it (I will be publishing my code soon, and if someone else gets it to run on the GPU, I would be very interested in their results); using a floating point type within the limits of its mantissa should be very easy, so I may try that.

jojopi
Posts: 3085
Joined: Tue Oct 11, 2011 8:38 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 3:55 pm

jwlawler wrote:Back to floating point, if I exploit a double as a 53 bit integer then I would have 26 bits available for my Elem type and I could store 7 digits per element (nearly 8, but it does not quite fit). The number of loops per high precision calculation would be almost as small as if I used a 64 bit integer. So, provided that the FP arithmetic was a similar speed to integer, I would almost double program speed (more precisely, I might improve it by a factor of 7/4).
You can store 9.6 digits in a 32 bit int, so that should be the fastest.

gcc will emit the extending multiply when you assign an int to a long long and then multiply by another int. Perhaps the problem is with splitting the result back into halves? A right shift by 32 can be optimized away, but a divide by 10^9 can not.

All the usual arbitrary precision libraries on the Pi are using 32 bit int as the limb (elem) type. Including pi_css5. It uses double coefficients for the FFTs that accelerate its multiply and sqrt algorithms, but all the main AGM variables are stored as integers.

Similarly, if the laptop is running a 64 bit OS you should be able to use __int128 as your work type there, for a factor of four improvement in long multiplication.
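
In C, the multiply and the two ways of splitting look like this (sketch only):

#include <stdint.h>

/* gcc turns the cast-and-multiply into a single widening multiply
   (umull on ARM). Splitting on a power-of-two base is a shift;
   splitting on a base of 10^9 costs a genuine 64 bit division. */
void split(uint32_t a, uint32_t b,
           uint32_t *hi2, uint32_t *lo2,     /* base 2^32 */
           uint32_t *hi10, uint32_t *lo10)   /* base 10^9 */
{
    uint64_t p = (uint64_t)a * b;

    *hi2  = (uint32_t)(p >> 32);             /* optimized away in practice */
    *lo2  = (uint32_t)p;

    *hi10 = (uint32_t)(p / 1000000000u);     /* the expensive part         */
    *lo10 = (uint32_t)(p % 1000000000u);
}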

jwlawler
Posts: 83
Joined: Sun May 12, 2013 12:15 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 6:10 pm

jojopi wrote:
jwlawler wrote:Back to floating point, if I exploit a double as a 53 bit integer then I would have 26 bits available for my Elem type and I could store 7 digits per element (nearly 8, but it does not quite fit). The number of loops per high precision calculation would be almost as small as if I used a 64 bit integer. So, provided that the FP arithmetic was a similar speed to integer, I would almost double program speed (more precisely, I might improve it by a factor of 7/4).
You can store 9.6 digits in a 32 bit int, so that should be the fastest.

gcc will emit the extending multiply when you assign an int to a long long and then multiply by another int. Perhaps the problem is with splitting the result back into halves? A right shift by 32 can be optimized away, but a divide by 10^9 can not.

All the usual arbitrary precision libraries on the Pi are using 32 bit int as the limb (elem) type. Including pi_css5. It uses double coefficients for the FFTs that accelerate its multiply and sqrt algorithms, but all the main AGM variables are stored as integers.

Similarly, if the laptop is running a 64 bit OS you should be able to use __int128 as your work type there, for a factor of four improvement in long multiplication.
Yes, I am being slightly wasteful by putting only 8 digits in the 32 bit elem types. The very old 8 bit elem could only hold 2 and the newer 16 bit could hold 4; I rather lazily just doubled again to 8 when I went to 32 bits, but it could have been 9 as you say.

I have not experimented with __int128 but I may give it a try. As well as multiplying two 64 bit values and getting a 128 bit result, I think that I need to divide that 128 bit result by a 64 bit value to get back to a 64 bit value. I know that this sort of thing is possible at the assembler level in some chips, but I don't know how to exploit it in C.
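
Something like the following is what I would hope works, though it is untested guesswork on my part:

#include <stdint.h>

/* gcc on a 64 bit target provides __int128; whether the divide below becomes
   a hardware instruction or a small library routine is up to the compiler. */
uint64_t muldiv(uint64_t a, uint64_t b, uint64_t d)
{
    unsigned __int128 p = (unsigned __int128)a * b;   /* 64 x 64 -> 128 bits */
    return (uint64_t)(p / d);                         /* 128 / 64 -> 64 bits */
}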

If it can be done then, as with the FP experiments, it should be just a matter of playing with some typedefs in one header file. The rest of the code should adjust to the new definitions. If the program calculates pi to 1,000,000 places correctly using FP types, it will reduce my prejudice against FP. However if I get more usable bits from an integer type then I would still bet on that.

I mentioned in passing a binary version. This calculates pi in binary and stores the obvious 32 bits in a 32 bit elem etc. This reduces the number of elements a bit and speeds some calculations (some multiplies and divides are by the base and hence can be bit shifts in the binary version). Of course, the final result is not as a human would like to see it, so I follow with a binary to decimal conversion. This conversion is fairly significant for 1,000,000 places, but it is a linear algorithm, unlike the main one which is quadratic, and hence the binary calculation followed by conversion wins above a certain size, which is rather less than 1,000,000 on the platforms that I have tried.

The main reason that I wrote the binary version was to increase my confidence in the answer. Today, I can easily get someone else's answer from the web. When this program was first written, that was not an option. There were several possible errors:

1. My maths could be wrong. I could verify something like 50 places against independent results. I thought that it was highly unlikely that my algorithm would be wrong but get the first 50 places right.

2. My high precision arithmetic algorithms could be wrong in principle. Again, I was reasonably confident that if it got the first 50 places right then this was not the case.

3. I may have missed an overflow at some point or miscalculated the effect of rounding errors. This was the most serious problem since it could easily occur only beyond the verifiable places.

I hoped that the binary version would address case 3. It was the same maths and essentially the same implementation, so it was equally likely to have an error, but since all the intermediate calculations would be quite different, it was very unlikely that it would suffer from exactly the same errors.

Heater
Posts: 13344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Something the Pi is not good at - calculating Pi

Fri May 17, 2013 6:52 pm

Jim,

This sounds promising, sounds like you have the ear of people in the right places. Applying your charm and social graces to those ears might get us some fast GPU maths possibilities.

However, I'd be ecstatic with getting WebGL working.

As you said, Google have done wonders with their V8 JavaScript engine. I recently wrote a version of one of our server processes in JS that was originally written in C++. It turns out to do the job using the same CPU load as the old version, with the bonus that it was much easier to write and will be a lot easier to extend and maintain.
