Heater
Posts: 16301
Joined: Tue Jul 17, 2012 3:02 pm

Re: Liberation through Computer Literacy

Sun Oct 20, 2019 6:52 pm

ejolson,

I have a burning question that has been bugging me for years. I'm not sure why but it never occurred to me to ask you. It goes like this:

Some years ago there was a little "PI Day" challenge going on here to calculate the most digits of PI on a Raspberry Pi. Sorry I can't find the link to that now.

That got me curious and hence finding the history of the record holders for calculating the digits of PI. Nicely summarized by wikipedia here: https://en.wikipedia.org/wiki/Chronolog ... _of_%CF%80
Currently standing at 31,415,926,535,897 digits. I think at the time the record was only 10,000,000,000,050 digits.

Now what struck me about all this was the fact that all the recent record holders were not any of the worlds super computers. None of them were those hugely expensive, and well huge, machines with thousands/millions of processors and GPUs etc. Machines that nation states brag about.

Oh no, the machines used for PI digit record breaking were not much more than a few off the shelf PC's with masses of RAM/disk attached. Often build on a budget of almost nothing.

Today the current record holder, despite having all the resources of Google on tap is not using anywhere near the massive parallelism a super computer or Googles data center posses.

Naively a layman would assume that a super computer that cost hundreds of millions and consumes the electrical power of a city could get the PI digit record breaking job done during it's lunch break. They do not.

So the question:

Why not?

Is it really impossible to parallelize the PI calculation? Or have they just not bothered to try?

I would have thought that after spending millions of dollars on a new super computer the first thing you would do is break the digits of PI record to show off how fast it is.
Memory in C++ is a leaky abstraction .

User avatar
Paeryn
Posts: 3008
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Liberation through Computer Literacy

Sun Oct 20, 2019 9:03 pm

I would think that the people using these powerful supercomputers have more pressing jobs to run on them than calculating pi. It's not like us buying a new RPi where it may be sat idling for hours / days at a time.
She who travels light — forgot something.

Heater
Posts: 16301
Joined: Tue Jul 17, 2012 3:02 pm

Re: Liberation through Computer Literacy

Sun Oct 20, 2019 11:08 pm

For sure people don't want to waste time on useless things on their shiny new and expensive super computers.

But they do brag about how fast they are at running industry standard bench marks. Which is very important, national pride and all that.

So why not beating the world record for the digits of Pi as a benchmark? Sounds like a natural to me.

Is it because they would be embarrassed by how slow that would make their big expensive machines look compared to amateur lash ups of a few PC boxes?
Memory in C++ is a leaky abstraction .

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Liberation through Computer Literacy

Sun Oct 20, 2019 11:12 pm

I would like to see challenges that result in some useful code. Absurd is getting old.

Heater
Posts: 16301
Joined: Tue Jul 17, 2012 3:02 pm

Re: Liberation through Computer Literacy

Sun Oct 20, 2019 11:54 pm

John_Spikowski,
I would like to see challenges that result in some useful code. Absurd is getting old.
Well... there is one little problem I have been meaning to toy with for a long time. Which does have real world applicability. It's the problem of rapidly finding all the objects in a two dimensional space that are some distance away from a given point. Think geolocation for example.

Let me frame it like this:

1) Space will be represented by a square grid of 4194304 points in two dimensions. That's 2048 points in each direction. 0 to 2047 in x, 0 to 2047 in y.

2) One 16th of the grid points will be occupied by objects. That's 262144 objects. The objects positions are distributed randomly among the available grid points. No more than one in each position. The objects have no size, they are "point like".

3) A "target" point is selected at random.

The problem is to find all the objects within some distance of the target (Distance TBD). As quickly as possible.

Sound interesting? Up for the challenge?

Usual rules would apply: One language only used. No use of libraries that are not a standard part of the language spec. Open source only. Must run on a Raspberry Pi...
Memory in C++ is a leaky abstraction .

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 12:20 am

This may have value for the RPi Tank project I'm working on.

Will objects have properties?

ejolson
Posts: 5588
Joined: Tue Mar 18, 2014 11:47 am

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 2:20 am

Heater wrote:
Sun Oct 20, 2019 11:54 pm
John_Spikowski,
I would like to see challenges that result in some useful code. Absurd is getting old.
Well... there is one little problem I have been meaning to toy with for a long time. Which does have real world applicability. It's the problem of rapidly finding all the objects in a two dimensional space that are some distance away from a given point. Think geolocation for example.

Let me frame it like this:

1) Space will be represented by a square grid of 4194304 points in two dimensions. That's 2048 points in each direction. 0 to 2047 in x, 0 to 2047 in y.

2) One 16th of the grid points will be occupied by objects. That's 262144 objects. The objects positions are distributed randomly among the available grid points. No more than one in each position. The objects have no size, they are "point like".

3) A "target" point is selected at random.

The problem is to find all the objects within some distance of the target (Distance TBD). As quickly as possible.

Sound interesting? Up for the challenge?

Usual rules would apply: One language only used. No use of libraries that are not a standard part of the language spec. Open source only. Must run on a Raspberry Pi...
A couple years ago I worked with a student to develop a fast algorithm to do essentially that, except in 3D and instead of the closest point out of N points to one location there were also N locations. This was used to find differences in point-cloud data collected using 3D scans. The slow algorithm is O(n^2) whereas we managed O(n log^2 n).

Our motivating application was tracking where cars were located in a parking lot. Other applications include LIDAR guidance systems and detecting architectural changes from topographic scans after an earthquake.

Although the algorithm was published in the student's thesis, I've been meaning to write a proper research report. The algorithm itself does not divide space into grids and so the asymptotic runtime does not depend on the dimensionality of the data. As a result, unlike many other methods which are efficient only in two dimensions, handling six-dimensional data that includes a colour triple as well as position does not take much longer to process. Unfortunately, I'm not sure it's a good problem for a thread under the topic "General programming chat and advice for beginners."

On the other hand, the suggested problem is practical, especially compared to computing too many digits in the decimal approximation of pi. I wonder what programming languages are liberating in the sense of allowing a person the freedom to efficiently code any algorithm he or she can think up for finding the Hausdorff distance between two finite sets of points.

Heater
Posts: 16301
Joined: Tue Jul 17, 2012 3:02 pm

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 6:50 am

@John_Spikowski,
Will objects have properties?
For the purposes of the challenge the objects only have position (x, y). With the range specified.

Of course that does not prevent one from building those coordinates into whatever object model you have of the otherway around extending that base property.

@ejolson

Great, you already have a solution in mind. Let's stick to 2D for us simple minded folk.

Actually there are a couple of changes I'd like to make to the problem spec.

1) Previously we have had issues actually timing things. Like the "warm up time" of interpreter engines and cache filling. Like the time taken to print the results. To that end we should specify it so that the actual algorithm time is measured. After all in a real, long running, applications it's the algorithms run time that matters not all that other junk. We should arrange the run time is longer, like requiring to find the distances of some number of different targets, such that run time is many seconds or a minute say.

2) I'd like to specify that always producing a complete list of all possible points within the range is not required.

To that second point, imagine you are somewhere in a big city and you want a list of restaurants within 2km of your location. That might be a big list. You might be happy with just being told of 5 of them that are nearby. Not even the 5 nearest ones necessarily, for example 5 that are all off in the same direction down a particular street would do for you to check out. But not outside the range.

I'm not sure how to express that idea properly yet or how we would evaluate and compare the results of different approaches, which would likely be different.
Memory in C++ is a leaky abstraction .

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 10:14 am

Memory in C++ is a leaky abstraction.


A good analogy is like blowing bubbles and expecting the them to last.

Heater
Posts: 16301
Joined: Tue Jul 17, 2012 3:02 pm

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 10:19 am

Ha!

I guess you have heard the common phrase in the C++ world that it provides "zero cost abstractions". That is to say it tries to provide higher level features in such a way that one could not do the same more efficiently in a lower level like C or assembler.

Then of course there is the notion of memory leaks.

I could not resist putting the two together :)
Memory in C++ is a leaky abstraction .

User avatar
John_Spikowski
Posts: 1614
Joined: Wed Apr 03, 2019 5:53 pm
Location: Anacortes, WA USA
Contact: Website Twitter

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 10:32 am

C++ is written C.

Heater
Posts: 16301
Joined: Tue Jul 17, 2012 3:02 pm

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 12:33 pm

What do you mean C++ is written in C?
Memory in C++ is a leaky abstraction .

ejolson
Posts: 5588
Joined: Tue Mar 18, 2014 11:47 am

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 2:56 pm

Heater wrote:
Mon Oct 21, 2019 12:33 pm
What do you mean C++ is written in C?
These days the Linux kernel is written in C and C is written in the C++ programming language. Maybe someday C++ will be written in Python or maybe Basic.

The reason the previous challenges measured the wall time elapsed from the command used to start the program to when the program exits is because that's the time the user experiences. On the other hand, the Fibonacci scaling analysis was performed by a driver that interactively ran the program and timed each computation separately.

If you are suggesting the test data consist of randomly distributed points, then using a separate driver to create the data and perform the timing would factor out differences in the speed and distribution of the random number generators that are available in each programming language.

I suspect the driver used for the final Fibonacci tests could easily be modified. When time permits I'll create a simple O(n^2) code that can be driven by such a timing harness as an example.
Last edited by ejolson on Mon Oct 21, 2019 3:42 pm, edited 1 time in total.

Heater
Posts: 16301
Joined: Tue Jul 17, 2012 3:02 pm

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 3:41 pm

That's right. Previously we were interested in the user experience of running a command as a one off so all the overheads of cranking up interpreters, warming up jits and caches, formatting the output was part of the test.

For a change I thought we could do the more normal thing and focus on the algorithm run time. As if it were part of some endlessly running server process or some such.

I suspect when I say "random" I mean a totally non-random, well specified, PRNG with known fixed seeds. It's nice to have reproducibility. Say a PCG variant or one of the xoshiro / xoroshiro generators. I guess a file of required input data would do, sounds clunky.

Or we can really random and get into some long running statistical testing.

Either way, the time taken to generate or read the input should not be included.

The algorithm I have in mind, you knew I had something in mind already right?, might be speedy but the price for that is that it might not be totally complete/accurate all the time. One of those modern day "good enough nobody will notice the difference" kind of algorithms popular among NoSql database kids now a days.

The question then is how to compare such fuzzy results?
Memory in C++ is a leaky abstraction .

ejolson
Posts: 5588
Joined: Tue Mar 18, 2014 11:47 am

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 3:53 pm

Heater wrote:
Mon Oct 21, 2019 3:41 pm
The question then is how to compare such fuzzy results?
How about
  • F for a program that produces wrong answers
  • D for one which fails to compile
  • C for one that segfaults
  • B for one that is slow
  • A for one which is liberating
Here liberating means the programming language allowed the programmer to express their algorithm to the processor in a way that permits the computer to execute that algorithm efficiently. It does not mean the algorithm itself is advanced or clever.

Perhaps a grade of C++ is suitable for using an overly-clever algorithm.

Heater
Posts: 16301
Joined: Tue Jul 17, 2012 3:02 pm

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 4:26 pm

What happened to E ?

- E for Experimental.

The algorithm I ant to experiment with is really short and sweet.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 5588
Joined: Tue Mar 18, 2014 11:47 am

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 4:39 pm

Heater wrote:
Mon Oct 21, 2019 4:26 pm
What happened to E ?

- E for Experimental.

The algorithm I ant to experiment with is really short and sweet.
I was reserving grade E for the programs I write, but experimental works just as well.

Presently I'm using spaghetti to measure the stochastic parameterisation of turbulence models. I'll think about how to algorithmically generate some interesting data sets for the next programming challenge soon.

User avatar
davidcoton
Posts: 5140
Joined: Mon Sep 01, 2014 2:37 pm
Location: Cambridge, UK
Contact: Website

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 4:45 pm

ejolson wrote:
Mon Oct 21, 2019 4:39 pm
Presently I'm using spaghetti to measure the stochastic parameterisation of turbulence models
What is the accuracy of a piece of spaghetti as a measuring tool?
Location: 345th cell on the right of the 210th row of L2 cache

ejolson
Posts: 5588
Joined: Tue Mar 18, 2014 11:47 am

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 4:51 pm

davidcoton wrote:
Mon Oct 21, 2019 4:45 pm
ejolson wrote:
Mon Oct 21, 2019 4:39 pm
Presently I'm using spaghetti to measure the stochastic parameterisation of turbulence models
What is the accuracy of a piece of spaghetti as a measuring tool?
Not very good, though with many pieces the accuracy improves as does ones knowledge about the inaccuracy. I much prefer raspberry pie because spaghetti frequently gets caught in my throat. More information is here.

https://en.m.wikipedia.org/wiki/Spaghetti_plot

Heater
Posts: 16301
Joined: Tue Jul 17, 2012 3:02 pm

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 5:13 pm

Strangely enough the algorithm I have in mind to tackle this problem involves spaghetti.
Memory in C++ is a leaky abstraction .

User avatar
bensimmo
Posts: 4719
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 7:17 pm

Heater wrote:
Sun Oct 20, 2019 11:08 pm
For sure people don't want to waste time on useless things on their shiny new and expensive super computers.

But they do brag about how fast they are at running industry standard bench marks. Which is very important, national pride and all that.

So why not beating the world record for the digits of Pi as a benchmark? Sounds like a natural to me.

Is it because they would be embarrassed by how slow that would make their big expensive machines look compared to amateur lash ups of a few PC boxes?
Once upon a time they would break how fast they could crunch through seti@honme units, ClimatePrediction, Rosetta@Home or Find-A-Drug or LHC and then BOINC messed it all up and benchmarking didn't work so well, graphics card became better than CPU's for a lot of them and we'll ...
Times move on. Doing useful stuff while you benchmarked didn't look as pretty.

User avatar
bensimmo
Posts: 4719
Joined: Sun Dec 28, 2014 3:02 pm
Location: East Yorkshire

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 7:33 pm

Heater wrote:
Mon Oct 21, 2019 6:50 am
@John_Spikowski,
Will objects have properties?
For the purposes of the challenge the objects only have position (x, y). With the range specified.

Of course that does not prevent one from building those coordinates into whatever object model you have of the otherway around extending that base property.

@ejolson

Great, you already have a solution in mind. Let's stick to 2D for us simple minded folk.

Actually there are a couple of changes I'd like to make to the problem spec.

1) Previously we have had issues actually timing things. Like the "warm up time" of interpreter engines and cache filling. Like the time taken to print the results. To that end we should specify it so that the actual algorithm time is measured. After all in a real, long running, applications it's the algorithms run time that matters not all that other junk. We should arrange the run time is longer, like requiring to find the distances of some number of different targets, such that run time is many seconds or a minute say.

2) I'd like to specify that always producing a complete list of all possible points within the range is not required.

To that second point, imagine you are somewhere in a big city and you want a list of restaurants within 2km of your location. That might be a big list. You might be happy with just being told of 5 of them that are nearby. Not even the 5 nearest ones necessarily, for example 5 that are all off in the same direction down a particular street would do for you to check out. But not outside the range.

I'm not sure how to express that idea properly yet or how we would evaluate and compare the results of different approaches, which would likely be different.
I think you are saying something like we do for out air quality project (PM2.5 sensors).

We have result in an SQL (or similar) database, they all have their GNSS location.
I have my Pi with GNSS (or phone, or Alexa) and ask it to give me the current reading.
I can grab the nearest by actual distance from my current point.
I can also grab the nearest 5 and take an average, or remove any that haven't sent data back for a while.

I first did this in Python3, cheap and easy as we don't have that many points to loop through at the moment but moved it to the server in some language (php/SQL?).

Scale it up to 1000s of point and the Python3 would be slow probably.
To be honest I would bother doing it in my own code and I would look at SciPi (and maybe KDtree ?) and leave it to people who are good at this :-)
https://docs.scipy.org/doc/scipy/reference/

User avatar
davidcoton
Posts: 5140
Joined: Mon Sep 01, 2014 2:37 pm
Location: Cambridge, UK
Contact: Website

Re: Liberation through Computer Literacy

Mon Oct 21, 2019 10:42 pm

ejolson wrote:
Mon Oct 21, 2019 4:51 pm
More information is here.

https://en.m.wikipedia.org/wiki/Spaghetti_plot
Thanks for the link. Enjoy your raspberry Pi(e).

Heater wrote: Strangely enough the algorithm I have in mind to tackle this problem involves spaghetti.
The Italian Job?
Location: 345th cell on the right of the 210th row of L2 cache

Heater
Posts: 16301
Joined: Tue Jul 17, 2012 3:02 pm

Re: Liberation through Computer Literacy

Tue Oct 22, 2019 12:32 am

davidcoton,
The Italian Job?
This recipe was inspired by an Italian but this particular spaghetti strand was cooked by a German.
Memory in C++ is a leaky abstraction .

Heater
Posts: 16301
Joined: Tue Jul 17, 2012 3:02 pm

Re: Liberation through Computer Literacy

Tue Oct 22, 2019 12:50 am

bensimmo,
I think you are saying something like we do for out air quality project (PM2.5 sensors).

We have result in an SQL (or similar) database, they all have their GNSS location.
I have my Pi with GNSS (or phone, or Alexa) and ask it to give me the current reading.
I can grab the nearest by actual distance from my current point.
I can also grab the nearest 5 and take an average, or remove any that haven't sent data back for a while.

I first did this in Python3, cheap and easy as we don't have that many points to loop through at the moment but moved it to the server in some language (php/SQL?).

Scale it up to 1000s of point and the Python3 would be slow probably.
Yes, the challenge is kind of inspired by that geographical problem.

Rather than loop through thousands of points stored in a database why not let the database do the work:

Code: Select all

SELECT * FROM Places WHERE
    (Lat => 1.2393 AND Lat <= 1.5532) AND (Lon >= -1.8184 AND Lon <= 0.4221)
AND
    acos(sin(1.3963) * sin(Lat) + cos(1.3963) * cos(Lat) * cos(Lon - (-0.6981))) <= 0.1570;
See nice article about this here: "Finding Points Within a Distance of a Latitude/Longitude Using Bounding Coordinates" - http://janmatuschek.de/LatitudeLongitud ... SQLQueries

How fast it works will depend on your database of course. But I suspect it would be hugely faster than any home made looping through thousands of points in Python.

That sounds likes an interesting project you have there.
Memory in C++ is a leaky abstraction .

Return to “General programming discussion”