## Having Fun with threads! Very Interesting

pizzaboy150
Posts: 18
Joined: Sat Sep 05, 2015 11:03 am

### Having Fun with threads! Very Interesting

So since the Pi2 and the extra cores I thought is maybe worth looking in parellel programming to see how cool it is. I created a simple program to calulate primes and stress the cpu so I could monitor core usage in htop while running.

Here is the program I created!

Code: Select all

``````#include <stdio.h>

typedef struct
{
int start;
int end;
int step;
}data;

int isPrime(long int number)
{
long int i;
for(i=2; i<number; i++)
{
if(number % i == 0)
{
//not a prime
return 0;
}
}
return number;
}

void calcPrimes(int start, int stop, int step)
{
long int s;
for(s=start; s<=stop; s+=step)
{
if (isPrime(s) > 0)
{
//Its a prime number!!!
}
}
}

{
calcPrimes(3, 100000, 8); //stepping 8 numbers for 4 cores
return NULL;
}

{
calcPrimes(5, 100000, 8); //starting thread 2 at the next odd number jumping 8 spaces for 4 cores
return NULL;
}

{
calcPrimes(7, 100000, 8); // starting thread 2 at the next odd number and jumping 8 spaces for a 4 core run
return NULL;
}

{
calcPrimes(9, 100000, 8); // think you get it.
return NULL;
}

int main()
{
printf("Calculate Prime Numbers\n");
printf("==================================================\n\n");

//wait for threads to join before exiting

return 0;
}
``````

And my results were not quite what I was expecting... sort of!

3 thread = 16.637 s <= WTF!!!

I noticed that running on 3 threads the activity seemed only to be divided by 2 cores.... thought I would see 3 active cores and one ticking over?

But its great to see a massive performance increase by having the extra cores to play with.

I thought this was interesting so I shared it.

ejolson
Posts: 5605
Joined: Tue Mar 18, 2014 11:47 am

### Re: Having Fun with threads! Very Interesting

When I compile with optimization

gcc -O3 -o prime prime.c -lpthread

the run time is 0.000 seconds. Since the program doesn't do anything when a prime is found, the compiler removes the unneeded loops. I made a slight modification to count and report the number of primes each thread finds so the calculation can't be removed through optimization.

Code: Select all

``````#include <stdio.h>

typedef struct
{
int start;
int end;
int step;
}data;

int isPrime(long int number)
{
long int i;
for(i=2; i<number; i++)
{
if(number % i == 0)
{
//not a prime
return 0;
}
}
return number;
}

int calcPrimes(int start, int stop, int step)
{
int c=0;
long int s;
for(s=start; s<=stop; s+=step)
{
if (isPrime(s) > 0)
{
c++;
}
}
return c;
}

{
int c;
c=calcPrimes(3, 100000, 8); //stepping 8 numbers for 4 cores
return NULL;
}

{
int c;
c=calcPrimes(5, 100000, 8); //starting thread 2 at the next odd
//number jumping 8 spaces for 4 cores
return NULL;
}

{
int c;
c=calcPrimes(7, 100000, 8); //starting thread 2 at the next odd
//number and jumping 8 spaces
return NULL;
}

{
int c;
c=calcPrimes(9, 100000, 8); // think you get it.
return NULL;
}

int main()
{
printf("Calculate Prime Numbers\n");
printf("==================================================\n\n");

//wait for threads to join before exiting

return 0;
}
``````
Run time on a Pi2B is as follows:

Code: Select all

``````\$ gcc -O3 -o prime2 prime2.c -lpthread
\$ time ./prime2
Calculate Prime Numbers
==================================================

real    0m6.689s
user    0m26.550s
sys 0m0.000s
``````
I find it it interesting that each thread found about the same number of primes.

davenull
Posts: 1159
Joined: Thu Oct 22, 2015 7:22 am
Location: a small planet close to Betelgeuze

### Re: Having Fun with threads! Very Interesting

JFYI:
about the distribution of prime numbers in N (the Natural Numbers):
the distribution function normally is written as π(x) and
π(x)
is nearly equally-distributed over N just like
x/log(x)
which is almost linear (~800:6000 or later on ~9000:100000) with some irregular prime gaps though.

So you're expected to get almost the same number of primes in each subset of N by steps of 8 (a subset of constant modulus offsets which is what you are doing in your test).

see: Gauss, Legendre, Riemann, Euler, Hadamard, de Poisson, Erdös, Selberg
(prime numbers are the "atoms" of the Natural Numbers, and prime numbers and (pseudo) random numbers are sort of my favourits! )

if you're interested in a quick prime number generator, then have a look at the Sieve of Erastothenes or the Sieve of Atkin ! #define S sqrt(t+2*i*i)<2
#define F(a,b) for(a=0;a<b;++a)
float x,y,r,i,s,j,t,n;int main(){F(y,64){F(x,99){r=i=t=0;s=x/33-2;j=y/32-1;F(n,50&S){t=r*r-i*i;i=2*r*i+j;r=t+s;}if(S){PointOut(x,y);}}}for(;;);}

davenull
Posts: 1159
Joined: Thu Oct 22, 2015 7:22 am
Location: a small planet close to Betelgeuze

### Re: Having Fun with threads! Very Interesting

I tried to compile your example for my B+, but I get errors (I admit I have different compiler and build settings for Geany, but I added your options though:):

prime2.c: In function ‘int main()’:
prime2.c:80:55: error: invalid conversion from ‘void* (*)()’ to ‘void* (*)(void*)’ [-fpermissive]
^
Kompilierung fehlgeschlagen.
In file included from prime2.c:2:0:
^
prime2.c:82:55: error: invalid conversion from ‘void* (*)()’ to ‘void* (*)(void*)’ [-fpermissive]
^
In file included from prime2.c:2:0:
^
prime2.c:84:55: error: invalid conversion from ‘void* (*)()’ to ‘void* (*)(void*)’ [-fpermissive]
^
In file included from prime2.c:2:0:
^
prime2.c:86:55: error: invalid conversion from ‘void* (*)()’ to ‘void* (*)(void*)’ [-fpermissive]
^
In file included from prime2.c:2:0:
^
#define S sqrt(t+2*i*i)<2
#define F(a,b) for(a=0;a<b;++a)
float x,y,r,i,s,j,t,n;int main(){F(y,64){F(x,99){r=i=t=0;s=x/33-2;j=y/32-1;F(n,50&S){t=r*r-i*i;i=2*r*i+j;r=t+s;}if(S){PointOut(x,y);}}}for(;;);}

AndyD
Posts: 2334
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia
Contact: Website

### Re: Having Fun with threads! Very Interesting

davenull wrote:I tried to compile your example for my B+, but I get errors (I admit I have different compiler and build settings for Geany, but I added your options though:):

g++ -Wall -I/opt/vc/include -I/opt/vc/include/interface/vmcs_host/linux -I/opt/vc/include/interface/vcos/pthreads -O3 -c "prime2.c" -lshapes -lshapes_plus -lpthread -lrt -lwiringPi...
You are trying to compile C code with the C++ compiler.

davenull
Posts: 1159
Joined: Thu Oct 22, 2015 7:22 am
Location: a small planet close to Betelgeuze

### Re: Having Fun with threads! Very Interesting

yes, I always need to have C++ because I also need C++ libs like
#include <iostream>
#include <limits>
using namespace std;

what will I have to change then (and where) ?
#define S sqrt(t+2*i*i)<2
#define F(a,b) for(a=0;a<b;++a)
float x,y,r,i,s,j,t,n;int main(){F(y,64){F(x,99){r=i=t=0;s=x/33-2;j=y/32-1;F(n,50&S){t=r*r-i*i;i=2*r*i+j;r=t+s;}if(S){PointOut(x,y);}}}for(;;);}

AndyD
Posts: 2334
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia
Contact: Website

### Re: Having Fun with threads! Very Interesting

davenull wrote:yes, I always need to have C++ because I also need C++ libs like
#include <iostream>
#include <limits>
using namespace std;

what will I have to change then (and where) ?
The error messages you posted are telling you that the thread*Go() functions do not match the function signature expected by pthread_create. The C++ compiler is a bit more strict with function signatures. Change

And it will compile

davenull
Posts: 1159
Joined: Thu Oct 22, 2015 7:22 am
Location: a small planet close to Betelgeuze

### Re: Having Fun with threads! Very Interesting

thank you!

now it compiles and build is fine too!

runtime for my B+ at 800MHz : 51489 ms ---------------------------------------------------

BTW, what is the -O3 parameter for?
#define S sqrt(t+2*i*i)<2
#define F(a,b) for(a=0;a<b;++a)
float x,y,r,i,s,j,t,n;int main(){F(y,64){F(x,99){r=i=t=0;s=x/33-2;j=y/32-1;F(n,50&S){t=r*r-i*i;i=2*r*i+j;r=t+s;}if(S){PointOut(x,y);}}}for(;;);}

ghans
Posts: 7882
Joined: Mon Dec 12, 2011 8:30 pm
Location: Germany

### Re: Having Fun with threads! Very Interesting

• Don't like the board ? Missing features ? Change to the prosilver theme ! You can find it in your settings.
• Don't like to search the forum BEFORE posting 'cos it's useless ? Try googling : yoursearchtermshere site:raspberrypi.org

davenull
Posts: 1159
Joined: Thu Oct 22, 2015 7:22 am
Location: a small planet close to Betelgeuze

### Re: Having Fun with threads! Very Interesting

thank you!

now i tried -O1, it's no effect (51513ms) on this source as I had expected then for int algebra.

again, learned something new, thanks again! #define S sqrt(t+2*i*i)<2
#define F(a,b) for(a=0;a<b;++a)
float x,y,r,i,s,j,t,n;int main(){F(y,64){F(x,99){r=i=t=0;s=x/33-2;j=y/32-1;F(n,50&S){t=r*r-i*i;i=2*r*i+j;r=t+s;}if(S){PointOut(x,y);}}}for(;;);}

davenull
Posts: 1159
Joined: Thu Oct 22, 2015 7:22 am
Location: a small planet close to Betelgeuze

### Re: Having Fun with threads! Very Interesting

isn't it useful or even necessary to acquire mutexes for calling
int isPrime(long int number)

#define S sqrt(t+2*i*i)<2
#define F(a,b) for(a=0;a<b;++a)
float x,y,r,i,s,j,t,n;int main(){F(y,64){F(x,99){r=i=t=0;s=x/33-2;j=y/32-1;F(n,50&S){t=r*r-i*i;i=2*r*i+j;r=t+s;}if(S){PointOut(x,y);}}}for(;;);}

AndyD
Posts: 2334
Joined: Sat Jan 21, 2012 8:13 am
Location: Melbourne, Australia
Contact: Website

### Re: Having Fun with threads! Very Interesting

davenull wrote:isn't it useful or even necessary to acquire mutexes for calling
int isPrime(long int number)

You would probably get some benefit from reading some online tutorials on pthreads. This look like a good introduction.

A mutex is used to protect code/variables that may not work correctly if two (or more) threads are accessing the code at the same time. In the prime example in these posts, there is no variables shared between the thread. All local variable in the functions run by the thread exist on the stack and each thread has its own stack. So, even though they are executing the same code, each thread has its own copy of the local variable in the functions being used.
Last edited by AndyD on Wed Nov 04, 2015 8:35 pm, edited 1 time in total.

davenull
Posts: 1159
Joined: Thu Oct 22, 2015 7:22 am
Location: a small planet close to Betelgeuze

### Re: Having Fun with threads! Very Interesting

indeed, I'm very new to Linux and pthread and actually I really needed a good tutorial
viewtopic.php?f=33&t=125086

Good to know that all which a task calls is at it's own stack, even shared common functions. (I also had made them "inline" so far as I was not sure about a general thread safety.)

But anyway, now it's more clear to me, again thanks a lot!
#define S sqrt(t+2*i*i)<2
#define F(a,b) for(a=0;a<b;++a)
float x,y,r,i,s,j,t,n;int main(){F(y,64){F(x,99){r=i=t=0;s=x/33-2;j=y/32-1;F(n,50&S){t=r*r-i*i;i=2*r*i+j;r=t+s;}if(S){PointOut(x,y);}}}for(;;);}

allfox
Posts: 452
Joined: Sat Jun 22, 2013 1:36 pm
Location: Guang Dong, China

### Re: Having Fun with threads! Very Interesting

Hello.

This thread reminders me something I didn't figure out when I was in college: how to multi-threading prime finders.

I guess I found a way now.

I know it's a C++ forum, excuse me for posting a JavaScript program here:

Code: Select all

``````"use strict";

let n = 100000;
let primeList = ;
let numberOfCPUs = require("os").cpus().length;

let swarm = require("cluster");
let startTime = Date.now();

// Firstly, get a prime list up to sqrt(n). So that we got all primes that is needed to check if n is a prime.
let swarmStartWorkingFromHere = Math.floor(Math.sqrt(n));
for(let i = 3, max = swarmStartWorkingFromHere; i <= max; ++i) {
if(isPrime(i)) {
primeList.push(i);
}
}

// Fork to swarm
if(swarm.isMaster) {
let endCounter = 0; // Counting for how many workers finished his job. When all numberOfCPUs workers done their job, we could exit the program.
for(let i = 0; i < numberOfCPUs; ++i) {
swarm.fork().on("message", function found(prime) {
if(prime === "end") {
endCounter++;
if(endCounter === numberOfCPUs) {
console.log("Done in " + (Date.now() - startTime) / 1000 + " seconds, there are " + primeList.length + " primes in list.");
process.exit(0);
}
} else {
primeList.push(prime);
}
});
}
} else {
// Assuming that Node gives workers IDs like 1, 2, 3, 4. Document didn't say this, so maybe broken.
console.log("Worker " + swarm.worker.id + " is online.");
for(let i = swarmStartWorkingFromHere + swarm.worker.id; i <=n; i += numberOfCPUs) {
if(isPrime(i)) {
process.send(i);
}
}
process.send("end");
}

function isPrime(x) {
let checkBorder = Math.floor(Math.sqrt(x));
for(let i = 0; primeList[i] <= checkBorder; ++i) {
if(x % primeList[i] === 0) {
return false;
}
}
return true;
}
``````
When Node.js 5.0.0 on Pi 2, it done in about 1.9 seconds.
Note that if you try this on Windoz, it would be deadly slow. Windoz can not schedule CPU time well. On my Indel i5-3337U laptop, Windoz 10 PRO would take 3 seconds to do the job.

So Pi2 beat i5 with the power from Linux.

davenull
Posts: 1159
Joined: Thu Oct 22, 2015 7:22 am
Location: a small planet close to Betelgeuze

### Re: Having Fun with threads! Very Interesting

but on Windows, to show up 1 internet site by Firefox , this lasts just a couple of seconds (depending on internet traffic),

while by identical WiFi stick, at the same time, simultaneously or shortly one after the other, in the same home network, by Iceweasel
on the Raspi it lasts about 1 minute !!!

The same about editing code in geany.

and youtube videos on windows: no hanging up, no jerking.
on Raspi: blocks totally.

Linux power at it's best! #define S sqrt(t+2*i*i)<2
#define F(a,b) for(a=0;a<b;++a)
float x,y,r,i,s,j,t,n;int main(){F(y,64){F(x,99){r=i=t=0;s=x/33-2;j=y/32-1;F(n,50&S){t=r*r-i*i;i=2*r*i+j;r=t+s;}if(S){PointOut(x,y);}}}for(;;);}

ejolson
Posts: 5605
Joined: Tue Mar 18, 2014 11:47 am

### Re: Having Fun with threads! Very Interesting

allfox wrote:I know it's a C++ forum, excuse me for posting a JavaScript program here
Because this is the C/C++ forum, I have created a non-parallel C code that implements the same algorithm as the JavaScript program. Note that the algorithm used in the JavaScript program is significantly better than the algorithm given in the original post: It saves time by only testing divisibility by prime numbers up to sqrt(n). As the Raspberry Pi2B is a standard hardware platform that allows easy comparison of different programming languages and algorithms, I'll post the resulting code and run times. The code is

Code: Select all

``````#include <stdio.h>
#include <math.h>

#define N 100000

typedef unsigned long int Integer;
static Integer prime;
static int count=0;

static int isprime(Integer n){
int sqrtn=sqrt((double)n)+0.5;
for(int k=0;k<count;k++){
if(prime[k]>sqrtn) return 1;
if(n%prime[k]==0) return 0;
}
return 1;
}

int main(){
for(Integer n=2;n<=N;n++){
if(isprime(n)) prime[count++]=n;
}
printf("Found a total of %d primes!\n",count);
return 0;
}
``````
which can be compiled and executed using the commands

Code: Select all

``````\$ gcc --version
gcc (Debian 4.6.3-14+rpi1) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.

\$ gcc -Wall -std=gnu99 -O3 -o serial serial.c -lm
\$ time ./serial
Found a total of 9592 primes!

real    0m0.105s
user    0m0.100s
sys 0m0.000s
``````
The non-parallel C code runs 18 times faster than the parallel JavaScript code and 314 times faster than the serial version of the original algorithm. It would be interesting whether a 4 times speedup of this baseline timing could be obtained using the 4 cores of the Raspberry Pi2B in parallel. I wonder how large the problem size needs to be before the parallel runtime overhead becomes negligible. Note that the 4 threads of the original program together found only 2399+2399+2409+2384=9591 primes since even numbers were skipped in that algorithm.

davenull
Posts: 1159
Joined: Thu Oct 22, 2015 7:22 am
Location: a small planet close to Betelgeuze

### Re: Having Fun with threads! Very Interesting

so you probably missed to count the "2" additionally, as it's a prime number, even if even number of primes 0-100000 is 9592 #define S sqrt(t+2*i*i)<2
#define F(a,b) for(a=0;a<b;++a)
float x,y,r,i,s,j,t,n;int main(){F(y,64){F(x,99){r=i=t=0;s=x/33-2;j=y/32-1;F(n,50&S){t=r*r-i*i;i=2*r*i+j;r=t+s;}if(S){PointOut(x,y);}}}for(;;);}

allfox
Posts: 452
Joined: Sat Jun 22, 2013 1:36 pm
Location: Guang Dong, China

### Re: Having Fun with threads! Very Interesting

davenull wrote: on Windows, to show up 1 internet site by Firefox , this lasts just a couple of seconds (depending on internet traffic),
on the Raspi it lasts about 1 minute !!! I know that i5 is much more powerful than Pi on every single aspect. However Windoz is still able to lose on a CPU benchmark: prime finder.

It's in fact a known issule with Node on Windoz, Node didn't say why on every platform it works except Windoz.

It looks like that normally Node would give work load to workers in a round-robin fashion, however on Windoz it would let any worker who interested in got the job. It sounds no problem, but in fact only few of all workers would receive job, most just idle.

And I was told that supercomputers don't run Windoz. Most supercomputer don't browse Internet anyway, they work with their CPU.

ejolson
Posts: 5605
Joined: Tue Mar 18, 2014 11:47 am

### Re: Having Fun with threads! Very Interesting

allfox wrote:And I was told that supercomputers don't run Windoz.
If you go to http://www.top500.org/statistics/list/ and select Category->Operating system Family with the mouse, you will discover that as of June 2015 only 0.2% of the computers on the list are running Windows while 97.6% are running Linux. This makes the Raspberry Pi2B, which also runs Linux, good for teaching people who at some future point might enter a field related to science, technology, engineering or mathematics.

Although the ARM CPU of a Raspberry Pi is similar in computational power to the Pentium III machines from around 1999, the skills learned using it are immediately transferable to the highest power supercomputers currently available. In particular, running the serial C program to find prime numbers that took 0.1 seconds on the Raspberry Pi2B yields the following on an old 650Mhz PIII:

Code: Select all

``````\$ time ./serial
Found a total of 9592 primes!

real    0m0.074s
user    0m0.068s
sys 0m0.008s
``````
Note, however, that since the ARM CPU has a frequency governor that doesn't ramp up the clock rate immediately, the originally reported time of 0.1 seconds for the Pi2B was pessimistic. Repeated runs on both architectures obtain an almost identical rate of 0.073 seconds per run.

ejolson
Posts: 5605
Joined: Tue Mar 18, 2014 11:47 am

### Re: Having Fun with threads! Very Interesting

I've created a parallel version of the JavaScript prime finder using the MIT/Intel Cilk extensions to the C99 programming language. As shown at

viewtopic.php?p=843536#p843536

the scaling on 4 CPUs is near linear. The run time to find all the primes up to 100000 is about 0.004 seconds. The difference compared to the run time of 8.871 seconds for the parallel code given in the original post is mostly due to algorithmic changes; the difference compared to the run time of 1.8 seconds for the JavaScript code is mostly due to language efficiency. Note also that the run times in this case were obtained by calling gettimeofday inside the program and as a result don't include program load and initialization overhead.