a quick performance comparison of java on raspbian


by plugwash » Wed Dec 12, 2012 2:23 am
I decided to do a little test to find out just how good or bad Java performance on the Pi really is compared to other languages. So I wrote a little program that calculates the prime numbers up to 10000 in a really dumb way. For comparison I also translated this program into C and PHP. (Note that these tests were run on my Pi, which happens to be running an older Raspbian install, but the Java packages used were the latest from the Raspbian repo.)

openjdk-6 zero
real 0m29.014s
user 0m26.070s
sys 0m0.660s

openjdk-6 jamvm
real 0m16.068s
user 0m13.800s
sys 0m0.600s

openjdk-7 zero
real 0m28.568s
user 0m25.980s
sys 0m0.820s

openjdk-7 jamvm
real 0m15.511s
user 0m13.270s
sys 0m0.650s

openjdk-7 avian
real 0m11.701s
user 0m9.620s
sys 0m0.550s

gcc 4.6
real 0m6.882s
user 0m5.250s
sys 0m0.390s

gcc 4.6 -O2
real 0m5.360s
user 0m3.790s
sys 0m0.370s

gcc 4.7
real 0m6.884s
user 0m5.140s
sys 0m0.510s

gcc 4.7 -O2
real 0m5.310s
user 0m3.560s
sys 0m0.540s

php
real 1m31.333s
user 1m28.860s
sys 0m0.590s

I think this shows that, contrary to some people's assertions, Java on Raspbian is a perfectly usable language performance-wise.

Code: Select all
public class testprime {
  public static void main(String [] args) {
    System.out.println(2);
    for (int i=3;i<=10000;i++) {
      boolean prime = true;
      for (int j=2;j<i;j++) {
        int k = i/j;
        int l = k*j;
        if (l==i) prime = false;
      }
      if (prime) System.out.println(i);
    }
  }
}


Code: Select all
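/* Build in C99 mode (e.g. gcc -std=c99): loop variables are declared inside for(). */
#include <stdio.h>

int main(void) {
  printf("2\n");
  for (int i=3;i<=10000;i++) {
    int prime = 1;
    /* Deliberately naive: try every divisor and never stop early. */
    for (int j=2;j<i;j++) {
      int k = i/j;
      int l = k*j;
      if (l==i) prime = 0;
    }
    if (prime) printf("%i\n",i);
  }
  return 0;
}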


Code: Select all
<?php
    print("2\n");
    for ($i=3;$i<=10000;$i++) {
      $prime = true;
      for ($j=2;$j<$i;$j++) {
        $k = (int)($i/$j);
        $l = $k*$j;
        if ($l==$i) $prime = false;
      }
      if ($prime) print($i."\n");
    }
?>

Forum Moderator
Posts: 3152
Joined: Wed Dec 28, 2011 11:45 pm
by chriswhocodes » Wed Dec 12, 2012 9:20 am
Hi plugwash,

Thanks for this, I'm very interested in the Pi's capabilities for Java.

I'll try running this on softfloat Debian later with the Oracle VM and post results.

Do the results change if you run the benchmark multiple times (to make sure HotSpot has the best chance of optimising) ?

Cheers,

Chris
@chriswhocodes
http://www.chrisnewland.com/raspberrypi
Posts: 35
Joined: Mon May 21, 2012 11:26 am
Location: London, UK
by caldimerda » Wed Dec 12, 2012 10:24 am
Thanks for those benchmarks. I entirely agree with your conclusion. My first Raspi sat on a shelf for 3 months because the forum seemed rather down on Java and I thought I'd be wasting my time. But not now.

My app has five modules that communicate via unicast UDP, so they can run on one box or five separate ones. They all have a GUI component and I run them from a remote X session using NX server. With the soft-float Raspbian and the current Oracle JVM, performance was poor: my Raspi kept overclocking. I now use OpenJDK 7 with JamVM and I can run three of them (one for a user interface, one for file IO and one for reading data from the serial port) on one Raspi (so three JVMs and three GUIs), and it all stays at 700 MHz on a 256MB Raspi.

The only module that seems to use more CPU than I'd expect is the one that uses AWT to generate thumbnails from larger JPEGs, which is slow. I only ported that to a Raspi last week so haven't got to the bottom of it yet.

Many thanks to everyone posting about Java on the forum - loads of useful stuff. And even more thanks to Robert Lougher for JamVM.

And thanks James et al for the new Java sub-forum.
Posts: 50
Joined: Tue Oct 09, 2012 9:41 am
by KenT » Wed Dec 12, 2012 11:15 am
Interesting benchmarks. I wrote a bit of Java when it first got popular just to see what all the fuss was about. Nice language.

I seem to remember one of the reasons given for the 512MB memory was that it helps Java. How much memory did the benchmark Pi have?

It would be great to see the benchmark rewritten in Python but I'm too lazy/busy to do it ;) I suspect the speed will be similar to PHP.
Pi Presents - A toolkit to produce multi-media interactive display applications for museums, visitor centres, and more
Download from http://pipresents.wordpress.com
Posts: 759
Joined: Tue Jan 24, 2012 9:30 am
Location: Hertfordshire, UK
by poglad » Wed Dec 12, 2012 11:55 am
For the purposes of this thread, that's not a "really dumb way"; it's an ideal way, because it can easily be translated into different languages. You could start using algorithms and library features that are better suited to each language, but then you're comparing apples and oranges and the comparison loses its value.
Posts: 100
Joined: Tue Jul 31, 2012 8:47 am
Location: Aberdeen, Scotland
by -rst- » Wed Dec 12, 2012 12:07 pm
It might be an interesting test to see how much the output affects the timings - comment out the print lines and rerun...?
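One rough way to try this without editing the loop is to run the same code twice, once writing to the console and once to a stream that throws everything away. This is only a sketch: the class name PrintCost, the listPrimes helper and the discarding stream are made up for illustration and are not part of plugwash's program. Timings go to stderr so they don't mix with the prime list.

```java
import java.io.OutputStream;
import java.io.PrintStream;

public class PrintCost {
    // Same naive trial-division loop as testprime, but writing to the given stream.
    // Returns how many primes were written.
    static int listPrimes(PrintStream out, int limit) {
        int count = 1;
        out.println(2);
        for (int i = 3; i <= limit; i++) {
            boolean prime = true;
            for (int j = 2; j < i; j++) {
                int k = i / j;
                int l = k * j;
                if (l == i) prime = false;
            }
            if (prime) {
                out.println(i);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical helper for this sketch: a stream that drops every byte.
        PrintStream discard = new PrintStream(new OutputStream() {
            @Override
            public void write(int b) {
                // intentionally empty
            }
        });

        long t0 = System.nanoTime();
        int printed = listPrimes(System.out, 10000);
        long t1 = System.nanoTime();
        int discarded = listPrimes(discard, 10000);
        long t2 = System.nanoTime();

        System.err.println("console output:   " + (t1 - t0) / 1000000 + " ms, " + printed + " primes");
        System.err.println("output discarded: " + (t2 - t1) / 1000000 + " ms, " + discarded + " primes");
    }
}
```

Bear in mind that on a VM with a JIT the second call may also benefit from warm-up, so the difference between the two numbers is not purely the cost of the output.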

(I program 99% in Java at work, but don't yet have it installed on RPi)

Regards,

JP
http://raspberrycompote.blogspot.com/ - Low-level graphics and 'Coding Gold Dust'
Posts: 1315
Joined: Thu Nov 01, 2012 12:12 pm
Location: Dublin, Ireland
by chriswhocodes » Wed Dec 12, 2012 12:21 pm
Hi JP,

You might need to check the bytecode to make sure the program is still doing the work.

Without the System.out.println I think an intelligent javac could optimise out quite a lot here ;)
@chriswhocodes
http://www.chrisnewland.com/raspberrypi
Posts: 35
Joined: Mon May 21, 2012 11:26 am
Location: London, UK
by trouch » Wed Dec 12, 2012 1:31 pm
Thanks for that comparison!
I had in mind to do the same thing in the next few days; no need now ;)
Can you add Oracle JDK results?
Also, as Python is recommended by the foundation, it would be interesting too.

WebIOPi - Raspberry Pi REST Framework to control your Pi from the web
http://store.raspberrypi.com/projects/webiopi
http://code.google.com/p/webiopi/
http://trouch.com
Posts: 310
Joined: Fri Aug 03, 2012 7:24 pm
Location: France
by -rst- » Wed Dec 12, 2012 2:12 pm
chriswhocodes wrote:Hi JP,

You might need to check the bytecode to make sure the program is still doing the work.

Without the System.out.println I think an intelligent javac could optimise out quite a lot here ;)


Hi Chris,

Yep, that might of course become a problem - same with the C compiler (at least some commercial compilers did some pretty interesting optimisations back in the day).

In my previous (work) life I did some comparisons between Java and C on Windows PCs and found the number-crunching to be very close to the same - only (graphical) screen handling was notably slower in Java ...and I have a faint recollection I experimented a bit with the console (debug) output as well...

Maybe something like summing the numbers up instead of printing them, and outputting the sum at the end?
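A minimal sketch of that idea in Java, assuming the same loop as the original testprime: the class name SumPrimes and the sumPrimes method are invented here for illustration. Because the sum is printed at the end, the compiler still has to do the work, but only one line of output is produced.

```java
public class SumPrimes {
    // Same naive trial division as testprime, but accumulating instead of printing.
    static long sumPrimes(int limit) {
        long sum = 2; // 2 is handled up front, as in the original
        for (int i = 3; i <= limit; i++) {
            boolean prime = true;
            for (int j = 2; j < i; j++) {
                int k = i / j;
                int l = k * j;
                if (l == i) prime = false;
            }
            if (prime) sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // A single println at the end keeps the result "live" without
        // paying for thousands of console writes.
        System.out.println(sumPrimes(10000));
    }
}
```

Run it the same way as the original (for example under time) to compare against the printing version.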

(I would do this myself, but the security policy is quite strict here at work and I am not allowed to plug my RPi into the network, so I have to wait until the evening to install Java :oops: )

Regards,

JP
http://raspberrycompote.blogspot.com/ - Low-level graphics and 'Coding Gold Dust'
Posts: 1315
Joined: Thu Nov 01, 2012 12:12 pm
Location: Dublin, Ireland
by Kalimar » Wed Dec 12, 2012 2:21 pm
Could you do the comparison for python as well? Would be interesting...
Posts: 5
Joined: Thu Nov 08, 2012 10:23 am
by -rst- » Wed Dec 12, 2012 2:40 pm
I assume the Python code would be like this:

Code: Select all
print "2"
for i in range(3,10001):
    prime = 1
    for j in range(2, i):
        k = i/j
        l = k*j
        if l == i:
            prime = 0
    if prime == 1:
        print i


Save that to file prime.py - then run as:

Code: Select all
time python prime.py


On my Rev 2/512 MB with default clocking running Raspbian 2012-10-28:

Code: Select all
real    6m53.048s
user    6m49.040s
sys     0m3.210s


Now that was a coffee break well spent :ugeek:

Regs,

JP (complete Python n00b)
http://raspberrycompote.blogspot.com/ - Low-level graphics and 'Coding Gold Dust'
Posts: 1315
Joined: Thu Nov 01, 2012 12:12 pm
Location: Dublin, Ireland
by plugwash » Wed Dec 12, 2012 3:16 pm
chriswhocodes wrote:Do the results change if you run the benchmark multiple times (to make sure HotSpot has the best chance of optimising) ?

IIRC HotSpot only optimises within a session, not between sessions. In any case, the zero VM doesn't have a JIT available (that is why it is so much slower than avian and jamvm).

KenT wrote:I seem to remember one of the reasons given for the 512Mb memory was that it helps Java. What size memory was the benchmark Pi.

256MB, though I doubt it makes much difference in this test.
Forum Moderator
Posts: 3152
Joined: Wed Dec 28, 2011 11:45 pm
by chriswhocodes » Wed Dec 12, 2012 8:45 pm
Interesting result for softfloat:

Fresh 2012-08-08-wheezy-armel

Oracle ejre1.7.0_10 ARMv6/7 Linux - Headless EABI, VFP, SoftFP ABI, Little Endian
http://www.oracle.com/technetwork/java/embedded/downloads/javase/index.html

./ejre1.7.0_10/bin/java -version

java version "1.7.0_10"
Java(TM) SE Embedded Runtime Environment (build 1.7.0_10-b18 headless)
Java HotSpot(TM) Embedded Client VM (build 23.6-b04, mixed mode)

real 0m7.942s
user 0m5.200s
sys 0m2.710s

So for this little synthetic benchmark, Oracle's embedded VM with HotSpot beats all the OpenJDKs and is very close to C.

Edit - The above figures are for Pi @ 700MHz (no overclock)
Edit 2 - The Oracle JRE is just a JRE. The class was compiled on OpenJDK 1.6.0_24 if that makes a difference.
@chriswhocodes
http://www.chrisnewland.com/raspberrypi
Posts: 35
Joined: Mon May 21, 2012 11:26 am
Location: London, UK
by -rst- » Fri Dec 14, 2012 12:12 pm
Now that is pretty much what I would have expected based on comparisons on PCs!
http://raspberrycompote.blogspot.com/ - Low-level graphics and 'Coding Gold Dust'
Posts: 1315
Joined: Thu Nov 01, 2012 12:12 pm
Location: Dublin, Ireland
by trouch » Tue Dec 18, 2012 9:42 am
See Comparing JVMs on ARM/Linux
Not surprised with the difference between OpenJDK and Oracle JVM
But I'm very surprised that soft float oracle's jvm is near GCC.
Does GCC use soft or hard float ABI ?

WebIOPi - Raspberry Pi REST Framework to control your Pi from the web
http://store.raspberrypi.com/projects/webiopi
http://code.google.com/p/webiopi/
http://trouch.com
Posts: 310
Joined: Fri Aug 03, 2012 7:24 pm
Location: France
by plugwash » Tue Dec 18, 2012 11:08 am
Remember this test program is pure integer, so it shouldn't matter if it's built for soft float or hard float.
Forum Moderator
Posts: 3152
Joined: Wed Dec 28, 2011 11:45 pm
by trouch » Tue Dec 18, 2012 1:21 pm
plugwash wrote:Remember this test program is pure integer so it shouldn't matter if it's built for soft float or hard float.


Yes, but... I just ran a few tests with a fresh wheezy-raspbian and a fresh wheezy-armel,
and guess what? armel takes 25% more time than raspbian...

WebIOPi - Raspberry Pi REST Framework to control your Pi from the web
http://store.raspberrypi.com/projects/webiopi
http://code.google.com/p/webiopi/
http://trouch.com
Posts: 310
Joined: Fri Aug 03, 2012 7:24 pm
Location: France
by chriswhocodes » Wed Dec 19, 2012 9:54 pm
fyi,

jdk1.8.0 early access on fully patched Raspbian hardfloat gives the result:

Code: Select all
real    0m6.838s
user    0m5.260s
sys     0m2.540s

JRE identifies as

Code: Select all
java version "1.8.0-ea"
Java(TM) SE Runtime Environment (build 1.8.0-ea-b36e)
Java HotSpot(TM) Client VM (build 25.0-b04, mixed mode)
@chriswhocodes
http://www.chrisnewland.com/raspberrypi
Posts: 35
Joined: Mon May 21, 2012 11:26 am
Location: London, UK
by henrik » Thu Dec 20, 2012 2:06 am
Hi, at least the Java version of this microbenchmark is broken. The problem is that the JIT compiler will never aggressively optimize the main loop since it doesn't do on stack replacement. That is, it can optimize the main method but the optimized main method will not be executed until the next time it is invoked (which is never). Also, the most expensive operation is likely the println. So you are measuring something but probably not what you are looking for.

One possible patch is to rename main to main2 and then have a while(true) loop in main that calls main2 over and over again, measuring the time of each main2 invocation. You will find that the first invocation is somewhat slow, followed by a leap in invocation speed once the server compiler kicks in.

Oh, and run with "java -server MyBenchmark" for the fast server compiler.

Henrik
Posts: 65
Joined: Tue Dec 18, 2012 4:24 pm
by chriswhocodes » Thu Dec 20, 2012 9:55 am
Hi Henrik,

That's very interesting. I must swot up on how the HotSpot JIT compiler works as I'd assumed it might be able to do some loop unrolling.

Agree the println is going to be the bottleneck and also microbenchmarks are of limited use but I'm interested in squeezing every last drop of Java performance out of the Pi so will have a go at creating a suite of benchmarks that people can use to compare the various Java flavours on the Pi.
@chriswhocodes
http://www.chrisnewland.com/raspberrypi
Posts: 35
Joined: Mon May 21, 2012 11:26 am
Location: London, UK
by henrik » Thu Dec 20, 2012 2:59 pm
chriswhocodes wrote:Hi Henrik,

That's very interesting. I must swot up on how the HotSpot JIT compiler works as I'd assumed it might be able to do some loop unrolling.

Agree the println is going to be the bottleneck and also microbenchmarks are of limited use but I'm interested in squeezing every last drop of Java performance out of the Pi so will have a go at creating a suite of benchmarks that people can use to compare the various Java flavours on the Pi.


Hi, this is not related to loop unrolling which is a specific optimization in the JIT. The general issue you see is that the method can be optimized but the optimized method will not be used until the next invocation. It is technically possible to switch out a method in flight. This is called on stack replacement but is quite complex and is mostly avoided since it provides limited benefit and is expensive to maintain. Plenty of good literature out there. My favorite is the Oracle JRockit book, which is a deep dive into how JVMs work (different JVM, same principles, and Hotspot and JRockit are being merged by us so valuable insights anyway). You can find it on Amazon.

There are some reasonable benchmarks out there. Try SPECjvm2008, it is fairly comprehensive.

For a good microbenchmark, try this:

main
    while (true)
        read time
        call benchmark method
        read time, print out difference and result
    end while

benchmark method
    run benchmark, maybe lots of times
    store intermediary results in a public variable to avoid the optimizer removing the code you want to measure
    return result
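The outline above could be sketched in Java roughly as follows. This is an illustration under a few assumptions, not a definitive harness: the names PrimeBench, benchmark and sink are made up, the loop runs a fixed number of rounds (10 by default, or the first command-line argument) instead of while(true) so that it terminates, and the public variable is also declared volatile as an extra precaution.

```java
public class PrimeBench {
    // Public field that receives the result, so the optimizer cannot
    // treat the computation as dead code.
    public static volatile long sink;

    // The benchmark method: the same naive prime loop, summing instead of printing.
    static long benchmark(int limit) {
        long sum = 2;
        for (int i = 3; i <= limit; i++) {
            boolean prime = true;
            for (int j = 2; j < i; j++) {
                int k = i / j;
                int l = k * j;
                if (l == i) prime = false;
            }
            if (prime) sum += i;
        }
        sink = sum;
        return sum;
    }

    public static void main(String[] args) {
        // Assumption for this sketch: a bounded number of rounds rather than while(true).
        int rounds = args.length > 0 ? Integer.parseInt(args[0]) : 10;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();              // read time
            long result = benchmark(10000);              // call benchmark method
            long elapsedMs = (System.nanoTime() - start) / 1000000;
            System.out.println("round " + r + ": " + elapsedMs + " ms, result " + result);
        }
    }
}
```

Each round is a fresh invocation of benchmark, so a VM that compiles the method after the first few calls gets the chance to use the compiled version in later rounds. Comparing the first round with the last gives a rough idea of how much the JIT contributes.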
Posts: 65
Joined: Tue Dec 18, 2012 4:24 pm
by pwinwood » Thu Dec 20, 2012 10:15 pm
I know this thread is mostly about java but I thought I would test out LuaJIT 2.0 which supports armhf.
This is the Lua code

Code: Select all
print(2)
for i = 3, 10000 do
        local prime = true
        for j = 2, i-1 do
                local k = math.floor(i / j)
                local l = k * j
                if l == i then
                        prime = false
                end
        end
        if prime then
                print(i)
        end
end


The machine is a non-overclocked Raspberry Pi running Raspbian
Linux raspberrypi 3.2.27+ #307 PREEMPT Mon Nov 26 23:22:29 GMT 2012 armv6l GNU/Linux

The benchmark results are

real 0m11.376s
user 0m9.430s
sys 0m1.920s


My display is 1280x1024. Would that make a difference?

Edit: To answer my own question: If I redirect the stdout then the sys drops to
0m0.010s but the user stays the same.
Posts: 69
Joined: Mon Jul 02, 2012 2:21 am
Location: Oxford, England
by luzhuomi » Thu Dec 27, 2012 4:42 am
Haskell via ghc7.4.1

Code: Select all
main :: IO ()
main = do
  { putStrLn "2"   -- putStrLn rather than print, so no quotes appear in the output
  ; mapM_ (\i ->
            let bs = map (\j ->
                           let k = i `div` j
                               l = k*j
                           in (l == i)
                         ) [2..i-1]
            in if all (False ==) bs
               then print i
               else return ()
          )
    [3..10000]
  }


compile using ghc -O2 --make
Code: Select all
real   0m6.812s
user   0m6.750s
sys   0m0.030s
Posts: 6
Joined: Thu Dec 27, 2012 4:40 am
by madscientistCL » Mon Jan 07, 2013 9:20 pm
Great post and great experiments. Good information.

I have translated this code into assembly. Here are the results

real 0m3.025s
user 0m2.987s
sys 0m0.192s

Translating it into assembly wasn't trivial and the code is very lengthy and I am sure it can be optimized much more. I am going to post the code when I've cleaned it up a bit.

If someone wants to give it a try, remember the RPi is ARMv6.
Posts: 7
Joined: Sun Dec 09, 2012 10:12 pm
by wombat » Mon Jan 07, 2013 9:32 pm
As Henrik described above, writing to stdout is a very expensive operation, so you're not quite comparing raw CPU power. If you were to remove all calls to printf and compile with -O2, you would get the following:

Code: Select all
0.00s user 0.00s system 0% cpu 0.009 total
Posts: 1
Joined: Sun Jan 06, 2013 4:24 pm