A quick performance comparison of Java on Raspbian


by luzhuomi » Tue Jan 08, 2013 2:22 am
Oracle released a preview with armhf support.
http://jdk8.java.net/fxarmpreview/

This gives us a boost.

Code: Select all
pi@raspberrypi ~/java/testprime/bin $ time java testprime > /dev/null

real   0m5.505s
user   0m4.620s
sys   0m0.170s
Posts: 5
Joined: Thu Dec 27, 2012 4:40 am
by -rst- » Tue Jan 08, 2013 2:13 pm
wombat wrote:As Henrik described above, writing to stdout is a very expensive operation and you're not quite comparing raw CPU power. If you were to remove all calls to printf and compile with -O2, you would get the following:

Code: Select all
0.00s user 0.00s system 0% cpu 0.009 total


Did you check the assembly to make sure the optimizer does not just skip all the code?

Maybe something like this would be better (though I'm not sure how easy or hard the bit ops are in languages other than C or Java, if you want to make it a really portable test):
Code: Select all
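#include <stdio.h>

/* Global, so the optimizer cannot treat the loop result as unused. */
int result = 0;

int main(void) {
    result |= 2;
    for (int i = 3; i <= 10000; i++) {
        int prime = 1;
        /* Trial division: i is composite if any j < i divides it exactly. */
        for (int j = 2; j < i; j++) {
            int k = i / j;
            int l = k * j;
            if (l == i) prime = 0;
        }
        /* Fold each prime into the result instead of printing it. */
        if (prime) result |= i;
    }
    printf("%i\n", result);
    return 0;
}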
http://raspberrycompote.blogspot.com/ - Low-level graphics and 'Coding Gold Dust'
Posts: 898
Joined: Thu Nov 01, 2012 12:12 pm
Location: Dublin, Ireland
by -rst- » Tue Jan 08, 2013 2:50 pm
And to give the HotSpot compiler a chance to do its magic, (maybe) something like this (feel free to comment if this is not enough to activate it):
Code: Select all
public class Prime3 {

  public static void main(String [] args) {
   for (int i = 0; i < 10; i++) {
      test();
   }
  }
 
  public static void test() {
    int result = 0;
    result |= 2;
    for (int i=3;i<=10000;i++) {
      boolean prime = true;
      for (int j=2;j<i;j++) {
        int k = i/j;
        int l = k*j;
        if (l==i) prime = false;
      }
      if (prime) result |= i;
    }
   System.out.println(result);
  }
}

And the same in C:
Code: Select all
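#include <stdio.h>

/* Returns nothing: the result is printed, so the loop cannot be optimized away. */
void test(void) {
    int result = 0;
    result |= 2;
    for (int i = 3; i <= 10000; i++) {
        int prime = 1;
        for (int j = 2; j < i; j++) {
            int k = i / j;
            int l = k * j;
            if (l == i) prime = 0;
        }
        if (prime) result |= i;
    }
    printf("%i\n", result);
}

int main(void) {
    /* Run the test 10 times, matching the Java version above. */
    for (int i = 0; i < 10; i++) {
        test();
    }
    return 0;
}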


Interesting results running under Cygwin on my Win7 laptop:
Code: Select all
*** C ***
gcc (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)

--- print
real    0m2.835s !!??
user    0m0.358s
sys     0m0.046s

--- or
real    0m0.441s
user    0m0.374s
sys     0m0.030s

--- 10 * or
real    0m3.691s
user    0m3.556s
sys     0m0.030s

*** Java ***
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) Client VM (build 20.12-b01, mixed mode, sharing)

--- print
real    0m0.545s
user    0m0.015s
sys     0m0.030s

--- or
real    0m0.479s
user    0m0.000s
sys     0m0.031s

--- 10 * or
real    0m3.521s
user    0m0.015s
sys     0m0.015s

...the printf in C under Cygwin must be doing something very odd :shock:
Posts: 898
Joined: Thu Nov 01, 2012 12:12 pm
Location: Dublin, Ireland
by trouch » Sat Jan 12, 2013 2:39 pm
I've also made my own benchmark using matrix computation to look at floating-point performance.
You can take a look at http://www.raspberrypi.org/phpBB3/viewt ... 31&t=29421
All the details are at http://trouch.com/2013/01/12/raspberry-pi-benchmark/

WebIOPi - Raspberry Pi REST Framework to control your Pi from the web
http://store.raspberrypi.com/projects/webiopi
http://code.google.com/p/webiopi/
http://trouch.com
Posts: 308
Joined: Fri Aug 03, 2012 7:24 pm
Location: France
by MartinPercival » Sun Jan 13, 2013 1:04 pm
henrik wrote:
Hi, this is not related to loop unrolling which is a specific optimization in the JIT. The general issue you see is that the method can be optimized but the optimized method will not be used until the next invocation. It is technically possible to switch out a method in flight. This is called on stack replacement but is quite complex and is mostly avoided since it provides limited benefit and is expensive to maintain. Plenty of good literature out there. My favorite is the Oracle JRockit book, which is a deep dive into how JVMs work (different JVM, same principles, and Hotspot and JRockit are being merged by us so valuable insights anyway). You can find it on Amazon.

There are some reasonable benchmarks out there. Try SPECjvm2008, it is fairly comprehensive.

For a good microbenchmark, try this:

main
while (true)
read time
call benchmark method
read time, print out difference and result
end while

benchmark method
run benchmark, maybe lots of times
store intermediary results in a public variable to avoid the optimizer removing the code you want to measure
return result
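
A minimal Java sketch of the pseudocode above, assuming the same trial-division loop used elsewhere in this thread as the workload. The class name `MicroBench`, the `sink` field and the round count are illustrative choices, not anything from Henrik's post, and the outer loop is bounded rather than `while (true)` so that the program terminates:

```java
public class MicroBench {
    // Public, volatile sink: storing the result here keeps the JIT from
    // proving that the benchmark loop has no observable effect.
    public static volatile int sink;

    // Benchmark method: the trial-division loop from earlier in the thread.
    public static int benchmark() {
        int result = 2;
        for (int i = 3; i <= 10000; i++) {
            boolean prime = true;
            for (int j = 2; j < i; j++) {
                if ((i / j) * j == i) prime = false;
            }
            if (prime) result |= i;
        }
        sink = result;
        return result;
    }

    public static void main(String[] args) {
        // Assumption: 10 rounds stands in for the endless loop in the pseudocode.
        for (int round = 0; round < 10; round++) {
            long start = System.nanoTime();
            int result = benchmark();
            long elapsedMs = (System.nanoTime() - start) / 1000000L;
            System.out.println("round " + round + ": " + elapsedMs
                    + " ms, result=" + result);
        }
    }
}
```

The first round or two will typically be slower, because they run before the JIT has compiled `benchmark()`. Later rounds enter the compiled method from the start, which is how this layout sidesteps the on-stack replacement issue described above.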


Hi Henrik, (I'm guessing it's Henrik Stahl, right?) how's life since the Oracle takeover? :D

Does the new merged JRockit/HotSpot implementation generate ARM-optimised code? For example, does it know about all the conditional branch instructions? I could imagine that in a synthetic benchmark like this, a loop unrolling vs conditional branch decision might make a big difference over the lifetime of the loop.

Martin

PS Guys (and gals), Henrik was one of the folks who built the JRockit JVM so he's a goldmine for this kind of stuff....sorry Henrik, stitched you up there :)
Posts: 1
Joined: Sun Jan 13, 2013 12:50 pm
by henrik » Mon Jan 14, 2013 12:33 am
Hi Martin,

Yes, same Henrik. Thanks for the kind words! The Oracle JVM is based on HotSpot with features from JRockit ported over. The port to Linux/ARM includes a JIT that is highly optimized for ARM, which means it does many of the optimizations you would find in a textbook on the subject and then some. I'm not the least surprised that people report results similar to and in some cases exceeding gcc: if you were to look at the generated assembly code you would see that it is much the same. A JVM has some benefits over a static compiler in that it knows the performance characteristics of the currently executing program when it performs the compilation, so it can do some tricks such as adjusting the memory layout to avoid CPU cache misses. On the other hand, it has some drawbacks such as slower startup and JNI overhead for calling native libraries. Net-net, for a non-trivial program the performance tends to be approximately the same as for a normal C program compiled with gcc. But for microbenchmarks C tends to perform better (unless the Java coder takes care to avoid common pitfalls such as the lack of on-stack replacement).

Given the maturity of our Java port to ARM and the Java toolchain, and the speed advantage over Python it would IMHO be good for everyone if Java had a more prominent place in the RPi community.

Henrik
Posts: 63
Joined: Tue Dec 18, 2012 4:24 pm
by trouch » Mon Jan 14, 2013 10:34 am
henrik wrote:Given the maturity of our Java port to ARM and the Java toolchain, and the speed advantage over Python it would IMHO be good for everyone if Java had a more prominent place in the RPi community.


I totally agree!

But even though I was a C/Java developer before I came to Python, Python has some advantages for a complete novice: an interactive interpreter, no compilation step, optional object-oriented coding, and many things are easy to do. I would say Python should be followed by Java from a learning point of view, if only the syntax were not so different.
So people could start directly with Java, but I'm not sure the Pi can handle a full Eclipse or NetBeans. That requires another computer, and there are other issues; they can be solved, but in the end the setup is more complex for a novice.

Choosing a first programming language to learn is still not easy...

But for intermediate and advanced developers, Java is definitely a very nice choice for the Pi.

Posts: 308
Joined: Fri Aug 03, 2012 7:24 pm
Location: France
by jvdm » Thu Jan 31, 2013 2:21 pm
I'm happy to hear that there are developments being made with Java on the R-Pi platform. Something that I'm very interested in is running Processing on the R-Pi. I think it's a wonderful educational environment, and if it can perform well it would be ideal for kids etc. looking for an easy introduction into software development.
Posts: 1
Joined: Thu Jan 31, 2013 2:14 pm
by bkaindl » Sun May 26, 2013 6:42 pm
Unfortunately, this trivial program does not give a valid Java performance comparison on the Pi.
It is not a useful Java (micro)benchmark for producing meaningful numbers for Java performance on the Pi, because it uses integer division in its hot loop!
Integer division, you ask? What's so special about that?
Well, the Pi uses an ARM11 core, and ARM11 cores don't implement integer division in hardware. Yes, really.
This means that, at the CPU level, the hot loop of this micro-benchmark is dominated by a software (emulated) integer division routine, which can take up to 100 instructions or cycles per division; the rest does not take much time.

This explains why the native code generated by gcc is not much faster than JamVM:
however the code is generated or interpreted, the run time of the loop is dominated by the integer division.
Of course, adding other code makes the roadblock that the division puts in front of the CPU on each iteration relatively smaller, but unless a "realistic" benchmark load is carefully chosen, benchmarks, and especially micro-benchmarks like these, do not give meaningful numbers.
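
To illustrate why a software division is so costly, here is a rough Java sketch of a shift-and-subtract (restoring) division. It is an illustration only, not the actual routine that gcc's runtime or any JVM uses on ARM11 (those are hand-optimised), and the class and method names are made up for this example:

```java
public class SoftDivide {
    // Illustrative restoring division for n >= 0 and d > 0.
    // It produces one quotient bit per iteration, the way a simple
    // software routine has to when the CPU has no divide instruction.
    public static int divide(int n, int d) {
        if (n < 0 || d <= 0) {
            throw new IllegalArgumentException("requires n >= 0 and d > 0");
        }
        int quotient = 0;
        int remainder = 0;
        // A non-negative int has at most 31 significant bits (30..0).
        for (int bit = 30; bit >= 0; bit--) {
            // Shift the next bit of n into the running remainder.
            remainder = (remainder << 1) | ((n >>> bit) & 1);
            // If the divisor fits, subtract it and set this quotient bit.
            if (remainder >= d) {
                remainder -= d;
                quotient |= 1 << bit;
            }
        }
        return quotient;
    }

    public static void main(String[] args) {
        // Compare against the built-in operator for one sample value.
        System.out.println(divide(10000, 7) + " " + (10000 / 7));
    }
}
```

The loop runs 31 times with a handful of shifts, compares and subtracts per iteration, which is the same order of magnitude as the figure quoted above, compared with a single instruction on a core that has a hardware divider.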

For example, here is another JVM micro-benchmark, this time banging on a GPIO register (and not using any division while doing so):
https://blogs.oracle.com/jtc/entry/comparing_jvms_on_arm_linux
Java + OpenJDK 7 + ZeroVM = 5 kHz
Java + OpenJDK 7 + JamVM = 10.75 kHz
Java + OpenJDK 7 + Avian = Error; untested
Java + OracleJDK 8 (ea) + HotspotVM = 153 kHz
Java + OracleJDK 7u10 + HotspotVM = 161 kHz (on soft-float Debian "Wheezy")
Native C + WiringPi = 7 MHz

As can be seen, in that micro-benchmark native C is roughly 650 times faster than JamVM (7 MHz versus 10.75 kHz). (Of course it is not that much faster in general, especially not once divisions take up a significant proportion of the time in the test.)
This test is again not representative, because exclusively banging on a GPIO is just one metric of many that may matter for ordinary use of "Java" on the Pi; it really depends.
Therefore, there can be no single "Java" or "JVM" performance comparison, like comparing the speed of two cars on a straight road. Only the results of individual benchmarks can be compared, and the results can only be applied to a given domain if the benchmarks really reflect that domain.
Ensuring that can only be done by those who evaluate benchmark results, and for that, don't trust anyone else...
Posts: 1
Joined: Sun May 26, 2013 5:53 pm
by bullen » Tue May 28, 2013 11:34 pm
henrik wrote:Given the maturity of our Java port to ARM and the Java toolchain, and the speed advantage over Python it would IMHO be good for everyone if Java had a more prominent place in the RPi community.


Exactly! It would also be great if Oracle could organize an appserver competition on the RPi, since it allows for a common-ground comparison.
Posts: 42
Joined: Sun Apr 28, 2013 2:52 pm
by peteralex63 » Thu Jul 10, 2014 7:47 am
I've tried this using the following node.js code:

Code: Select all
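console.log("2");
for (var i = 3; i < 10001; i++) {
   // Declared with var so it is not an implicit global.
   var prime = true;
   for (var j = 2; j < i; j++) {
      // >> 0 truncates the floating-point quotient to an integer.
      var k = i / j >> 0;
      var l = k * j;
      if (l == i) {
         prime = false;
      }
   }
   if (prime) {
      console.log(i);
   }
}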


Performed quite well:

real 0m22.831s
user 0m20.610s
sys 0m0.700s
Posts: 1
Joined: Thu Jul 10, 2014 7:44 am