http://jdk8.java.net/fxarmpreview/
This gives us a boost.
pi@raspberrypi ~/java/testprime/bin $ time java testprime > /dev/null
real 0m5.505s
user 0m4.620s
sys 0m0.170s
wombat wrote: As Henrik described above, writing to stdout is a very expensive operation, so you're not really comparing raw CPU power. If you remove all calls to printf and compile with -O2, you get the following:
0.00s user 0.00s system 0% cpu 0.009 total
#include <stdio.h>

int result = 0;

int main(void) {
    result |= 2;
    for (int i = 3; i <= 10000; i++) {
        int prime = 1;
        for (int j = 2; j < i; j++) {
            int k = i / j;
            int l = k * j;
            if (l == i) prime = 0;  /* j divides i, so i is not prime */
        }
        if (prime) result |= i;
    }
    printf("%i\n", result);
    return 0;
}
public class Prime3 {
    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            test();
        }
    }

    public static void test() {
        int result = 0;
        result |= 2;
        for (int i = 3; i <= 10000; i++) {
            boolean prime = true;
            for (int j = 2; j < i; j++) {
                int k = i / j;
                int l = k * j;
                if (l == i) prime = false;
            }
            if (prime) result |= i;
        }
        System.out.println(result);
    }
}
#include <stdio.h>

/* Prints the result itself, so it returns nothing. */
void test(void) {
    int result = 0;
    result |= 2;
    for (int i = 3; i <= 10000; i++) {
        int prime = 1;
        for (int j = 2; j < i; j++) {
            int k = i / j;
            int l = k * j;
            if (l == i) prime = 0;
        }
        if (prime) result |= i;
    }
    printf("%i\n", result);
}

int main(void) {
    for (int i = 0; i < 10; i++) {
        test();
    }
    return 0;
}
*** C ***
gcc (GCC) 3.4.4 (cygming special, gdc 0.12, using dmd 0.125)
--- print
real 0m2.835s !!??
user 0m0.358s
sys 0m0.046s
--- or
real 0m0.441s
user 0m0.374s
sys 0m0.030s
--- 10 * or
real 0m3.691s
user 0m3.556s
sys 0m0.030s
*** Java ***
java version "1.6.0_37"
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) Client VM (build 20.12-b01, mixed mode, sharing)
--- print
real 0m0.545s
user 0m0.015s
sys 0m0.030s
--- or
real 0m0.479s
user 0m0.000s
sys 0m0.031s
--- 10 * or
real 0m3.521s
user 0m0.015s
sys 0m0.015s
henrik wrote:
Hi, this is not related to loop unrolling, which is a specific optimization in the JIT. The general issue you see is that the method can be optimized, but the optimized method will not be used until the next invocation. It is technically possible to switch out a method in flight. This is called on-stack replacement (OSR), but it is quite complex and is mostly avoided since it provides limited benefit and is expensive to maintain. There is plenty of good literature out there. My favorite is the Oracle JRockit book, which is a deep dive into how JVMs work (different JVM, same principles, and HotSpot and JRockit are being merged by us, so valuable insights anyway). You can find it on Amazon.
There are some reasonable benchmarks out there. Try SPECjvm2008; it is fairly comprehensive.
For a good microbenchmark, try this:
main
    while (true)
        read time
        call benchmark method
        read time, print out difference and result
    end while

benchmark method
    run benchmark, maybe lots of times
    store intermediary results in a public variable to avoid the optimizer removing the code you want to measure
    return result
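For anyone who wants to try this, here is a rough Java sketch of the harness outlined above, using the same trial-division loop as Prime3. The names PrimeBench, benchmark, run and sink are made up for illustration, and the endless while loop is replaced by a fixed number of rounds so that the program terminates. Treat it as one possible reading of the pseudocode, not a reference implementation.

```java
// Hypothetical harness: times the same method repeatedly so that later
// rounds can run the JIT-compiled version of benchmark().
class PrimeBench {
    // Public sink, as suggested in the pseudocode: storing the result here
    // makes it harder for the optimizer to discard the loop as dead code.
    public static volatile int sink;

    // Same work as Prime3.test(), minus the println.
    static int benchmark(int limit) {
        int result = 2;
        for (int i = 3; i <= limit; i++) {
            boolean prime = true;
            for (int j = 2; j < i; j++) {
                if ((i / j) * j == i) prime = false; // j divides i
            }
            if (prime) result |= i;
        }
        sink = result;
        return result;
    }

    // Read time, call the benchmark, read time again, print the difference.
    static long[] run(int rounds, int limit) {
        long[] elapsedNanos = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            int result = benchmark(limit);
            elapsedNanos[r] = System.nanoTime() - start;
            System.out.println("round " + r + ": "
                    + (elapsedNanos[r] / 1000) + " us, result=" + result);
        }
        return elapsedNanos;
    }

    public static void main(String[] args) {
        run(10, 10000);
    }
}
```

Each round is a fresh invocation of benchmark(), so once the JIT has compiled the method, later rounds can pick up the optimized version. Comparing the first round with the last few gives a feel for warm-up cost versus steady-state speed. The printed timings depend entirely on the machine and JVM.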
MartinPercival wrote:
Hi Henrik (I'm guessing it's Henrik Stahl, right?), how's life since the Oracle takeover?
Does the new merged JRockit/HotSpot implementation generate ARM-optimised bytecode? For example, does it know about all the conditional branch instructions? I could imagine that in a synthetic benchmark like this, a loop unrolling vs conditional branch decision might make a big difference over the lifetime of this loop.
Martin
PS Guys (and gals), Henrik was one of the folks who built the JRockit JVM, so he's a goldmine for this kind of stuff... sorry Henrik, stitched you up there.
henrik wrote: Given the maturity of our Java port to ARM and the Java toolchain, and the speed advantage over Python, it would IMHO be good for everyone if Java had a more prominent place in the RPi community.