Profiling on the Pi?


by jlongstreet » Wed Oct 03, 2012 8:40 pm
What profiling tools, if any, have people had success with?

I'd like to do some performance optimizations for emulators running on the Pi. I've never done CPU profiling on Linux before, though I have intermediate-level Linux experience. I'm perfectly comfortable recompiling or making intrusive source changes if necessary, but I'd like as simple a workflow as possible.

Sampling-based and instrumentation-based profiling tools are both fine. A quick search of the forums suggests that some people have profiled various applications, but there's no mention of the tools they used (other than valgrind not working).
Posts: 31
Joined: Wed Sep 05, 2012 2:59 am
by Narishma » Thu Oct 04, 2012 7:51 pm
Have you tried OProfile?
Posts: 150
Joined: Wed Nov 23, 2011 1:29 pm
by jlongstreet » Tue Oct 09, 2012 4:04 pm
I got oprofile building yesterday, but when I try to run operf:
Code:
pi@raspberrypi:~/oprofile-0.9.8/pe_profiling$ operf
Unexpected error running operf: No such device
Please use the opcontrol command instead of operf.


main() calls _check_perf_events_cap(), which does a __NR_perf_event_open syscall. It returns ENODEV, which apparently was not expected by oprofile. I'm guessing this means that the kernel doesn't provide support for perf_events, although /proc/sys/kernel/perf_event_paranoid exists.

Any idea how I can add support for this? I'm not opposed to recompiling my kernel.
by jamieiles » Tue Oct 09, 2012 4:57 pm
jlongstreet wrote: I got oprofile building yesterday, but when I try to run operf:
Code:
pi@raspberrypi:~/oprofile-0.9.8/pe_profiling$ operf
Unexpected error running operf: No such device
Please use the opcontrol command instead of operf.


main() calls _check_perf_events_cap(), which does a __NR_perf_event_open syscall. It returns ENODEV, which apparently was not expected by oprofile. I'm guessing this means that the kernel doesn't provide support for perf_events, although /proc/sys/kernel/perf_event_paranoid exists.

Any idea how I can add support for this? I'm not opposed to recompiling my kernel.


The ARM PMU should have an IRQ that needs to be wired up to an "arm-pmu" platform device, see arch/arm/mach-bcmring/arch.c as an example. I can't see the PMU IRQ in the arm peripherals datasheet though so it's either not wired up to the controllers or just not documented...

Jamie
Posts: 3
Joined: Wed Sep 19, 2012 10:38 am
by dom » Wed Oct 10, 2012 9:31 am
jamieiles wrote: The ARM PMU should have an IRQ that needs to be wired up to an "arm-pmu" platform device, see arch/arm/mach-bcmring/arch.c as an example. I can't see the PMU IRQ in the arm peripherals datasheet though so it's either not wired up to the controllers or just not documented...


Unfortunately the interrupt is not available.
I believe the wraparound time for the PMU registers is quite high, so it may be possible to get the same effect from a timer.
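
To put a rough number on "quite high", here is a back-of-the-envelope sketch in Python. The figures are assumptions that are not stated in this thread: 32-bit counters, a 700 MHz core clock, and an optional divide-by-64 prescaler on the cycle counter. The function names are made up for illustration.

```python
# Assumptions (not from the thread): 32-bit counters, a 700 MHz core clock,
# and an optional divide-by-64 prescaler on the cycle counter.
COUNTER_BITS = 32
CPU_HZ = 700_000_000


def wraparound_seconds(bits=COUNTER_BITS, hz=CPU_HZ, divider=1):
    """Seconds until a free-running counter ticking at hz/divider overflows."""
    return (2 ** bits) * divider / hz


def max_safe_poll_interval(bits=COUNTER_BITS, hz=CPU_HZ, divider=1, margin=0.5):
    """Poll at a fraction of the wrap time so that no overflow goes unnoticed."""
    return wraparound_seconds(bits, hz, divider) * margin


def counter_delta(previous, current, bits=COUNTER_BITS):
    """Ticks elapsed between two reads, correct across at most one wrap."""
    return (current - previous) % (2 ** bits)


print(f"{wraparound_seconds():.2f} s")            # 6.14 s
print(f"{wraparound_seconds(divider=64):.1f} s")  # 392.7 s
```

Under those assumptions a timer that reads the counters a few times per second would never miss a wrap, and the modular subtraction in `counter_delta` handles the one wrap that can occur between two reads.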
Moderator
Posts: 3858
Joined: Wed Aug 17, 2011 7:41 pm
Location: Cambridge
by robquant » Wed Oct 10, 2012 1:58 pm
What about gperftools? http://code.google.com/p/gperftools/
It includes a sampling profiler that seems to work nicely on the Pi.
Posts: 3
Joined: Wed Oct 10, 2012 1:55 pm
Location: Berlin
by jlongstreet » Wed Oct 10, 2012 4:22 pm
Yeah, it looks like OProfile is too much trouble for the moment. I'm building gperftools now.

I needed to add "-march=armv7-a" to CFLAGS and CXXFLAGS to get it compiling, because Debian armhf apparently targets ARMv4 out of the box. I'm going to file a bug on the Google Code page about that.

Assuming gperftools works, it looks like it'll do what I want. I'll post back here with updates.
by jlongstreet » Wed Oct 10, 2012 5:06 pm
Hmm, it seems that the memory barriers are still not supported, even after compiling for -march=armv7-a:

Code:
pi@raspberrypi:~/profiletest$ LD_LIBRARY_PATH=/usr/local/lib gdb ./test
GNU gdb (GDB) 7.4.1-debian
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "arm-linux-gnueabihf".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/pi/profiletest/test...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/pi/profiletest/test

Program received signal SIGILL, Illegal instruction.
Acquire_CompareAndSwap (new_value=1, ptr=0x11024, old_value=0)
    at ./src/base/atomicops-internals-arm-v6plus.h:138
138       MemoryBarrier();
(gdb)
by dom » Wed Oct 10, 2012 5:12 pm
jlongstreet wrote: Hmm, it seems that the memory barriers are still not supported, even after compiling for -march=armv7-a:

Why are you building for armv7-a? The Pi is ARMv6.
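
One way to avoid guessing the target is to ask the machine itself. The Python sketch below maps the machine string to a GCC `-march` flag. The lookup table and the helper name `march_flag` are illustrative assumptions, and it assumes a Pi running Raspbian reports `armv6l`.

```python
import platform

# Hypothetical lookup table for illustration only. It assumes the original Pi
# reports "armv6l" and that ARMv7 boards report "armv7l".
MARCH_FLAGS = {
    "armv6l": "-march=armv6",
    "armv7l": "-march=armv7-a",
}


def march_flag(machine=None):
    """Return a GCC -march flag for the given (or current) machine, or None."""
    machine = machine or platform.machine()
    return MARCH_FLAGS.get(machine)


print(platform.machine(), march_flag())
```

If the result is `-march=armv6`, code built with `-march=armv7-a` can be expected to hit illegal instructions at run time, as in the gdb session above.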
by cleverca22 » Thu Oct 11, 2012 12:54 am
I've gotten oprofile to work before; I think I had to compile the kernel module for it to work.

Sadly, none of the performance monitoring stuff worked, because the IRQ never triggered.
I had to force a module option to make it use the timer tick IRQ.
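
The idea behind timer-driven sampling can be sketched in user space. This is an analogy in Python, not oprofile's kernel implementation: a profiling interval timer delivers SIGPROF periodically, and the handler records which function was interrupted. The names `profile` and `busy_work` are made up for the example, and it assumes a Unix system with the code running in the main thread.

```python
import collections
import signal
import time

samples = collections.Counter()


def _on_sigprof(signum, frame):
    # frame is the Python frame that was interrupted by the timer signal.
    if frame is not None:
        samples[frame.f_code.co_name] += 1


def profile(func, interval=0.005):
    """Run func() while sampling the interrupted function every `interval`
    seconds of CPU time consumed by this process."""
    samples.clear()
    # The handler is left installed afterwards to keep the sketch simple.
    signal.signal(signal.SIGPROF, _on_sigprof)
    signal.setitimer(signal.ITIMER_PROF, interval, interval)
    try:
        func()
    finally:
        signal.setitimer(signal.ITIMER_PROF, 0, 0)  # stop the timer
    return samples.most_common()


def busy_work(cpu_seconds=0.3):
    """Burn roughly cpu_seconds of CPU time so there is something to sample."""
    end = time.process_time() + cpu_seconds
    total = 0
    while time.process_time() < end:
        total += 1
    return total


if __name__ == "__main__":
    for name, count in profile(busy_work):
        print(f"{count:6d}  {name}")
```

The trade-off compared with hardware counters is that a timer only tells you where time is spent; it cannot count events such as cache misses.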
Posts: 166
Joined: Sat Aug 18, 2012 2:33 pm
by robquant » Thu Oct 11, 2012 11:30 am
I use gperftools from the Arch repos and this works nicely:

Code:
[mypi] ~ % CPUPROFILE=ls.prof LD_PRELOAD=/usr/lib/libprofiler.so /bin/ls -R >| /dev/null
[mypi] ~ % pprof --text /bin/ls ls.prof | head
Using local file /bin/ls.
Using local file ls.prof.
Total: 324 samples
      81  25.0%  25.0%       81  25.0% __getdents64
      66  20.4%  45.4%       66  20.4% __openat_nocancel
      55  17.0%  62.3%       55  17.0% strcoll_l
       8   2.5%  64.8%        8   2.5% _int_malloc
       7   2.2%  67.0%        7   2.2% __fxstat64
       5   1.5%  68.5%        5   1.5% _int_free
       5   1.5%  70.1%        5   1.5% closedir
       5   1.5%  71.6%        5   1.5% readdir64@@GLIBC_2.4
       4   1.2%  72.8%        4   1.2% __aeabi_read_tp
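
For anyone reading this output for the first time: as I understand pprof's text format, the columns are samples in the function itself, that count as a percentage, the running total of that percentage, samples in the function plus its callees, that count as a percentage, and the function name. The Python sketch below parses the rows on that assumption. `ProfileRow` and `parse_pprof_text` are hypothetical names, and the sample text is copied from the output above.

```python
import re
from typing import NamedTuple


class ProfileRow(NamedTuple):
    flat: int           # samples in the function itself
    flat_pct: float     # flat as a percentage of all samples
    running_pct: float  # running total of flat_pct down the listing
    cum: int            # samples in the function and its callees
    cum_pct: float      # cum as a percentage of all samples
    name: str


ROW = re.compile(
    r"^\s*(\d+)\s+([\d.]+)%\s+([\d.]+)%\s+(\d+)\s+([\d.]+)%\s+(.+?)\s*$"
)


def parse_pprof_text(output):
    """Parse the table rows of `pprof --text` output, skipping other lines."""
    rows = []
    for line in output.splitlines():
        m = ROW.match(line)
        if m:
            flat, flat_pct, running, cum, cum_pct, name = m.groups()
            rows.append(ProfileRow(int(flat), float(flat_pct), float(running),
                                   int(cum), float(cum_pct), name))
    return rows


SAMPLE = """\
Total: 324 samples
      81  25.0%  25.0%       81  25.0% __getdents64
      66  20.4%  45.4%       66  20.4% __openat_nocancel
      55  17.0%  62.3%       55  17.0% strcoll_l
"""

rows = parse_pprof_text(SAMPLE)
top_two = sum(r.flat for r in rows[:2]) / 324
print(f"{top_two:.1%}")  # 45.4%
```

Read this way, almost half of the samples in the `ls -R` run land in the two functions that read directories and open files.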
by dozencrows » Sun Oct 28, 2012 3:25 pm
It seems there have been problems building tcmalloc (used in gperftools) for ARM in the past - e.g. http://code.google.com/p/chromium-os/is ... l?id=34620. So it may not be in complete working order for all ARM CPUs.

gperftools does include an ARM implementation of its atomic operations, but it doesn't seem to work for all flavours of ARM (hence the 'dmb' instruction compilation error).

After some inspection of the code, I was able to make a small kludge that got tcmalloc to build for ARMv5, which enabled me to successfully profile a small executable on my Pi running Raspbian. BTW, this isn't a recommended kludge for public consumption, really just a proof-of-concept to help track down the issue.

Building for ARMv7 stops the compiler complaining, but it then generates instructions that the Pi CPU doesn't understand, because it targets the wrong architecture version.

The right thing to do would be to post a bug against tcmalloc to get a proper fix. A possible workaround in the meantime would be to try getting a proper ARMv5 build of gperftools going.
Posts: 70
Joined: Sat Aug 04, 2012 6:02 pm
by hermanhermitage » Sun Oct 28, 2012 10:53 pm
I'm using linux-tools-3.2 on Raspbian.

Install with:
sudo apt-get install linux-tools

Record a profile:
perf record -g "path-to-executable"

Report the profile:
perf report -g

Better than a poke in the eye with a sharp stick -- but not much. I think I'm going to have to add my own instrumentation to suit my needs.
Posts: 65
Joined: Sat Jul 07, 2012 11:21 pm
Location: Zero Page
by paffinity » Mon Nov 05, 2012 5:12 pm
Thanks for the information on using perf record.
I'd also like to use perf stat to query the hardware counters but I keep getting the error:

Error: open_counter returned with 19 (No such device). /bin/dmesg may provide additional information.

Fatal: Not all events could be opened.


Does anyone know how to fix this?

I'm guessing that I'll need to recompile a kernel with support for this enabled.
If there's another way to get around this, please let me know.
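
Error 19 here is ENODEV, the same errno that operf reported earlier in the thread. That suggests, though nothing in this thread confirms it, that perf stat fails for the same reason: the hardware counters can't be opened because the PMU interrupt isn't available. perf record may still work because it can fall back to a software clock event. A minimal Python sketch for decoding such numbers follows; `describe_errno` is a hypothetical helper name.

```python
import errno
import os


def describe_errno(code):
    """Return the symbolic name and system message for a numeric errno."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


print(describe_errno(19))
```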
Posts: 1
Joined: Mon Nov 05, 2012 5:10 pm
by MAA1612 » Mon Jan 21, 2013 9:59 am
robquant wrote: I use gperftools from the Arch repos and this works nicely:


However, the current tarball did not compile on Raspbian, because it used ARMv7 instructions even when compiling for ARMv6. Fortunately, a patch has been committed recently which solves this issue:

http://code.google.com/p/gperftools/issues/detail?id=493

Still compiling, but it seems to work now ...
Posts: 18
Joined: Thu Jan 17, 2013 11:00 am