Code:
  1126  18.3507  libbfd-2.22-system.so  /usr/lib/libbfd-2.22-system.so
Code:
  1148  18.8320  cc1                    /usr/lib/gcc/arm-linux-gnueabihf/4.6/cc1
  1052  17.2572  libbfd-2.22-system.so  /usr/lib/libbfd-2.22-system.so
   710  11.6470  libc-2.13.so           /lib/arm-linux-gnueabihf/libc-2.13.so
   428   7.0210  bash                   /bin/bash
   254   4.1667  ld-2.13.so             /lib/arm-linux-gnueabihf/ld-2.13.so
   190   3.1168  vmlinux                default_idle
   171   2.8051  vmlinux                do_page_fault
   119   1.9521  libcofi_rpi.so         memcpy
   113   1.8537  vmlinux                copy_page
    94   1.5420  vmlinux                __do_fault
    85   1.3944  as                     /usr/bin/as
    79   1.2959  vmlinux                cfb_imageblit
    77   1.2631  vmlinux                __memzero
    63   1.0335  ld.bfd                 /usr/bin/ld.bfd
    63   1.0335  libcofi_rpi.so         memset
    55   0.9022  vmlinux                filemap_fault
    44   0.7218  vmlinux                handle_pte_fault
    41   0.6726  sed                    /bin/sed
    40   0.6562  vmlinux                find_get_page
    38   0.6234  vmlinux                find_vma
    38   0.6234  vmlinux                get_page_from_freelist
    37   0.6070  vmlinux                handle_mm_fault
    37   0.6070  vmlinux                memcpy
dom wrote:
I did attempt this a while back, but found that running the quake3 timedemo would sometimes segfault with the patch applied.
I had a suspicion it was related to misaligned access, but haven't had a chance to investigate.

The copy_page patch is quite straightforward (since copy_page is always page-aligned), but I found the current kernel memcpy implementation for ARM a bit tricky. In fact, when testing the original kernel memcpy function in userspace, I detected copy errors related to unaligned memcpy. I have made no attempt to fix this yet, or to ascertain whether the bug actually exists in kernel space. The kernel memcpy function does some tricky fiddling with the program counter when handling unaligned accesses.
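For anyone who wants to reproduce this kind of userspace check, here is a minimal sketch of an alignment test harness. It is not the test actually used in this thread. The names count_copy_errors and rounding_copy, the offset and length ranges, and the deliberately broken copy routine are all assumptions made for illustration. The idea is to run a candidate copy function over every combination of source offset, destination offset and length, and compare the result (including guard bytes around the destination) against a byte-by-byte reference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Signature shared by memcpy and any candidate replacement. */
typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Illustrative ranges: offsets 0..15, lengths 0..256, 16 guard bytes. */
enum { MAX_OFFSET = 16, MAX_LEN = 256, GUARD = 16 };
enum { BUF_SIZE = GUARD + MAX_OFFSET + MAX_LEN + GUARD };

/*
 * Run `copy` for every source offset, destination offset and length,
 * and return how many combinations produced a wrong destination buffer
 * (wrong data, or bytes written outside the requested range).
 */
int count_copy_errors(copy_fn copy)
{
    static unsigned char src[BUF_SIZE], dst[BUF_SIZE], ref[BUF_SIZE];
    int errors = 0;

    assert(copy != NULL);

    /* Non-repeating pattern so shifted data is detectable. */
    for (size_t i = 0; i < BUF_SIZE; i++)
        src[i] = (unsigned char)(i * 31u + 7u);

    for (size_t s_off = 0; s_off < MAX_OFFSET; s_off++) {
        for (size_t d_off = 0; d_off < MAX_OFFSET; d_off++) {
            for (size_t len = 0; len <= MAX_LEN; len++) {
                memset(dst, 0xAA, BUF_SIZE);
                memset(ref, 0xAA, BUF_SIZE);

                /* Reference result, built one byte at a time. */
                for (size_t i = 0; i < len; i++)
                    ref[GUARD + d_off + i] = src[GUARD + s_off + i];

                copy(dst + GUARD + d_off, src + GUARD + s_off, len);

                /* Comparing whole buffers also checks the guard bytes. */
                if (memcmp(dst, ref, BUF_SIZE) != 0)
                    errors++;
            }
        }
    }
    return errors;
}

/*
 * Hypothetical broken routine: rounds the source pointer down to a
 * 4-byte boundary, so it only copies correctly from aligned sources.
 */
void *rounding_copy(void *dst, const void *src, size_t n)
{
    const void *aligned = (const void *)((uintptr_t)src & ~(uintptr_t)3);
    return memcpy(dst, aligned, n);
}
```

Calling count_copy_errors(memcpy) should return 0 with a correct C library, while count_copy_errors(rounding_copy) returns a nonzero count. A kernel routine would first have to be built for userspace before it could be passed in the same way.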
dom wrote:
It may have been something I got wrong in patching it, so I'd be interested if quake3 works reliably for you (or post/PR your patch and I'll try it here).

I'll check out quake3. In any case, the two patches are available at the address linked at the bottom of the last message.
hglm wrote:
I'll check out quake3. In any case, the two patches are available at the address linked at the bottom of the last message.

I've not spotted a problem with quake3 with your patch over a few runs. No measurable change in framerate.
dom wrote:
How does your memcpy compare with the ones we use in userland (https://github.com/bavison/arm-mem/)?
How about memset?

I believe Raspbian uses the libcofi optimized memcpy and memset via the ld.so.preload mechanism (/etc/ld.so.preload). libcofi comes from https://github.com/simonjhall/copies-and-fills/ and is different from arm-mem, which is used in Pidora. In my testing, libcofi performs pretty well (perhaps better than arm-mem). libcofi's memset is also pretty fast (faster than the default glibc one).
hglm wrote:
But I have also been experimenting with a wholly different set of memcpy functions in my fastarm repository (https://github.com/hglm/fastarm/). They are work-in-progress, and while they are significantly faster than glibc on other ARM platforms, on Raspbian libcofi does pretty well. I have yet to do extensive real-world benchmarks using oprofile, which can give different results from synthetic benchmarks that repeatedly call memcpy. There are some complex trade-offs involved, and historically there have been cases of "optimized" memcpy implementations on different Linux platforms that, while showing good performance in synthetic benchmarks, actually slowed down the system in real-world usage.

It certainly looks interesting, judging by the benchmark results.
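To make concrete what a synthetic benchmark that repeatedly calls memcpy looks like, here is a minimal sketch. It is not taken from the fastarm repository. The function name memcpy_throughput_mb_s, the fixed fill value and the use of clock() for timing are assumptions for illustration. It copies the same buffer over and over, which keeps caches warm in a way real workloads often do not, and that is one reason such numbers can disagree with a system-wide profile.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Copy `size` bytes `iterations` times and return the throughput in
 * MiB per second of CPU time. Returns -1.0 if allocation or clock()
 * fails, and 0.0 if the run was too short to measure.
 */
double memcpy_throughput_mb_s(size_t size, unsigned iterations)
{
    assert(size > 0 && iterations > 0);

    unsigned char *src = malloc(size);
    unsigned char *dst = malloc(size);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5A, size);

    clock_t start = clock();
    for (unsigned i = 0; i < iterations; i++) {
        memcpy(dst, src, size);
        /* Compiler barrier (GCC/Clang extension) so that the
           repeated copies are not optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    clock_t end = clock();

    free(src);
    free(dst);

    if (start == (clock_t)-1 || end == (clock_t)-1)
        return -1.0;

    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        return 0.0;

    double mib = (double)size * iterations / (1024.0 * 1024.0);
    return mib / seconds;
}
```

For example, memcpy_throughput_mb_s(1u << 20, 500) times 500 copies of a 1 MiB buffer. Varying the size and the alignment of the buffers would give a fuller picture, but the result remains a synthetic number and is no substitute for profiling a real workload with oprofile.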