benjamin.balet
Posts: 4
Joined: Mon Aug 08, 2011 2:41 pm
Contact: Website

Re: Assembly Optimization

Mon Aug 22, 2011 5:02 pm

Hi,

I used to practice assembly optimization (some call it S&M) on x86 and 68xxx platforms. I'd like to learn new tips and tricks on ARM.

For those interested, we could build or use a web site:
- Pointing to existing optimized OSS projects.
- Containing articles about assembly tricks (such as the famous "XOR a, a" on x86) and optimized C/C++ source code (fast math, approximations, avoiding wasting the CPU execution pipeline)...
- Listing priority tasks in porting/optimizing projects.
- And maybe organizing code contests.

I know about the GCC -Ox flags, but assembly code is still 10 to 20 times faster for some tasks (see libjpeg-turbo).

Emanuele
Posts: 182
Joined: Wed Aug 03, 2011 5:28 pm
Contact: Website

Re: Assembly Optimization

Mon Aug 22, 2011 6:24 pm

I think it's a good idea and I'll watch closely... but I won't write any assembly code myself :)

Personally, I think that the biggest opportunities (and problems) will be at a higher level. When a program assumes that it has 1MB of L2 cache available (and the best coding practices nowadays often assume that), there isn't really much you can do to make it faster without changing some data structures and algorithms.

I suggest focusing not only on how to port existing code but also on how to write new code with the Raspi architecture in mind.
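
To make the point about data layout concrete, here is a minimal sketch in C. It is not taken from any existing project, and the function names are made up for illustration. Both functions compute the same sum over a grid stored row by row; only the order in which memory is touched differs. On a core with a small cache, the second order tends to miss far more often once the grid no longer fits, though how much that costs depends on the hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, for illustration only.
 * The grid is stored row by row in one flat buffer. */

/* Row-major walk: consecutive accesses touch adjacent bytes,
 * so every byte of each fetched cache line gets used. */
uint32_t sum_row_major(const uint8_t *grid, size_t rows, size_t cols)
{
    uint32_t sum = 0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            sum += grid[r * cols + c];
    return sum;
}

/* Column-major walk: each access jumps `cols` bytes ahead,
 * so a large grid keeps pulling in new cache lines. */
uint32_t sum_col_major(const uint8_t *grid, size_t rows, size_t cols)
{
    uint32_t sum = 0;
    for (size_t c = 0; c < cols; c++)
        for (size_t r = 0; r < rows; r++)
            sum += grid[r * cols + c];
    return sum;
}
```

The results are identical, so swapping one loop order for the other is a change to the algorithm's memory behaviour rather than to what it computes.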

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26442
Joined: Sat Jul 30, 2011 7:41 pm

Re: Assembly Optimization

Tue Aug 23, 2011 8:30 am

Quote from benjamin.balet on August 22, 2011, 18:02
Hi,

I used to practice assembly optimization (some call it S&M) on x86 and 68xxx platforms. I'd like to learn new tips and tricks on ARM.

For those interested, we could build or use a web site:
- Pointing to existing optimized OSS projects.
- Containing articles about assembly tricks (such as the famous "XOR a, a" on x86) and optimized C/C++ source code (fast math, approximations, avoiding wasting the CPU execution pipeline)...
- Listing priority tasks in porting/optimizing projects.
- And maybe organizing code contests.

I know about the GCC -Ox flags, but assembly code is still 10 to 20 times faster for some tasks (see libjpeg-turbo).

There are lots of Arm websites out there that offer Arm assembler tips. The Raspi uses an ARM11 core (which I believe implements the Armv6 instruction set), so there is nothing 'fancy' on the Arm side (and the GPU side is already optimised with assembler where necessary).
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed.
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26442
Joined: Sat Jul 30, 2011 7:41 pm

Re: Assembly Optimization

Tue Aug 23, 2011 8:34 am

Quote from Emanuele on August 22, 2011, 19:24
I think it's a good idea and I'll watch closely... but I won't write any assembly code myself :)

Personally, I think that the biggest opportunities (and problems) will be at a higher level. When a program assumes that it has 1MB of L2 cache available (and the best coding practices nowadays often assume that), there isn't really much you can do to make it faster without changing some data structures and algorithms.

I suggest focusing not only on how to port existing code but also on how to write new code with the Raspi architecture in mind.


Assuming 1MB of L2 cache isn't 'best' coding practice!

I do now have information on the cache used on the SoC, but it's currently confidential - not sure when it will be released.

benjamin.balet
Posts: 4
Joined: Mon Aug 08, 2011 2:41 pm
Contact: Website

Re: Assembly Optimization

Tue Aug 23, 2011 11:17 am

Quote from jamesh on August 23, 2011, 09:30
There are lots of Arm websites out there that offer Arm assembler tips. The Raspi uses an ARM11 core (which I believe implements the Armv6 instruction set), so there is nothing 'fancy' on the Arm side (and the GPU side is already optimised with assembler where necessary).

So it's urgent to do nothing at all. Thank you for your discouraging reply.

jamesh
Raspberry Pi Engineer & Forum Moderator
Posts: 26442
Joined: Sat Jul 30, 2011 7:41 pm

Re: Assembly Optimization

Tue Aug 23, 2011 12:17 pm

Quote from benjamin.balet on August 23, 2011, 12:17
Quote from jamesh on August 23, 2011, 09:30
There are lots of Arm websites out there that offer Arm assembler tips. The Raspi uses an ARM11 core (which I believe implements the Armv6 instruction set), so there is nothing 'fancy' on the Arm side (and the GPU side is already optimised with assembler where necessary).

So it's urgent to do nothing at all. Thank you for your discouraging reply.

Sorry, not meant to be discouraging. Just pointing out that there are already many Arm websites out there that would give you a good starting point for learning about Arm optimising (and Arm assembler itself, which is quite different from x86). There's no point in reinventing the wheel and repeating what others have already done. Also pointing out that the RasPi uses a completely standard Arm processor; there's nothing different about it, so there isn't any need for Raspi-specific optimisation. If an optimisation works on Arm, it works on the Raspi. That means you can go out now, write Arm code and run it in qemu or similar, and it should work on the device when it comes out. And you are more than welcome to do so!

The problem with assembler is that you are of course limited to the processor it's written for. Most Linux source doesn't use it, so it stays cross-platform compatible. Obviously there are areas in the kernel that need assembler and they need to be rewritten for each platform, but even there it's infrequently used. Also, modern compilers are generally so good that they make the move to assembler not cost effective (although Arm gcc has taken a while to get there). As you point out, there are areas that can benefit, often in the multimedia area, where a lot of repetitive processing is done and where saving a few cycles per pixel can really add up. But it takes a really good assembler coder to extract those cycles, so for the man on the street the compiler is good enough.
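
As a rough illustration of the "few cycles per pixel" kind of saving, here is a sketch in plain, portable C rather than assembler. It is not from libjpeg-turbo or any Raspberry Pi code, and the function names are invented for the example. It averages four packed 8-bit channels with one set of 32-bit operations instead of handling each channel separately, relying on the identity floor((a + b) / 2) = (a & b) + ((a ^ b) >> 1).

```c
#include <stddef.h>
#include <stdint.h>

/* Average four packed 8-bit channels at once, rounding down.
 * The 0xFEFEFEFE mask clears each byte's low bit before the shift
 * so it cannot leak into the neighbouring byte. */
uint32_t avg_packed_u8(uint32_t a, uint32_t b)
{
    return (a & b) + (((a ^ b) & 0xFEFEFEFEu) >> 1);
}

/* Reference version: one channel at a time. */
uint32_t avg_scalar_u8(uint32_t a, uint32_t b)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t x = (a >> shift) & 0xFFu;
        uint32_t y = (b >> shift) & 0xFFu;
        out |= ((x + y) >> 1) << shift;
    }
    return out;
}

/* Blend two rows of 32-bit pixels (hypothetical helper name). */
void blend_rows(uint32_t *dst, const uint32_t *a, const uint32_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = avg_packed_u8(a[i], b[i]);
}
```

Whether this actually beats what the compiler generates from the scalar loop has to be measured on the target; the sketch only shows the shape of the trick.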

With regard to the GPU, that's closed source, and already very heavily optimised in its particular assembler code (it's a proprietary core, not Arm), which is really quite difficult!

All that said, I'm not trying to discourage you from writing in assembler, or writing up optimisations, just pointing out some advantages/disadvantages.

patrickhwood
Posts: 27
Joined: Wed Aug 31, 2011 2:12 am

Re: Assembly Optimization

Thu Sep 01, 2011 1:01 am

The problem with the GPU being closed source is that there are many problems that are not display related that are handled by GPU hardware better than a general-purpose CPU like the ARM. You only have to look at CUDA and OpenCL for some examples of this. Similarly, lots of "media accelerators" nowadays also have some form of SIMD support (a la MMX or SSE). If their data formats are flexible (i.e., more than just byte vectors), they can be really useful for things other than DCT decoding and color conversion.

Does the ARM on the r-pi support the DSP/SIMD extensions?

bb-tronics
Posts: 4
Joined: Thu Jun 14, 2012 5:06 pm

Re: Assembly Optimization

Thu Jun 14, 2012 6:19 pm

Hi all,

I have just done a bit of searching on ARMv6 assembly.

Could someone point me to a website with instruction sets and examples, for the Raspberry Pi or otherwise?

Regards,
Damian (Blackburn, Lancashire)

johnbeetem
Posts: 945
Joined: Mon Oct 17, 2011 11:18 pm
Location: The Mountains
Contact: Website

Re: Assembly Optimization

Thu Jun 14, 2012 6:42 pm

bb-tronics wrote: I have just done a bit of searching on ARMv6 assembly.

Could someone point me to a website with instruction sets and examples, for the Raspberry Pi or otherwise?
There are good links in this RasPi forum topic, "ARM Assembler": http://www.raspberrypi.org/phpBB3/viewt ... f=2&t=2910
