DavidS
Posts: 4334
Joined: Thu Dec 15, 2011 6:39 am
Location: USA

Just got my first RPi2.

Tue Feb 02, 2016 5:14 am

I have had a couple of RPi Model Bs in the past (the 700MHz single-core ARM11 version with 512MB RAM); these are long gone, lost to moving.

Today though I received my first RPi2, and I am very impressed.

I unboxed everything (it came in a kit), left the power supply in the box (it is European, and I am in the USA), overwrote the included NOOBS card with RISC OS Pi RC14, and booted the RPi. Once I had edited the Config text to override the fact that my monitor scales resolutions a lot larger than its native 1024x768, and set the mode and wimpmode in RISC OS to sane values (mode 28 and wimpmode X1024 Y768 C16M F75), I was almost immediately impressed. I have been aware that the multiple instruction dispatch on the ARMv7 is a lot better than on the ARMv6, but it is something else to see the difference directly: that, plus the better branching logic reducing pipeline bubbles, makes things much snappier than just the clock speed increase suggests.

So I ran a few of my weird and simple benchmarks to see how much better it is doing. Suffice it to say the numbers tell me that the RPi2 provides approximately 3 times the processing speed with ordinary instruction sequences and approximately 7 times the memory bandwidth (I would guess due to a larger cache with higher associativity, or something else?).

So my hat is off to Broadcom for the new IC, ARM for the CPU core, and the Raspberry Pi Foundation for the implementation of the SBC. Congratulations on exceeding what I would have expected to see in only the second generation of Raspberry Pi hardware.
RPi = The best ARM based RISC OS computer around
More than 95% of posts made from RISC OS on RPi 1B/1B+ computers. Most of the rest from RISC OS on RPi 2B/3B/3B+ computers

jahboater
Posts: 6310
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: Just got my first RPi2.

Tue Feb 02, 2016 9:36 am

It's more than that:
It has NEON, which is a very fast SIMD instruction set (NEON does integer arithmetic too, including 64-bit).
It has full 16-bit immediate operands for MOV (MOVW) and the new MOVT (two insns for a 32-bit register load).
It has ARM hardware integer division instructions (SDIV and UDIV).
It has Thumb2 (a complete mixed 16/32-bit instruction set; gives approx 25% reduction in code size whilst maintaining similar speed).
It has other handy new instructions, such as CBZ.
Pi4 8GB running PIOS64 Lite

GTR2Fan
Posts: 1601
Joined: Sun Feb 23, 2014 9:20 pm
Location: South East UK

Re: Just got my first RPi2.

Tue Feb 02, 2016 10:08 am

Lots of interesting Pi2 benchmarks here...

http://www.roylongbottom.org.uk/Raspber ... hmarks.htm

They do gallop along very nicely compared to their older brother, especially if you're not averse to a little comfortable overclocking. :)
Pi2B Mini-PC/Media Centre: ARM=1GHz (+3), Core=500MHz, v3d=500MHz, h264=333MHz, RAM=DDR2-1200 (+6/+4/+4+schmoo). Sandisk Ultra HC-I 32GB microSD card on '50=100' OCed slot (42MB/s read) running Raspbian/KODI16, Seagate 3.5" 1.5TB HDD mass storage.

DavidS

Re: Just got my first RPi2.

Wed Feb 03, 2016 6:22 am

jahboater wrote: It's more than that:
It has NEON, which is a very fast SIMD instruction set (NEON does integer arithmetic too, including 64-bit).
It has full 16-bit immediate operands for MOV (MOVW) and the new MOVT (two insns for a 32-bit register load).
It has ARM hardware integer division instructions (SDIV and UDIV).
It has Thumb2 (a complete mixed 16/32-bit instruction set; gives approx 25% reduction in code size whilst maintaining similar speed).
It has other handy new instructions, such as CBZ.
NONE of my benchmarks use any of those features. RISC OS does not use them either.

So none of those affect the results I have. I use pure ARM, not NEON, VFP, Thumb, etc.

And some of those are extensions, not ARM (NEON, thumb/thumb2).

Though thank you for mentioning them for those that would use them.

DavidS

Re: Just got my first RPi2.

Wed Feb 03, 2016 6:24 am

GTR2Fan wrote: Lots of interesting Pi2 benchmarks here...

http://www.roylongbottom.org.uk/Raspber ... hmarks.htm

They do gallop along very nicely compared to their older brother, especially if you're not averse to a little comfortable overclocking. :)
Are those available for a NON-Linux system? At least RISC OS?

ejolson
Posts: 6043
Joined: Tue Mar 18, 2014 11:47 am

Re: Just got my first RPi2.

Wed Feb 03, 2016 7:13 am

DavidS wrote:
GTR2Fan wrote: Lots of interesting Pi2 benchmarks here...

http://www.roylongbottom.org.uk/Raspber ... hmarks.htm

They do gallop along very nicely compared to their older brother, especially if you're not averse to a little comfortable overclocking. :)
Are those available for a NON-Linux system? At least RISC OS?
Most of the benchmarks on that website are written in portable Fortran or C with source code included. As RISC OS supports both these programming languages and more, it would appear a simple matter of recompiling each benchmark and running it. It might also be interesting to hand code some of the benchmarks in ARM assembler to see if faster performance is possible. Are your benchmarks available anywhere?

DavidS

Re: Just got my first RPi2.

Wed Feb 03, 2016 7:41 am

ejolson wrote:
DavidS wrote:
GTR2Fan wrote: Lots of interesting Pi2 benchmarks here...

http://www.roylongbottom.org.uk/Raspber ... hmarks.htm

They do gallop along very nicely compared to their older brother, especially if you're not averse to a little comfortable overclocking. :)
Are those available for a NON-Linux system? At least RISC OS?
Most of the benchmarks on that website are written in portable Fortran or C with source code included. As RISC OS supports both these programming languages and more, it would appear a simple matter of recompiling each benchmark and running it. It might also be interesting to hand code some of the benchmarks in ARM assembler to see if faster performance is possible. Are your benchmarks available anywhere?
OK, I will have to take a closer look at those benchmarks.

As to my benchmarks: they are not yet available. I wrote them just out of curiosity about the performance of the RPi Model B (back when it was new) and how it compares to an A7000. I did not think anyone would be interested in them, as they are far from perfect: just algorithms I came up with to exercise the ARM, memory, video, mass storage, cache, and a few other things, and to time the results to give a performance index.
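
For anyone wondering what a "time it and compute an index" test might look like, here is a minimal sketch in portable C. It is not one of the benchmarks described above (those are unpublished); the names `checksum_fill` and `perf_index`, the workload, and the index formula (passes per second of CPU time) are all assumptions made purely for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical workload: fill a buffer, then sum it, using only integer code. */
uint32_t checksum_fill(uint32_t *buf, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        buf[i] = (uint32_t)i * 2654435761u;   /* cheap integer mixing */
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Hypothetical index: workload passes completed per second of CPU time.
   Returns 0.0 if the run was too short for clock() to measure. */
double perf_index(uint32_t *buf, size_t n, int passes, uint32_t *out_sum)
{
    uint32_t sum = 0;
    clock_t start = clock();
    for (int p = 0; p < passes; p++)
        sum ^= checksum_fill(buf, n);
    clock_t end = clock();

    *out_sum = sum;   /* returning the result keeps the work from being discarded */
    double secs = (double)(end - start) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;
    return (double)passes / secs;
}
```

Running the same source on two boards with a buffer much larger than the cache would stress memory bandwidth, while a small buffer would mostly measure the core; the absolute numbers mean little on their own, only the ratio between machines.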

jahboater

Re: Just got my first RPi2.

Wed Feb 03, 2016 8:52 am

And some of those are extensions, not ARM (NEON, thumb/thumb2).
I don't think Thumb2 is an extension; some v7 processors (the Cortex-M series) support Thumb2 only.
If you are using UAL, which people should be by now, it doesn't matter anyway.
RISC OS does not use them either.
That's sad; there is a reason why most Linux distributions have v7 as a baseline. Raspbian supplies two kernels.

kschanaman
Posts: 2
Joined: Mon Jan 11, 2016 5:31 am
Location: Scottsbluff, NE USA

Re: Just got my first RPi2.

Wed Feb 03, 2016 11:29 pm

Oh WOW! SDIV?? The assembly book I am using states the BCM doesn't have a division mnemonic. Guess the book is more outdated (ARMv6) than I realized. And SIMD!!! I feel like a kid in a candy store now :D
jahboater wrote: It's more than that:
It has NEON, which is a very fast SIMD instruction set (NEON does integer arithmetic too, including 64-bit).
It has full 16-bit immediate operands for MOV (MOVW) and the new MOVT (two insns for a 32-bit register load).
It has ARM hardware integer division instructions (SDIV and UDIV).
It has Thumb2 (a complete mixed 16/32-bit instruction set; gives approx 25% reduction in code size whilst maintaining similar speed).
It has other handy new instructions, such as CBZ.

DavidS

Re: Just got my first RPi2.

Thu Feb 04, 2016 4:44 am

jahboater wrote:
And some of those are extensions, not ARM (NEON, thumb/thumb2).
I don't think Thumb2 is an extension; some v7 processors (the Cortex-M series) support Thumb2 only.
If you are using UAL, which people should be by now, it doesn't matter anyway.
Of course you are correct that Thumb is a subset of ARM, and Thumb2 a slight extension of Thumb. Thumb, though, uses a 16-bit instruction word and gives up many of the advantages of the ARM ISA: almost all ARM instructions are conditional, which is not so with Thumb, and there are a few other things Thumb just cannot do that make the ARM so good.
RISC OS does not use them either.
That's sad; there is a reason why most Linux distributions have v7 as a baseline. Raspbian supplies two kernels.
Just because the OS does not use them does not limit the programs. There would be no advantage in using any of these extensions in the OS for RISC OS, though some programs DO use VFP, and a few even NEON, where there is an advantage in so doing (and limiting the program to ARMv7 is not an issue).

Everything the OS does (including the WIMP, etc.) is simple integer work that would not benefit from SIMD. The exceptions are few and far between.

Applications that can take advantage of the extensions provided by the ARMv6 and ARMv7, and are new enough, often do take advantage of these features. The !Charm programming language compiler can compile to use VFP, the BBC BASIC V Assembler now supports VFP (and I think NEON as well), the newer releases of GCC for RISC OS can put these to use, etc.

So why do you think that RISC OS could benefit from using these extensions in the OS? Other than the shared C library and possibly BBC BASIC V, I cannot think of anything that is part of RISC OS that would benefit from these extensions. If you could provide some example of where it could benefit, I would be interested.

jahboater

Re: Just got my first RPi2.

Thu Feb 04, 2016 12:16 pm

Of course you are correct that Thumb is a subset of ARM, and Thumb2 a slight extension of Thumb. Thumb, though, uses a 16-bit instruction word and gives up many of the advantages of the ARM ISA: almost all ARM instructions are conditional, which is not so with Thumb, and there are a few other things Thumb just cannot do that make the ARM so good.
While all of that is true for thumb1, it is not the case with thumb2. I believe thumb2 can do anything (and more) that the ARM instruction set can do, including conditional instructions. There are some handy instructions that are thumb2 only, making thumb2 a superset of ARM. Thumb2 is mixed 16 and 32 bit instructions. If you write with UAL assembler syntax, the entire program can be assembled either as thumb2 or ARM as desired, with no changes to the code. There is no need for any thumb interwork stuff as the entire program is in thumb2. Thumb2 is not a "slight extension" to thumb1, it is a step change, a complete instruction set - and I think it is the future.

It's true that the mixed 16 and 32 bit instruction lengths go against RISC principles, where every instruction is the same size.

The CPU still reads 32 bits at a time from the instruction stream, but two 16-bit insns are loaded each time, and more insns fit in the I-cache. Some code can therefore actually run faster than 32-bit ARM, though overall the figure usually quoted is about 98% of ARM speed.
So why do you think that RISC OS could benefit from using these extensions in the OS? Other than the shared C library and possibly BBC BASIC V, I cannot think of anything that is part of RISC OS that would benefit from these extensions. If you could provide some example of where it could benefit, I would be interested.
The answer is simply speed. Of course I agree you can code anything with the old v6 instruction set, but v7 is so much faster.
And as an assembler programmer, for example, why would you bother with a library call to do a division when there is now a single instruction, much smaller and faster?
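
To make the division point concrete, here is a rough sketch in C of the kind of shift-and-subtract loop a software division routine might use on a core without a divide instruction, next to the plain `/` that a compiler can lower to UDIV when told to target a core that has it. The function names `udiv32_soft` and `udiv32_hw` are made up for this illustration, and this is not the actual code of any particular runtime library.

```c
#include <stdint.h>

/* Shift-and-subtract (restoring) division: one quotient bit per loop pass.
   Illustrative only. The caller must ensure den != 0. */
uint32_t udiv32_soft(uint32_t num, uint32_t den, uint32_t *rem)
{
    uint32_t quot = 0;
    uint32_t r = 0;
    for (int bit = 31; bit >= 0; bit--) {
        r = (r << 1) | ((num >> bit) & 1u);   /* bring down the next bit */
        if (r >= den) {
            r -= den;
            quot |= 1u << bit;
        }
    }
    *rem = r;
    return quot;
}

/* When built for a core with hardware divide (for example with
   -mcpu=cortex-a7), a compiler can typically emit a single UDIV here. */
uint32_t udiv32_hw(uint32_t num, uint32_t den)
{
    return num / den;
}
```

The software version takes 32 loop passes regardless of the operands, which is the size and speed cost the single instruction avoids; real library routines are usually cleverer than this, but they are still a call plus a loop.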
Last edited by jahboater on Sat Feb 06, 2016 5:19 pm, edited 1 time in total.

DavidS

Re: Just got my first RPi2.

Fri Feb 05, 2016 6:01 am

jahboater wrote:
So why do you think that RISC OS could benefit from using these extensions in the OS? Other than the shared C library and possibly BBC BASIC V, I cannot think of anything that is part of RISC OS that would benefit from these extensions. If you could provide some example of where it could benefit, I would be interested.
The answer is simply speed. Of course I agree you can code anything with the old v6 instruction set, but v7 is so much faster.
And as an assembler programmer, for example, why would you bother with a library call to do a division when there is now a single instruction, much smaller and faster?
Could you give me an example of where the ARMv7 instruction set would benefit the core of RISC OS? All of the core stuff is pretty much integer only, and very little that does a repeated operation on multiple data elements. So for the OS itself I do not see the advantage, with very few exceptions.

It would be nice to have accelerated graphics on the Raspberry Pi, though this has little to do with the CPU; rather, it comes down to someone who knows enough about the VideoCore writing the modules and patches to provide accelerated graphics. Thus far I am unaware of anyone who has managed to even get usable graphics out without using any of the Broadcom code.

I can see many advantages for applications of some kinds, and for those, using the ARMv7 instruction set makes sense, so long as your product does not care about eliminating a large part of RISC OS users as potential users. And there is nothing stopping us from writing software that takes advantage of the ARMv7.

jahboater

Re: Just got my first RPi2.

Fri Feb 05, 2016 5:06 pm

Could you give me an example of where the ARMv7 instruction set would benefit the core of RISC OS? All of the core stuff is pretty much integer only, and very little that does a repeated operation on multiple data elements. So for the OS itself I do not see the advantage, with very few exceptions.
You are probably right, especially if the OS is written in assembler. If it is written in assembler, then it would have to be some very specific optimizations, for example using NEON to clear or copy memory. If it's written in C, then just putting "-mcpu=cortex-a7" in the makefile gets all the v7 stuff.

Here are the sizes in bytes of an integer app with the different instruction sets:-

88756 v6
87344 v7
65672 v7 Thumb2
65640 v7 Thumb2 NEON

As you can see Thumb2 gives about a 25% reduction in size (with no other effects).
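
As a quick check of the "about 25%" figure, the reduction can be worked out from the byte counts listed above. The helper name `reduction_percent` is just an illustrative choice.

```c
/* Percentage by which `smaller` is reduced relative to `base`. */
double reduction_percent(long base, long smaller)
{
    return 100.0 * (double)(base - smaller) / (double)base;
}
```

With the sizes above, `reduction_percent(87344, 65672)` comes to roughly 24.8% (v7 ARM to v7 Thumb2) and `reduction_percent(88756, 65672)` to roughly 26.0% (v6 to v7 Thumb2), while moving from v6 to v7 alone saves only about 1.6%.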
The neon version actually had 497 neon instructions. Some of these were simply for clearing structures, others for odd things like a 64-bit integer "shift" (which is really fiddly with 32-bit code but neon can do it in one instruction: vshl or vshr).
Neon registers are 128 bits wide and seem to have very fast access to memory.
If the data length is a multiple of 64 bytes, then something like this is very fast indeed:
1: vldm r1!,{q0-q3}; vstm r0!,{q0-q3}; subs r2,r2,#64; bne 1b
This copies 64 bytes at a time (r2 holds the byte count, which must be a non-zero multiple of 64 for the loop to terminate) and is much faster than ldmia/stmia. You can also rapidly clear memory with veor q0,q0 and then vstm in a loop (OSes often need to zero large chunks of memory).
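
For those working in C rather than assembler, the same idea can be sketched with the NEON intrinsics from `arm_neon.h`. This is an illustration under assumptions, not a tuned routine: the name `copy_blocks64` is made up, the NEON path is only compiled when the compiler defines `__ARM_NEON`, and a plain `memcpy` stands in elsewhere so the code still builds on other targets.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Copy `len` bytes in 64-byte blocks. Returns 0 on success, or -1 if `len`
   is not a multiple of 64 (this sketch deliberately has no tail handling). */
int copy_blocks64(uint8_t *dst, const uint8_t *src, size_t len)
{
    if (len % 64 != 0)
        return -1;
#if defined(__ARM_NEON)
    for (size_t off = 0; off < len; off += 64) {
        /* Four 128-bit loads followed by four 128-bit stores. */
        uint8x16_t a = vld1q_u8(src + off);
        uint8x16_t b = vld1q_u8(src + off + 16);
        uint8x16_t c = vld1q_u8(src + off + 32);
        uint8x16_t d = vld1q_u8(src + off + 48);
        vst1q_u8(dst + off, a);
        vst1q_u8(dst + off + 16, b);
        vst1q_u8(dst + off + 32, c);
        vst1q_u8(dst + off + 48, d);
    }
#else
    memcpy(dst, src, len);   /* fallback where NEON is not available */
#endif
    return 0;
}
```

Unlike the assembler loop, this version rejects lengths that are not a multiple of 64 and copes with a length of zero, at the cost of an extra check. Whether it beats the library `memcpy` on a given board is something to measure rather than assume.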

Of course other odd new instructions can help, but it's not worth the work if a compiler isn't doing it for you.
For example, to load an immediate constant into a register:
movw r1,#12345 would be a pain without the new 16-bit immediates. movw clears the top 16 bits of the register and loads the bottom 16 bits; movt r1,#54321 ("mov top") then loads the top 16 bits without touching the bottom bits.
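
The effect of the pair on the register can be modelled in a few lines of C. The functions `model_movw`, `model_movt` and `split_const` are hypothetical helpers written for this illustration; they only mimic what the two instructions do to a 32-bit value.

```c
#include <stdint.h>

/* MOVW: bottom 16 bits loaded from the immediate, top 16 bits cleared. */
uint32_t model_movw(uint16_t imm)
{
    return (uint32_t)imm;
}

/* MOVT: top 16 bits loaded from the immediate, bottom 16 bits kept. */
uint32_t model_movt(uint32_t reg, uint16_t imm)
{
    return (reg & 0x0000FFFFu) | ((uint32_t)imm << 16);
}

/* Split a 32-bit constant into the two immediates a MOVW/MOVT pair carries. */
void split_const(uint32_t value, uint16_t *lo, uint16_t *hi)
{
    *lo = (uint16_t)(value & 0xFFFFu);
    *hi = (uint16_t)(value >> 16);
}
```

With the values from the post, `model_movt(model_movw(12345), 54321)` gives 0xD4313039, since 12345 is 0x3039 and 54321 is 0xD431.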

I don't know anything about RISC OS, so can't really comment. But I think you are right: if you can't just recompile it, then it's likely not worth the bother.

DavidS

Re: Just got my first RPi2.

Sat Feb 06, 2016 5:14 am

OK, I can see the use in setting blocks of RAM, copying blocks of RAM (where using DMA is not appropriate), and the like within the core of RISC OS.

Large portions of RISC OS are written in assembly language, and the current versions must be able to be built for systems with ARM CPUs ranging from the ARMv3 (e.g. the ARM610, ARM710, ARM7500) and ARMv4 (StrongARM) to the current ARMv7 CPUs.

Conditional assembly could easily be used, though, to take advantage of these newer instructions where it makes sense to do so.
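
The RISC OS sources would do this with the assembler's own conditional directives, which are not shown here. As a loose analogy only, the same build-time selection can be sketched in C with the preprocessor: the fast path exists only in builds that target NEON, and every other build gets the generic code. The names `HAVE_NEON_CLEAR` and `clear_block` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#define HAVE_NEON_CLEAR 1   /* hypothetical build switch */
#else
#define HAVE_NEON_CLEAR 0
#endif

/* Zero `len` bytes starting at `dst`. */
void clear_block(uint8_t *dst, size_t len)
{
#if HAVE_NEON_CLEAR
    /* Only compiled into builds for NEON-capable targets. */
    uint8x16_t zero = vdupq_n_u8(0);
    while (len >= 16) {
        vst1q_u8(dst, zero);
        dst += 16;
        len -= 16;
    }
#endif
    /* Generic path: the whole job on older CPUs, or just the tail on NEON. */
    memset(dst, 0, len);
}
```

The design point is that one source file serves every supported CPU: a build for an older core never contains an instruction it cannot execute, because the newer code is excluded at build time rather than skipped at run time.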

RISC OS is the original widely distributed OS for ARM-based systems, having been released at the same time as the very first ARMv2-based computers by Acorn; the first version of RISC OS was called Arthur. (ARM originally meant Acorn RISC Machine; the name changed to Advanced RISC Machine only when ARM was spun off from Acorn.) RISC OS is still maintained and kept up to date to this day.