Programming the ARM chip


by dwelch67 » Sat Jun 09, 2012 1:22 am
DexOS wrote:I agree with you dwelch67 and feel your pain. I would go as far as to say that if you want to learn bare-metal coding, choose another ARM. I have worked on a lot of different processors and this one is really a pain.
Code should work or not work; there should be no gray areas.
I read somewhere that ATAGs start at offset 0x100, but this was for general Linux.

As a side note, you do know about the need for a "memory barrier"?
Code:
MemoryBarrier:
   mov   ip, #0                     @ these operations expect a zero (SBZ) value
   mcr   p15, 0, ip, c7, c5, 0      @ invalidate I cache
   mcr   p15, 0, ip, c7, c5, 6      @ invalidate BTB
   mcr   p15, 0, ip, c7, c10, 4     @ drain write buffer
   mcr   p15, 0, ip, c7, c5, 4      @ prefetch flush
   mov   pc, lr

This matters when switching between ARM peripherals; see page 7 of BCM2835-ARM-Peripherals.pdf:
Accesses to the same peripheral will always arrive and return in-order. It is only when
switching from one peripheral to another that data can arrive out-of-order. The simplest way
to make sure that data is processed in-order is to place a memory barrier instruction at critical
positions in the code.


Hopefully I will remember that you told me about it before this bites me... I am not doing anything that moves data from one peripheral to another or that is timing critical between peripherals. The write buffer shouldn't be in play if the caches are off. I wonder whether data still gets hung up in the write buffer when only the I cache is on and the D cache is off.

Even treating it as a microcontroller, $25 or $35 buys you a lot, I guess. The STM32F4 Discovery is what I would consider a competitor to this, and yes, it has much better documentation, I/O, etc. Supposedly the Raspberry Pi goal is programming education, providing computers to kids who might not otherwise have one. Bare-metal programming is educational as well, and I feel it is a skill being lost. So this is not the best platform, but if they are even marginally successful, a small percentage of the small percentage that have one and try to program will try something low level, so I hope to have some starting points.

Definitely true, though: this is not a low-level platform for beginners in general. Too many secrets, too many surprises, too much hacking required to get anything working. All the things that cause a percentage of the folks who try to fail and never come back (to low-level programming).

We will see what happens. If it starts heading towards Arduino-like success, before it gets there it will need better docs and fewer surprises, whether we figure things out through hacking or get them from the vendor.
Posts: 423
Joined: Sat May 26, 2012 5:32 pm
by dwelch67 » Sat Jun 09, 2012 2:05 am
Based on my experiments today, just now trying disable_commandline_tags=1, and seeing some discussion at https://github.com/raspberrypi/linux/issues/16, I verified that if you do not have a config.txt it loads kernel.img to 0x8000. The config.txt kernel_address override won't put your program at zero. If you have a config.txt with disable_commandline_tags=1, then it does put kernel.img at zero (perhaps even if you have kernel_address=0x8000). Good thing I just went through all of my examples to build for 0x8000; going to go back and touch everything again.
by dwelch67 » Sat Jun 09, 2012 2:10 am
It is not very reliable with the command-line tags disabled, though: it boots properly one out of every several power cycles, whereas not having a config.txt, or not disabling the tags, and allowing kernel.img to load at 0x8000 works every time or almost every time. NOT going to go back and change things. Not worried about the tags; they appear to be at address 0x100. Will deal with adding handlers/vectors programmatically, etc.

It may not have always put kernel.img at 0x8000 as it does now. I don't know; I have not had one that long...
by dwelch67 » Sat Jun 09, 2012 2:18 am
disabling the l2 cache

disable_l2cache=1

does make it run slower. Simple benchmark numbers from a small loop fetching from RAM (bench02 example in my repo, with L1 off and a divisor on the timer in case it was really slow):

Code:
012D2E6E
012D2E2B
003FB7F5
00309A17
014A434A
0100021F
002787E5
00AB5206


Same, but with the L2 cache disabled:
Code:
020D269E
020D23B6
0078085F
005BB19C
02630D54
01E98299
0043F08D
01372259
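
To put a number on "slower", the two listings can be compared pair by pair. This is only a rough sketch: it assumes each value is an elapsed timer count for the same test in the same order (so a larger value means a slower run), which the listings themselves do not state. The names `L2_ENABLED`, `L2_DISABLED` and `slowdown_ratios` are made up for this illustration.

```python
# Timer readings copied from the two listings above (hex, same order).
# Assumption: each value is an elapsed tick count, so larger means slower.
L2_ENABLED = [
    0x012D2E6E, 0x012D2E2B, 0x003FB7F5, 0x00309A17,
    0x014A434A, 0x0100021F, 0x002787E5, 0x00AB5206,
]
L2_DISABLED = [
    0x020D269E, 0x020D23B6, 0x0078085F, 0x005BB19C,
    0x02630D54, 0x01E98299, 0x0043F08D, 0x01372259,
]


def slowdown_ratios(enabled, disabled):
    """Return disabled/enabled for each pair of readings."""
    return [d / e for e, d in zip(enabled, disabled)]


ratios = slowdown_ratios(L2_ENABLED, L2_DISABLED)
for index, ratio in enumerate(ratios):
    print(f"test {index}: {ratio:.2f}x")
```

Under that assumption, every test takes noticeably longer with the L2 cache disabled.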
by DexOS » Sat Jun 09, 2012 3:09 am
dwelch67 wrote:disabling the l2 cache

disable_l2cache=1

does make it run slower. Simple benchmark numbers running small loop fetching from ram (bench02 example in my repo, with l1 off and a divisor on the timer just in case it was really slow):

Code:
012D2E6E
012D2E2B
003FB7F5
00309A17
014A434A
0100021F
002787E5
00AB5206


same but with l2 cache disabled
Code:
020D269E
020D23B6
0078085F
005BB19C
02630D54
01E98299
0043F08D
01372259

I agree, disabling the L2 cache does make a difference. There is a noticeable delay from boot until the OK LED comes on; it comes on much faster with the cache enabled.

But there is another side to this: with it enabled, my code to get the screen buffer and write to the screen works only every other power-on.
With it disabled, it works every time.
Batteries not included, Some assembly required.
Posts: 864
Joined: Wed May 16, 2012 6:32 pm
by dwelch67 » Sat Jun 09, 2012 2:16 pm
Disabling L2 didn't give me any boot problems; it was disabling the ATAGs and booting from zero that gave me problems.
by DexOS » Sat Jun 09, 2012 3:26 pm
dwelch67 wrote:disabling l2 didnt give me any boot problems, it was the disable ATAG and boot from zero that gave me problems.

Nor me. It was booting with it enabled that gave problems: I could boot OK, but the code to write to the screen only worked every other boot.

With it disabled, it booted and the screen code worked every time. I think I know why: the address you use is different depending on whether it is enabled or not.
I think you're supposed to read the ATAGs and, from them, use the right address.
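
For anyone who wants to look at the tags, here is a rough sketch of walking an ATAG list. It assumes the usual layout from the Linux ARM boot convention: each tag starts with a 32-bit size (in words, header included) and a 32-bit tag ID, the list starts with ATAG_CORE and ends with ATAG_NONE. The sample buffer is invented for illustration and is not a dump from a real board, and whether the tags actually carry the address the screen code needs is not established here.

```python
import struct

# Tag IDs from the Linux ARM boot convention.
ATAG_NONE = 0x00000000
ATAG_CORE = 0x54410001
ATAG_MEM = 0x54410002


def walk_atags(blob):
    """Yield (tag, payload_words) for each entry until ATAG_NONE."""
    offset = 0
    while offset + 8 <= len(blob):
        size_words, tag = struct.unpack_from("<II", blob, offset)
        if tag == ATAG_NONE:
            break
        # Stop on a malformed or truncated entry instead of reading garbage.
        if size_words < 2 or offset + size_words * 4 > len(blob):
            break
        payload = struct.unpack_from(f"<{size_words - 2}I", blob, offset + 8)
        yield tag, payload
        offset += size_words * 4


# Invented sample: a bare ATAG_CORE, one ATAG_MEM (size, start), ATAG_NONE.
sample = (
    struct.pack("<II", 2, ATAG_CORE)
    + struct.pack("<IIII", 4, ATAG_MEM, 0x10000000, 0x00000000)
    + struct.pack("<II", 0, ATAG_NONE)
)

for tag, payload in walk_atags(sample):
    print(f"tag 0x{tag:08X}: {[hex(word) for word in payload]}")
```

On the board itself the same walk would start at the address where the tags live (0x100 according to the posts above) rather than at a byte buffer.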
by dwelch67 » Wed Jun 13, 2012 6:45 pm
Anyone else with floating point experience?


If you feed the fpu this:

Code:
0x3FFFFFFF


and do a float to int conversion

Code:
    vmov s0,r0
    vcvt.s32.f32 s2,s0
    vmov r0,s2

you get 0x00000001, not 0x00000002. Same answer with a float to unsigned int conversion.

Some background:
Code:
0x40000000 = 2.0
0x3F800000 = 1.0
0x3FC00000 = 1.5
0x3FCCCCCD approx 1.6
0x3FFFFFAC approx 1.99999


0x3FFFFFFF is very close to 2, and the default rounding mode is supposed to be round to nearest, choosing even on a tie. Well, 2 is both nearest and even... I have my math right, yes?

Same problem with almost minus 2: 0xBFFFFFFF gives minus 1, not minus 2.

(yes I am intentionally testing the fpu looking for bugs)
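
The arithmetic can be checked off-target. The sketch below reinterprets the two bit patterns as IEEE 754 single-precision values on a host machine and converts them both ways. It uses Python's `int()` (which truncates toward zero) and `round()` (round half to even) as stand-ins for the two rounding modes; it does not model the FPU itself, and `bits_to_float` is a helper name made up for this example.

```python
import struct


def bits_to_float(bits):
    """Reinterpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


for bits in (0x3FFFFFFF, 0xBFFFFFFF):
    value = bits_to_float(bits)
    toward_zero = int(value)    # int() truncates toward zero
    to_nearest = round(value)   # round() rounds to nearest, ties to even
    print(f"0x{bits:08X} = {value!r}: "
          f"toward zero {toward_zero}, to nearest {to_nearest}")
```

So a result of 1 (or minus 1) is what a round-towards-zero conversion would produce, while round to nearest would give 2 (or minus 2).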
by tufty » Wed Jun 13, 2012 7:11 pm
dwelch67 wrote:Anyone else with floating point experience?
(yes I am intentionally testing the fpu looking for bugs)

In terms of the numbers, yes, you have it right. 0x3fffffff is infinitesimally close to 2. However, my ARM-ARM says:

A8.6.293 VCVT (between floating-point and integer, Advanced SIMD)
...
The floating-point to integer operation uses the Round towards Zero rounding mode.



Simon
Posts: 1368
Joined: Sun Sep 11, 2011 2:32 pm
by dwelch67 » Thu Jun 14, 2012 3:31 am
Okay here is what one of mine says:

C2.2 Rounding
...
Round to Nearest (RN) mode
...
This is the default rounding mode, and generally yields the most accurate results. The other
rounding modes are mostly used for specialized purposes, such as interval arithmetic.
...
Floating-point to integer
...FTOSID, FTOSIS, FTOUID, or FTOUIS...
...
The special forms FTOSIZD, FTOSIZS, FTOUIZD, and FTOUIZS of these instructions allow the conversion to be done using Round towards Zero (RZ) mode, without changing the rounding mode specified by the FPSCR.
...

So when I compare them head to head:

Code:
vcvt.s32.f32 s2,s0
ftosis s2,s0
ftosizs s2,s0

810c: eebd1ac0 vcvt.s32.f32 s2, s0
8110: eebd1a40 vcvtr.s32.f32 s2, s0
8114: eebd1ac0 vcvt.s32.f32 s2, s0

So the instruction I was using is the special round-towards-zero form. Nice catch.

Changed it to vcvtr.s32.f32, no more problems.
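
As a quick cross-check of the disassembly, the two encodings can be XORed to see which bits separate them. A small sketch follows; the constant and function names are made up for this example, and reading the differing bit as the field that selects round towards zero versus the FPSCR rounding mode is my interpretation of the ARM ARM encoding, not something the listing itself proves.

```python
# Encodings copied from the disassembly above.
VCVT_ENCODING = 0xEEBD1AC0   # vcvt.s32.f32 s2, s0 (also what ftosizs assembled to)
VCVTR_ENCODING = 0xEEBD1A40  # vcvtr.s32.f32 s2, s0 (what ftosis assembled to)


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    diff = a ^ b
    return [bit for bit in range(32) if (diff >> bit) & 1]


positions = differing_bits(VCVT_ENCODING, VCVTR_ENCODING)
print(positions)
```

The two words differ in a single bit, which is set in the vcvt form and clear in the vcvtr form.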