Izmaki
Posts: 3
Joined: Thu Dec 20, 2012 10:42 pm

Baking Pi wait-method - Getting the details straight

Thu Dec 20, 2012 11:32 pm

Hi guys!

I have a question for you.
This is my first thread here, after a few days of working through the Baking Pi tutorial on and off with my R-Pi. I just finished part 4, which ends with implementing a better wait method. The suggested way of doing so is the following:

Code:

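.globl Wait
Wait:
	delay .req r2
	mov delay,r0			@ r0 = requested delay
	push {lr}
	bl GetTimeStamp			@ r0 = current timer value (must leave r2 and r3 untouched)
	start .req r3
	mov start,r0			@ remember when the wait began
	loop$:
		bl GetTimeStamp
		elapsed .req r1
		sub elapsed,r0,start	@ elapsed = now - start
		cmp elapsed,delay
		.unreq elapsed
		bls loop$		@ keep polling while elapsed <= delay (unsigned)
	.unreq delay
	.unreq start
	pop {pc}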
The thing to notice here is what happens inside the loop: there is a subtraction on every iteration. I assume this takes longer than 1 microsecond to compute, which means this wait method might be off by some number of microseconds (how many?).

I've thought of a different solution that avoids repeating the subtraction inside the loop. This is my solution:

Code:

.globl WaitX
WaitX:
	delay .req r2
	mov delay,r0	
	push {lr}
	bl GetTimeStamp
	start .req r3
	mov start,r0

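	add start,delay,start		@ start now holds the deadline: start + delay
	loop2$:
		bl GetTimeStamp
		cmp r0,start		@ compare the current time against the deadline
		bls loop2$		@ keep polling while now <= deadline (unsigned)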

	.unreq delay
	.unreq start
	pop {pc}
(Here's the entire code if anyone wishes to look this through: http://pastebin.com/4CRg5tTQ )

As you can see, my solution does not have any subtractions in the loop, because it adds the delay to the start value, and then compares the current timer-value to this new "start" value inside the loop. I would argue that this is slightly faster than the other implementation, hence making this solution more reliable than the other.

Is this true? If so, do you know how much faster it is?
Even better question: is there anything else that can be done to make this wait-method more reliable and fast?


Thanks :)

Izmaki
Posts: 3
Joined: Thu Dec 20, 2012 10:42 pm

Re: Baking Pi wait-method - Getting the details straight

Fri Dec 21, 2012 1:57 am

Also, I went from Java, where you basically throw unused variables around in a slow environment, to C#, where you do the same in a slightly faster environment, to C, where you need to start thinking about what you do and why, and now to assembly. That got me wondering exactly how much optimization an assembly programmer should try to achieve. If you could spend 3 hours making your program run 1 ms faster (let's assume this would be a few percent), is it worth it? Of course the answer depends on the purpose of the program. Is it a critical part of a system used on an airplane? Is it a new IM client? But isn't speed one of the key reasons to use assembly, among other obvious reasons (direct hardware interaction, compatibility, etc.)?

But!
Are there any guidelines describing to what extent an ARMv6 programmer should optimize? I mean, going from a selection sort algorithm (for example) to a quick sort algorithm is an obvious performance boost if all you want is to sort something, but the change I made would, I assume, save me a fraction of a second at most.

dwelch67
Posts: 954
Joined: Sat May 26, 2012 5:32 pm

Re: Baking Pi wait-method - Getting the details straight

Fri Dec 21, 2012 6:09 am

For starters, not having the subtract can mess up the time if the timer is allowed/expected to roll over. I don't know enough about the get time function to know how and when that happens.
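To make the rollover concern concrete, here is a small Python model (not ARM code) of the two loop exit tests. It assumes a free-running counter of which only the low 32 bits are compared, since both routines above only look at r0. The function names and sample values are made up for illustration.

```python
MASK = 0xFFFFFFFF  # model a 32-bit register


def wait_done_elapsed(now, start, delay):
    """Exit test of Wait: leave the loop once (now - start) > delay, unsigned."""
    elapsed = (now - start) & MASK  # modular subtract, like `sub` on r0
    return elapsed > delay


def wait_done_deadline(now, start, delay):
    """Exit test of WaitX: leave the loop once now > start + delay, unsigned."""
    deadline = (start + delay) & MASK  # the add can wrap past zero
    return now > deadline


# Start 16 ticks before the counter rolls over and ask for a 32-tick delay.
start, delay = 0xFFFFFFF0, 0x20
one_tick_later = 0xFFFFFFF1

print(wait_done_elapsed(one_tick_later, start, delay))   # still waiting
print(wait_done_deadline(one_tick_later, start, delay))  # returns far too early
```

In this model the deadline wraps to 0x10, so the deadline test is satisfied on the very first poll. With a shorter delay such as 0x0F the deadline becomes 0xFFFFFFFF, which a 32-bit `now` can never exceed, so that variant would spin forever instead.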

You are not really doing much optimization here; it will be lost in the noise. Look at every instruction in the loop, and in particular at the implementation of the get time function. Everything in that loop, plus the code leading up to it, creates latency, and unless you get yourself into a nice beat frequency you will have some jitter. Some wait periods will be longer, some will be shorter. How much jitter there is depends on the frequency of the timer, the frequency of the sampling loop, and how they interact with each other.

If you are interested in saving one instruction, then it doesn't make sense to leave a branch-and-link to a function in the loop: inline the function, or even better change the way you use the timer (if possible). Removing the subtract and inlining the get time function might or might not make a noticeable difference in the overall accuracy of the wait function.

The system clock is 250 MHz, and the subtract instruction averages 1 clock cycle. So even if the processor were slowed down to 250 MHz, it would be 4 nanoseconds of savings. The ARM is running faster than 250 MHz, at 600 MHz or something like that, so perhaps 1.7 nanoseconds per clock cycle.
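The arithmetic behind those figures can be checked with a few lines of Python. This is only a sketch: the 600 MHz value is the rough guess from the post, and the one-microsecond timer tick is an assumption taken from the original question.

```python
def cycle_time_ns(clock_hz):
    """Duration of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


TIMER_TICK_NS = 1000.0  # assumed: the timer advances once per microsecond

for mhz in (250, 600):
    ns = cycle_time_ns(mhz * 1e6)
    cycles_per_tick = TIMER_TICK_NS / ns
    print(f"{mhz} MHz: {ns:.2f} ns per cycle, "
          f"about {cycles_per_tick:.0f} cycles per timer tick")
```

Under these assumptions a single-cycle `sub` costs a fraction of a percent of one timer tick, which is why removing it is lost in the noise.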

From another angle: yes, if the timer is never going to roll over, you have saved one clock cycle on the way out of the function, making the time measurement one clock cycle more accurate. At the same time, depending on the timer frequency and the delay you are looking for, you might be able to adjust for that inside the function (instead of waiting for N clock cycles, wait for N-1 or N-3 or whatever).

A delay function like this will accumulate error over time (if you make repetitive calls). If you look at my blinker examples and apply a delay like this, then use a watch with a second hand or some other time reference, you should be able to detect this error over many minutes or tens of minutes, perhaps an hour.

For example, have a function like this measure 1 millisecond, or maybe a tenth of a second, or something like that. Say 1/10th of a second: 10 calls to that function is a second, 600 calls is one minute, 36000 calls is one hour. That is not enough. If you were to add, say, 20 clock cycles of error per call at 2 ns per cycle, that would be 20*2*36000 = 1,440,000 ns, which is about 1.44 ms. You are not going to see that with your stopwatch. If you had the delay loop delay 1/1000 of a second (1 ms), then 1000 calls is 1 second, 60000 calls is one minute, and 3600000 calls is an hour. That is roughly 1/10th of a second of error over that hour, IF it were an error of 20 clocks.

In short, the wait function is measuring from somewhere in the first get time call to somewhere in the second get time call. The entry into the function up to that first time call, and the exit from the second time call out of the function, are always added to the amount of time you were waiting for. This will accumulate if you try to make repetitive calls to this function as a time reference. You could be a few instructions long or a few dozen; maybe unmeasurable, maybe not. Add to that the timer frequency: you might be adding very noticeable amounts of time, and at best you are +/- 2 timer clock ticks.
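The accumulated-error estimate above is easy to reproduce. The sketch below uses the post's hypothetical figures (20 cycles of overhead per call, 2 ns per cycle); they are illustrative numbers, not measurements.

```python
def accumulated_error_s(calls, overhead_cycles=20, ns_per_cycle=2.0):
    """Total extra time, in seconds, added by a fixed per-call overhead."""
    return calls * overhead_cycles * ns_per_cycle * 1e-9


# One hour of back-to-back waits at two different delay lengths.
hour_of_100ms_waits = 36_000      # 10 calls per second
hour_of_1ms_waits = 3_600_000     # 1000 calls per second

print(f"{accumulated_error_s(hour_of_100ms_waits) * 1e3:.2f} ms")
print(f"{accumulated_error_s(hour_of_1ms_waits) * 1e3:.0f} ms")
```

The shorter the individual wait, the more calls fit into an hour and the more the same per-call overhead adds up.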

If you were using a function like this to, for example, bit bang a SPI or I2C bus or something like that, and you looked at that bus on a scope, you might see both the jitter (the error from one measurement to another) and, when making these optimizations, the improvement in accuracy. You might or might not, but a scope can measure better than your eyes and a stopwatch. Would it matter if this time is more accurate or not for those buses? Likely not.

Yes, you have to ask the question: why would you use repetitive calls to a function like this to try to take accurate time? You normally wouldn't. I have shown in some of my blinker examples methods that would allow you to make repetitive calls indefinitely and overall be as accurate as the timer/clocks themselves (the system runs off of a crystal that has some accuracy; you cannot get better than that accuracy). You would use a function like this to make a single accurate time measurement here or there, maybe a few in a row, but not indefinitely. And to that end, the bl is a bigger killer than the sub on your accuracy, both on the way in and on the way out, but that accuracy is likely not something you will be able to measure or see.

David

rurwin
Forum Moderator
Posts: 4258
Joined: Mon Jan 09, 2012 3:16 pm

Re: Baking Pi wait-method - Getting the details straight

Fri Dec 21, 2012 9:41 am

That pastebin doesn't tell me what I wanted to know: why you needed this wait function.

You appear to have a solid regular clock. Why would you throw that away with a wait function that introduces latency? If latency is an issue then you should be using your clock in a different way:

Code:

NextTime = GetTimeStamp() + delay

Forever
{
   if NextTime < GetTimeStamp()
   {
      Error - You've missed an entire interval. Recover in the best way you can, maybe the following:
      NextTime = GetTimeStamp() - delay
   }

   NextTime = NextTime + delay

   do
   {
      Now = GetTimeStamp()
   }
   until Now >= NextTime

   Process stuff as of time=Now
}
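The difference between waiting relative to "now" and waiting for an advancing deadline can be seen in a small idealized simulation. This is a sketch in Python rather than ARM code: time is a plain integer in microseconds, `work` stands in for whatever processing each pass does, and both function names are hypothetical. It ignores polling granularity and timer rollover.

```python
def run_relative(delay, work, iterations):
    """Each pass processes for `work` us, then waits `delay` us from that moment."""
    now = 0
    for _ in range(iterations):
        now += work   # processing takes time
        now += delay  # the wait is measured from the moment it is called
    return now


def run_absolute(delay, work, iterations):
    """Each pass waits for a deadline that advances by exactly `delay` us."""
    now = 0
    next_time = 0
    for _ in range(iterations):
        next_time += delay
        now += work              # processing takes time
        if now < next_time:
            now = next_time      # poll until the deadline is reached
    return now


# 1000 passes of a 1 ms period with 3 us of work per pass.
print(run_relative(1000, 3, 1000))  # drifts by 3 us per pass
print(run_absolute(1000, 3, 1000))  # stays locked to the timer
```

As long as the work fits inside one period, the deadline-based loop absorbs it and the overall period stays tied to the timer itself.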
You should also be aware of the behaviour of the comparison that you are using. Time is infinite, therefore it can overflow the bounds of an integer. A 32-bit integer counting microseconds wraps around after about 71.6 minutes; if counting milliseconds, it wraps after about 49.7 days. (64 bits is safe enough unless you are counting nanoseconds from 1700AD, but it's still better to write correct code.) Consider two cases. To save my fingers and your eyes, I'll use 16-bit integers:

1. The current time is 0x7FFF and NextTime is 0x8001
A signed comparison will see NextTime as less than current time

2. The current time is 0xFFFF and NextTime is 0x0001
An unsigned comparison will see NextTime as less than current time

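The correct comparison for a value that might overflow, when comparing it to a value that is close to it, is to subtract and test the sign of the result: a < b when a - b < 0, with the difference treated as a signed value.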

So comparing 5 < 10 for example, 5 - 10= -5, and -5 < 0, so 5 < 10
10 - 5 = 5, and 5 > 0, so 10 > 5

Case 1: 0x7FFF - 0x8001 = 0xFFFE, which is < 0, so a < b
Case 2: 0xFFFF - 0x0001 = 0xFFFE, which is < 0, so a < b

You should therefore be using BPL (i.e. result >= 0) to do the branch after the subtraction. The extra subtract in the original code probably has the same effect, allowing an unsigned comparison to be used safely.
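The two 16-bit cases can be replayed in Python by masking the subtraction to 16 bits and reading the result as a signed number. This is a sketch of the comparison rule only; the helper names are invented for the example.

```python
BITS = 16
MASK = (1 << BITS) - 1    # 0xFFFF
SIGN = 1 << (BITS - 1)    # 0x8000, the sign bit


def signed_diff(a, b):
    """Return a - b wrapped to 16 bits and interpreted as a signed value."""
    d = (a - b) & MASK
    return d - (1 << BITS) if d & SIGN else d


def is_before(a, b):
    """True if a is earlier than b, assuming they are under half the range apart."""
    return signed_diff(a, b) < 0


# Case 1: current time 0x7FFF, NextTime 0x8001
print(hex((0x7FFF - 0x8001) & MASK), is_before(0x7FFF, 0x8001))
# Case 2: current time 0xFFFF, NextTime 0x0001
print(hex((0xFFFF - 0x0001) & MASK), is_before(0xFFFF, 0x0001))
```

Both cases give a difference of -2, so the current time is correctly seen as earlier than NextTime, whichever side of the signed or unsigned boundary the two values fall on.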

Izmaki
Posts: 3
Joined: Thu Dec 20, 2012 10:42 pm

Re: Baking Pi wait-method - Getting the details straight

Sat Dec 22, 2012 8:01 pm

Hi guys :)

Thanks for your replies. This answered everything, and more. I didn't know about the cycles, David, so this was a great help.

Thanks, once again
