zedrummer
Posts: 38
Joined: Sun Jan 07, 2018 5:15 pm

Timer with nanosecond accuracy for electronic purpose

Wed Jan 10, 2018 10:50 am

Hello (again)

To practice my new "coding with electronics" hobby, I have studied the datasheet of the Texas Instruments TLC5940. I think I now have a fairly accurate overview of what is needed to drive some LEDs (for an RGB LED cube, for example) and of what the code loop would look like.

The problem is that you have to send data to the chip bit by bit, and for each bit you set another pin (the shift clock, SCLK) high, wait 16 ns, then set that pin low and wait another 16 ns.
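For reference, the per-bit sequence described above might look like this in C. This is only a sketch: `tlc_pins`, its hooks and the simulated pins are hypothetical names, the MSB-first bit order and sampling on the rising SCLK edge are assumptions to check against the TLC5940 datasheet, and `delay_half` stands for exactly the ≥16 ns wait this thread is about.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pin hooks: on the Pi these would write the GPIO
   set/clear registers; here they are function pointers so the
   sequence can also be exercised on a desktop machine. */
typedef struct {
    void (*set_sin)(int level);   /* serial data line (SIN)   */
    void (*set_sclk)(int level);  /* shift clock line (SCLK)  */
    void (*delay_half)(void);     /* must last at least 16 ns */
} tlc_pins;

/* Clock `nbits` bits of `value` out, most significant bit first. */
void tlc_shift_bits(const tlc_pins *p, uint32_t value, int nbits)
{
    assert(p != 0 && nbits > 0 && nbits <= 32);
    for (int i = nbits - 1; i >= 0; i--) {
        p->set_sin((int)((value >> i) & 1u)); /* data valid before the edge */
        p->set_sclk(1);                       /* rising edge: chip samples SIN */
        p->delay_half();                      /* SCLK high >= 16 ns */
        p->set_sclk(0);
        p->delay_half();                      /* SCLK low  >= 16 ns */
    }
}

/* Desktop stand-in for the chip: latches SIN on each rising SCLK edge. */
static int sim_sin, sim_sclk, sim_edges;
static uint32_t sim_shifted;

static void sim_set_sin(int level) { sim_sin = level; }
static void sim_set_sclk(int level)
{
    if (level && !sim_sclk) {                 /* rising edge */
        sim_shifted = (sim_shifted << 1) | (uint32_t)sim_sin;
        sim_edges++;
    }
    sim_sclk = level;
}
static void sim_delay(void) { /* no real timing on a PC */ }
```

Driving the simulated pins with a 12-bit value should leave exactly that value in `sim_shifted` after 12 clock edges.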

I am aware of the 1 MHz free-running counter, but a pulse every 1 µs wastes a lot of time if you have, say, 5 TLC5940s in a chain, each with 16 outputs of 12 bits, i.e. 5*16*12 = 960 bits to fill. In the ideal case this takes 960*32 ns = 30720 ns, i.e. 30.72 µs, compared with 960 µs using the free-running counter.
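The arithmetic in that paragraph, written out as two small C helpers (a sketch; the function names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Bits to shift into a daisy chain of TLC5940s:
   16 outputs per chip, 12 grayscale bits per output. */
uint32_t chain_bits(uint32_t chips)
{
    assert(chips > 0);
    return chips * 16u * 12u;
}

/* Total shift time in nanoseconds when each bit costs `ns_per_bit`. */
uint64_t shift_time_ns(uint32_t bits, uint64_t ns_per_bit)
{
    return (uint64_t)bits * ns_per_bit;
}
```

`chain_bits(5)` is 960; `shift_time_ns(960, 32)` is 30720 ns (30.72 µs), while `shift_time_ns(960, 1000)` is 960000 ns (960 µs) with one bit per 1 µs counter tick.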

Is there a way to get another timer that would give a better time resolution?

Perhaps sending one bit takes more than 32 ns anyway, in which case there is no need to check a timer at all; I don't know.

Thanks a lot for your help

Cathy L.

scotty101
Posts: 3184
Joined: Fri Jun 08, 2012 6:03 pm

Re: Timer with nanosecond accuracy for electronic purpose

Wed Jan 10, 2018 12:27 pm

Given the design specs of this device, I'd be tempted to try connecting it to the Raspberry Pi's SPI interface and letting that clock out the data.
Using the pigpio library would be another approach, given its high-speed access to the GPIO pins.
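If you go the SPI route, the main number to choose is the clock divider. A sketch, assuming the SPI clock is the core clock divided by an even divider and a 250 MHz core clock (both assumptions to verify for your board and firmware); the 31.25 MHz target is simply the 32 ns bit period from the question:

```c
#include <assert.h>
#include <stdint.h>

/* Smallest even divider such that core_hz / div does not exceed max_sclk_hz.
   Assumes SCLK = core clock / divider and that the divider must be even. */
uint32_t spi_even_divider(uint32_t core_hz, uint32_t max_sclk_hz)
{
    assert(core_hz > 0 && max_sclk_hz > 0);
    uint64_t div = ((uint64_t)core_hz + max_sclk_hz - 1) / max_sclk_hz; /* round up */
    if (div < 2)
        div = 2;
    if (div % 2 != 0)
        div++;               /* round up to the next even value */
    return (uint32_t)div;
}
```

With a 250 MHz core clock this gives a divider of 8, i.e. 31.25 MHz (exactly 32 ns per bit); with 400 MHz it gives 14, about 28.6 MHz.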
Electronic and Computer Engineer
Pi Interests: Home Automation, IOT, Python and Tkinter

bzt
Posts: 175
Joined: Sat Oct 14, 2017 9:57 pm

Re: Timer with nanosecond accuracy for electronic purpose

Wed Jan 10, 2018 12:27 pm

Try using the ARM built-in counter in a busy loop. It can be accessed through system registers. Here's some code that should be trivial to modify for your needs:
https://github.com/bztsrc/raspi3-tutori ... lays.c#L39
I'm not sure how precise it is, though: at millisecond scale the error margin is definitely insignificant, but I haven't measured it at nanosecond scale.
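The busy-wait idea boils down to the loop below. This is a sketch, not the code from the linked tutorial: `read_counter` is a hypothetical hook standing in for a read of CNTPCT_EL0, and the fake counter exists only so the loop can be checked on a desktop.

```c
#include <assert.h>
#include <stdint.h>

/* Busy-wait until the counter has advanced by `ticks`.
   On AArch64 bare metal, read_counter could be something like
     static uint64_t read_cntpct(void)
     { uint64_t v; __asm__ volatile("mrs %0, cntpct_el0" : "=r"(v)); return v; }
   (kept in a comment because it only builds for AArch64). */
void wait_ticks(uint64_t (*read_counter)(void), uint64_t ticks)
{
    assert(read_counter != 0);
    uint64_t start = read_counter();
    /* Unsigned subtraction stays correct even if the counter wraps. */
    while (read_counter() - start < ticks) {
        /* spin */
    }
}

/* Desktop stand-in: a fake counter that advances by one per read. */
static uint64_t fake_now;
static uint64_t fake_counter(void) { return fake_now++; }
```

Note that the real elapsed time depends on where within a tick the first read lands, so the wait is only as fine-grained as one counter tick.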

Bests,
bzt

zedrummer
Posts: 38
Joined: Sun Jan 07, 2018 5:15 pm

Re: Timer with nanosecond accuracy for electronic purpose

Wed Jan 10, 2018 2:30 pm

Thank you, Scotty, for the advice, but I'd like to stay on the bare-metal side, so I'm not sure the libraries are useful. Are they?

I will try your solution, bzt. I have a few questions about it.
- For your wait_msec: I was not aware of coprocessor registers like CNTFRQ_EL0 and CNTPCT_EL0. I have read http://infocenter.arm.com/help/index.js ... DFGH.htmll, which seems quite ambiguous about the frequency. Is CNTFRQ_EL0 in MHz ("typically in MHz")?
- For your wait_cycles: does a nop always take the same time? I mean, if 10 million of them take x seconds, does one of them take exactly x/10000000 seconds?

Many thanks to both of you for your time
Cathy L.
Last edited by zedrummer on Wed Jan 10, 2018 2:48 pm, edited 1 time in total.

zedrummer
Posts: 38
Joined: Sun Jan 07, 2018 5:15 pm

Re: Timer with nanosecond accuracy for electronic purpose

Wed Jan 10, 2018 2:46 pm

Sorry, and also here: http://infocenter.arm.com/help/index.js ... fcggi.html, they say that "NOP is not necessarily a time-consuming NOP. The processor might remove it from the pipeline before it reaches the execution stage."

dwelch67
Posts: 944
Joined: Sat May 26, 2012 5:32 pm

Re: Timer with nanosecond accuracy for electronic purpose

Wed Jan 10, 2018 3:58 pm

I don't think the timer is your problem; there are already a couple of them. The problem is the herky-jerky nature of a pipeline. If you can keep the code in L1, great, but if it bounces out to L2/DRAM, you take a big performance hit and all accuracy is lost.

Microcontrollers have a much better chance. They have similar problems, but the slow memory (flash) is somewhat of a known quantity; you just get into fetch times and alignment within the fetch. These DRAM platforms are just a mess. You kind of want an MCU or CPLD doing the more accurate work, with the bigger DRAM machine managing it as needed. But see what you see on this platform.

NOPs on ARM are usually just normal instructions. I don't think they have a dedicated NOP like other platforms; it is something like mov r0,r0 or and r0,r0,r0, or some such encoding, so the processor would have to catch that as a dead instruction (without the s bit it doesn't change the flags, so it really doesn't do anything).

And it is not just NOPs that are the problem: a deeper pipe like the one in a core like this can be herky-jerky. We can read the news and try to decide whether there is any speculative execution that would affect us, and the pipe, in that it does extra fetching. Even the ARM11 MPCore (the one that came after the ARM11 in the Pi Zero) has speculative execution, in that it looks ahead in the pipe, sees a branch, and starts to fetch early. (I found a gcc bug that was feeding data that looked like branches into the pipe; add or remove code and the data pattern, which was an address, would change, and our app would go off the rails or not based on these fetches. Longer story.) So for performance reasons the fetches can happen in parallel, in a sense, at the AXI bus, but eventually they likely serialize in the L2 cache or DRAM controller. Again, lots of nanoseconds are gained or lost depending on alignment, instruction sequences, what is in the pipe and how it tickles the processor. Some of these prefetching features can be turned off, though; you have to look at the specific core.

As posted in this forum recently, if you use the MMU it can cause non-deterministic execution times, since every single access has to go through the MMU, and if you get a TLB miss you are off to slower RAM.

zedrummer
Posts: 38
Joined: Sun Jan 07, 2018 5:15 pm

Re: Timer with nanosecond accuracy for electronic purpose

Wed Jan 10, 2018 4:29 pm

Really interesting and extensive answer.
In my case, it is just a loop with:

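another_cycle:
    nop                     @ one padding instruction per iteration
    subs r0, r0, #1         @ decrement the count and set the flags
    bne another_cycle       @ repeat until r0 reaches zero

Could its timing really change from one run to another?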

LdB
Posts: 866
Joined: Wed Dec 07, 2016 2:29 pm

Re: Timer with nanosecond accuracy for electronic purpose

Thu Jan 11, 2018 4:21 am

You are trying to time something you don't need to time.

On a Pi, GPIO output speed is constrained by the bus: it's a slow peripheral AXI bus.

Pi 1: 20 MHz (50 ns)
Pi 2: 41.7 MHz (23.9 ns)
Pi 3: 65.8 MHz (15.19 ns)

If you like, think of it as access to the GPIO pin having an enormous wait state inserted into it. On a Pi 3, just write a low and then the high; that will guarantee you about 30 ns. On the other models you can just write the high: they can't get inside your speed requirement, as they can never reach a 16 ns write speed.
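To make those numbers concrete, here is the arithmetic as a sketch, under the idealised assumption that each GPIO write costs exactly one cycle of the peripheral bus (real timing also depends on the core, the write path and how you probe it):

```c
#include <assert.h>
#include <stdint.h>

/* Bus period in picoseconds for a peripheral-bus clock given in Hz. */
uint64_t bus_period_ps(uint64_t bus_hz)
{
    assert(bus_hz > 0);
    return 1000000000000ull / bus_hz;
}

/* Back-to-back GPIO writes needed to hold a level for at least `min_ps`,
   assuming each write takes exactly one bus period (an idealisation). */
uint64_t writes_for_min_width(uint64_t bus_hz, uint64_t min_ps)
{
    uint64_t period = bus_period_ps(bus_hz);
    return (min_ps + period - 1) / period;   /* round up */
}
```

With the figures above, a 16 ns level needs 1 write on the Pi 1 (50 ns) and Pi 2 (about 24 ns), and 2 writes on the Pi 3 (about 15.2 ns each).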

I should also warn you that the GPIO pins don't have much drive strength at those frequencies; you will need to take proper care getting the signals off the board to another board.

zedrummer
Posts: 38
Joined: Sun Jan 07, 2018 5:15 pm

Re: Timer with nanosecond accuracy for electronic purpose

Thu Jan 11, 2018 6:25 am

Great! Thank you, LdB. That's the kind of knowledge you can't easily get as a simple hobbyist (even less so as a rookie).

Sorry, your sentence "I also warn you that the GPIO pins don't have enormous drive at those frequencies you will need to take proper care to get them off the board to another board." doesn't make sense to me. What is the drive you are talking about, and how do I take care?

Thanks again; you really saved me a lot of time I would have spent wasting unnecessary CPU cycles.

Cathy L.

bzt
Posts: 175
Joined: Sat Oct 14, 2017 9:57 pm

Re: Timer with nanosecond accuracy for electronic purpose

Thu Jan 11, 2018 11:15 am

Hi Cathy,
zedrummer wrote:
Wed Jan 10, 2018 2:30 pm
Thank you, Scotty, for the advice, but I'd like to stay on the bare-metal side, so I'm not sure the libraries are useful. Are they?

I will try your solution, bzt. I have a few questions about it.
- For your wait_msec: I was not aware of coprocessor registers like CNTFRQ_EL0 and CNTPCT_EL0. I have read http://infocenter.arm.com/help/index.js ... DFGH.htmll, which seems quite ambiguous about the frequency. Is CNTFRQ_EL0 in MHz ("typically in MHz")?
I found those registers in this documentation: https://developer.arm.com/docs/ddi0487/latest. So far it's the best description on the net, but like any other ARM doc, it's not 100% perfect; most notably, the summary tables are often wrong. Still, I would recommend using it.
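On the units question: the Arm Architecture Reference Manual describes CNTFRQ_EL0 as the system counter frequency in Hz, not MHz. A sketch of converting a delay to counter ticks, assuming CNTFRQ_EL0 reads 19.2 MHz (a value commonly reported on the Pi 3; read the register rather than trusting this), shows why this counter alone cannot time a 16 ns half-period:

```c
#include <assert.h>
#include <stdint.h>

/* Convert a delay in nanoseconds to system-counter ticks, rounding up so
   the requested delay is never rounded down to fewer ticks.
   `freq_hz` is the value read from CNTFRQ_EL0, which is in Hz. */
uint64_t ns_to_ticks(uint64_t ns, uint64_t freq_hz)
{
    assert(freq_hz > 0);
    return (ns * freq_hz + 999999999ull) / 1000000000ull;
}

/* Length of one counter tick in picoseconds (truncated). */
uint64_t tick_ps(uint64_t freq_hz)
{
    assert(freq_hz > 0);
    return 1000000000000ull / freq_hz;
}
```

At 19.2 MHz one tick lasts about 52 ns, so 16 ns rounds up to a full tick: the counter simply cannot resolve anything finer than that.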
zedrummer wrote:
Wed Jan 10, 2018 2:30 pm
- For your wait_cycles: does a nop always take the same time? I mean, if 10 million of them take x seconds, does one of them take exactly x/10000000 seconds?
Yes, as long as you use a single implementation in your code. If you implement it in two places, there could be differences; see dwelch67's detailed answer (the first implementation in L1 but the second only in L2, for example, or the first on an instruction-cacheable page and the second on a non-cacheable page, etc.).

According to the doc, a nop should always cost you 1 CPU cycle, but I haven't dug deeply into pipelines on ARM. I assume (and I have to stress, IMHO) that it makes no difference as long as you always use the same loop. That means you always use the same address, and therefore always the same cache line. And it doesn't matter whether the jumps in the loop clear the pipeline or not, because they will either always clear it or never, leading to a consistent cycle count for one loop iteration. Note that I only needed guaranteed minimum delays with low precision (more than 100 nops at minimum), so I did not care much. If you also want to guarantee the maximum execution time of your delay, you need to know the cache and pipeline characteristics for sure.
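A sketch of turning a measured loop cost into a guaranteed minimum delay. The 1.2 GHz clock (the Pi 3's nominal speed) and 3 cycles per iteration are illustrative assumptions only; as dwelch67 points out above, the real cost can vary a lot, so it must be measured, and using a lower bound for it keeps the result a minimum.

```c
#include <assert.h>
#include <stdint.h>

/* Loop iterations needed for a delay of at least `ns`, given the CPU clock
   and a cost per iteration in cycles. Rounds up; the delay is a guaranteed
   minimum only if cycles_per_iter is a lower bound on the real cost. */
uint64_t iterations_for_ns(uint64_t ns, uint64_t cpu_hz, uint64_t cycles_per_iter)
{
    assert(cpu_hz > 0 && cycles_per_iter > 0);
    uint64_t scaled_cycles = ns * cpu_hz;               /* cycles * 1e9 */
    uint64_t denom = cycles_per_iter * 1000000000ull;
    return (scaled_cycles + denom - 1) / denom;
}
```

For 16 ns at 1.2 GHz and 3 cycles per iteration, this gives 7 iterations (19.2 cycles, i.e. 6.4 iterations, rounded up).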

Now, as I've said, I'm not familiar with pipeline and cache optimization techniques on ARM (only on other architectures), but here are some pointers if you're into it:
  • make sure your loop is properly aligned and small enough to stay in L1 all the time
  • how does speculative execution influence a function call (bl) and the pipeline? If your loop in the function can, under some circumstances, be prefetched before the call is executed, what's the time difference for the first iteration (when it's already in the pipeline versus when it's not)? Is it significant at all (bigger than your accuracy requirement)? Does it matter at all? (If you never use fewer than 10 iterations, the difference only affects one tenth of the delay. But if you want one-iteration delays, the difference counts 100%.)
  • how do you implement a loop without clearing the pipeline (maybe it's enough if the number of instructions in the loop is smaller than or equal to the number of pipeline stages? Or does it need special instructions?) Or maybe you should use a non-cacheable page and force a pipeline flush on every iteration (see barriers, like isb)?
The point is, make sure of it, and then you can count CPU cycles precisely, so that your assumption that n iterations take n times the time of one iteration holds and becomes a fact.
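One way to check that assumption, sketched under the assumption that you can time the same loop (same address, same alignment) at n and 2n iterations, for instance with the system counter: fit t(n) = overhead + n × per-iteration cost. A per-iteration figure that stays stable across repeated runs supports the linear model; the overhead term captures first-iteration effects like those described in the list above. Use a large n, since the counter's resolution is coarse.

```c
#include <assert.h>
#include <stdint.h>

/* Linear model of a delay loop: t(n) = overhead + n * per_iter. */
typedef struct {
    int64_t overhead_ns;
    int64_t per_iter_ns;
} loop_model;

/* Fit the model from two timings of the same loop at n and 2n iterations.
   Integer division truncates, so use a large n for a meaningful result. */
loop_model fit_loop(int64_t n, int64_t t_n_ns, int64_t t_2n_ns)
{
    assert(n > 0);
    loop_model m;
    m.per_iter_ns = (t_2n_ns - t_n_ns) / n;
    m.overhead_ns = t_n_ns - n * m.per_iter_ns;
    return m;
}
```

For example, 1100 ns for 100 iterations and 2100 ns for 200 gives 10 ns per iteration plus 100 ns of fixed overhead.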
zedrummer wrote:
Wed Jan 10, 2018 2:30 pm
Many thanks to both of you for your time
Cathy L.
You're welcome! Sorry I couldn't provide more details, only theory.

Cheers,
bzt

LdB
Posts: 866
Joined: Wed Dec 07, 2016 2:29 pm

Re: Timer with nanosecond accuracy for electronic purpose

Fri Jan 12, 2018 1:57 am

zedrummer wrote:
Thu Jan 11, 2018 6:25 am
Sorry, your sentence "I also warn you that the GPIO pins don't have enormous drive at those frequencies you will need to take proper care to get them off the board to another board." doesn't make sense to me. What is the drive you are talking about, and how do I take care?
You need to understand that when you switch pins very fast, they have capacitance, and the pins have limited current drive, typically in the mA range. It comes under the description of parasitic capacitance:
https://en.wikipedia.org/wiki/Parasitic_capacitance
At low frequencies, parasitic capacitance can usually be ignored, but in high-frequency circuits it can be a major problem.
Initially the edges of the waveforms will start to round off; eventually you will end up with something that looks like a triangle wave, and it won't reach a level that will trigger your connected circuit in the TLC5940.

So you can't just drag the signals off the IO connector in single loose wires, as you would at low frequencies; you need to keep them in things like ribbon cables and keep the cable as short as practical. If one line needs to drive multiple TLC5940s, you may need a buffer that takes the signal, boosts it, and then feeds it to the TLC5940s. This latter part is called fan-out:
https://en.wikipedia.org/wiki/Fan-out
The maximum fan-out of an output measures its load-driving capability: it is the greatest number of inputs of gates of the same type to which the output can be safely connected.
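To get a feel for why fan-out matters at these speeds, here is a first-order RC estimate (10-90 % rise time ≈ 2.2 RC) as a sketch. All component values are hypothetical placeholders, since there is no datasheet for the high-speed behaviour of the Pi's GPIO; the model also ignores cable inductance and the driver's non-linear behaviour.

```c
#include <assert.h>
#include <stdint.h>

/* Capacitive load on one GPIO line: chip inputs plus wiring (pF). */
uint64_t load_pf(uint64_t n_inputs, uint64_t c_in_pf, uint64_t c_wire_pf)
{
    return n_inputs * c_in_pf + c_wire_pf;
}

/* 10-90 % rise time of a first-order RC edge, t_r ~= 2.2 * R * C.
   Ohms times picofarads gives picoseconds. */
uint64_t rise_time_ps(uint64_t r_ohm, uint64_t c_pf)
{
    assert(r_ohm > 0);
    return (22u * r_ohm * c_pf) / 10u;
}
```

With a hypothetical 50 Ω effective output resistance, 10 pF per input and 20 pF of wiring, one input gives about 3.3 ns and five inputs about 7.7 ns: an ever larger share of a 16 ns half-period.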

The whole area comes under what we call the drive capabilities of a pin. There are low-speed DC characteristics, which are often just how much current the pin can supply, and high-speed characteristics, which are how much capacitance and inductance it can tolerate.

So several problems can crop up because of the speed of that signal: you are pushing the Pi's GPIO to its limits, and there are no datasheets on the high-speed operation of the GPIO pins. My warning comes from practical experience :-)

zedrummer
Posts: 38
Joined: Sun Jan 07, 2018 5:15 pm

Re: Timer with nanosecond accuracy for electronic purpose

Fri Jan 12, 2018 6:37 am

Wow, that makes things worse!
OK, so the best way to go is empirical? I mean, I test with a high frequency/short wait between bits, then if it doesn't work, extend it a little, etc.?

LdB
Posts: 866
Joined: Wed Dec 07, 2016 2:29 pm

Re: Timer with nanosecond accuracy for electronic purpose

Fri Jan 12, 2018 11:53 am

zedrummer wrote:
Fri Jan 12, 2018 6:37 am
Wow, that makes things worse!
OK, so the best way to go is empirical? I mean, I test with a high frequency/short wait between bits, then if it doesn't work, extend it a little, etc.?
If you don't have access to an oscilloscope, I would start the other way round: test it all at slow speed, get everything working, and then start reducing the delays until you reach the maximum or something breaks. Pulses of a couple of hundred nanoseconds shouldn't be a problem.
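The "start slow, then shorten" procedure can be sketched as a simple step-down search. `works` is a hypothetical callback that would clock a test pattern out with the given delay and report whether the result was correct (LEDs, scope, read-back); the start value, step and floor are yours to pick, and the simulated failure threshold exists only for a desktop check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Start from a slow, known-good delay and shorten it by `step_ns` until the
   test fails or `floor_ns` is reached. Returns the last delay that worked,
   or 0 if even `start_ns` failed. */
uint32_t find_min_delay_ns(uint32_t start_ns, uint32_t step_ns, uint32_t floor_ns,
                           bool (*works)(uint32_t delay_ns))
{
    assert(step_ns > 0 && works != 0);
    uint32_t last_good = 0;
    uint32_t d = start_ns;
    while (d >= floor_ns && works(d)) {
        last_good = d;
        if (d < floor_ns + step_ns)
            break;              /* the next step would go below the floor */
        d -= step_ns;
    }
    return last_good;
}

/* Desktop stand-in: pretend the hardware stops working below 40 ns. */
static bool sim_works(uint32_t delay_ns) { return delay_ns >= 40u; }
```

Starting at 200 ns in 20 ns steps, the simulated run stops at 40 ns. On real hardware you would probably back off from the last working value to leave some margin.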

jbeale
Posts: 3366
Joined: Tue Nov 22, 2011 11:51 pm

Re: Timer with nanosecond accuracy for electronic purpose

Sat Jan 13, 2018 6:49 am

I got a 20 ns pulse from a GPIO pin directly driving a small VCSEL, just as a point of reference.
Here is the post: viewtopic.php?f=72&t=67741&start=25

zedrummer
Posts: 38
Joined: Sun Jan 07, 2018 5:15 pm

Re: Timer with nanosecond accuracy for electronic purpose

Sat Jan 13, 2018 12:43 pm

I have an oscilloscope, so I'm going to try it that way.
Thanks to everyone who helped; it's very kind of you.
Cathy L.

dwelch67
Posts: 944
Joined: Sat May 26, 2012 5:32 pm

Re: Timer with nanosecond accuracy for electronic purpose

Sun Jan 14, 2018 10:37 pm

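zedrummer wrote:
Wed Jan 10, 2018 4:29 pm
Really interesting and extensive answer.
In my case, it is just a loop with:

another_cycle:
    nop                     @ one padding instruction per iteration
    subs r0, r0, #1         @ decrement the count and set the flags
    bne another_cycle       @ repeat until r0 reaches zero

Could its timing really change from one run to another?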
Sure, I have demonstrated this many times: on the Pi, the execution speed of that loop, or of an even simpler one without the nop, can vary by 20x or more. You can control a fair amount of that, but timed loops don't work well on cores with pipelines like these, with caches in front of DRAM. (By definition no instruction takes only one clock cycle; the pipe and other features aim for an average of one. It's like seeing a new car come out of the factory door every minute: that doesn't mean it takes a minute to make a car. The exact same thing is going on here.)
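The car-factory analogy in numbers, as a sketch of an idealised in-order pipeline with no stalls. The 8-stage depth is a made-up example, and real cores add cache misses, branch effects and more on top: each instruction takes `depth` cycles from fetch to completion, yet a long run averages close to one instruction per cycle.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles to complete n instructions on an idealised `depth`-stage pipeline:
   the first finishes after `depth` cycles, then one finishes per cycle. */
uint64_t pipeline_cycles(uint64_t n_instr, uint64_t depth)
{
    assert(depth > 0);
    if (n_instr == 0)
        return 0;
    return depth + n_instr - 1;
}
```

One instruction alone takes 8 cycles, but 1000 of them take 1007 cycles, barely more than one each on average.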

The bigger fear is that you think you have tuned your loop for whatever you are doing, and then you change some code. For example, add a nop before the code:

    nop                     @ extra instruction that shifts the loop's address
another_cycle:
    nop
    subs r0, r0, #1
    bne another_cycle

Then another, then another. Even when the code is completely in L1 cache, you are messing with the fetch line. If you are using accurate timing you will see it vary, unless with ARMv8 they have added a gross amount of overhead like x86 (unlikely), in which case the difference is buried in the noise.
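A sketch of that alignment effect: count how many aligned fetch blocks a small loop straddles. The 16-byte fetch granularity is a hypothetical value, not a documented figure for the Pi's cores; the three-instruction ARM loop above is 12 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Number of aligned `group`-byte fetch blocks touched by `size` bytes of
   code starting at address `addr`. */
uint32_t fetch_groups_spanned(uint32_t addr, uint32_t size, uint32_t group)
{
    assert(size > 0 && group > 0);
    uint32_t first = addr / group;
    uint32_t last = (addr + size - 1) / group;
    return last - first + 1;
}
```

Under that assumption, the loop at 0x8000 fits in one block, one extra nop (0x8004) still fits, and a second one (0x8008) makes it straddle two blocks.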

You can still tune your loop and keep it early enough in the bootstrap that it is not affected by lines of C code coming and going, but all the code in the binary that comes after a changed function or line of code, in any of the languages, may have its alignment, and thus its performance, changed.

If this weren't the Raspberry Pi you might have a better chance, since here you are competing with the GPU in the L2. And even if you weren't, DRAM is not deterministic in its response, so if you leave the caches you take those hits as well as a varying DRAM. Ideally you need to design the system so that the worst case won't affect your results, and if I remember your initial requirements correctly, this platform won't meet those needs in general. Careful hand tuning, perhaps.
