JacobL wrote:I assume that you will of course use the FIQ, and limit your code to use R8-R12 + SP + LR? I'm pretty sure the full IRQ resolver will take longer than you have.
Yes, I meant FIQ, not FIR.
The interrupt itself would normally be taken from one clock cycle to the next (though you have to add an unknown delay because all interrupts are routed through the GPU). But that is just the act of loading 0x1C into the PC. After that, a typical interrupt vector table entry would cost both an instruction fetch and a data fetch, though a hardcoded branch could limit that to just the instruction fetch.
With the LPC the handler was entered in about 120 nsec. The total bus cycle of that 8048 was 2.5 usec, so I had some more time to play. The width of the RD or WR pulse was about 1 usec, and in that time I could read the address bus and prepare the data to be emitted. After that I just busy-waited for the RD or WR line to change and removed the data from the bus again.
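To make that sequence concrete, here is a rough sketch of the read case in C. It is not the code I actually ran: the bit positions, the `bus_regs` struct and `serve_read` are made-up names for illustration, and on real hardware the three registers would be memory-mapped peripherals at fixed addresses rather than a struct you pass in.

```c
#include <stdint.h>

/* All bit positions below are hypothetical, chosen only for illustration. */
#define ADDR_MASK 0xFFu       /* address bus on bits 0..7 of the input word */
#define RD_BIT    (1u << 8)   /* RD strobe on bit 8, active low */

/* Stand-in for the port registers (assumption: real ones are memory-mapped). */
struct bus_regs {
    volatile uint32_t in;   /* sampled pin levels */
    volatile uint32_t out;  /* value to drive onto the data bus */
    volatile uint32_t dir;  /* non-zero bits = pins we are driving */
};

/* Serve one read cycle: sample address, emit data, wait, release the bus. */
static inline void serve_read(struct bus_regs *r, const uint8_t *memory)
{
    uint32_t addr = r->in & ADDR_MASK;  /* read the address bus */
    r->out = memory[addr];              /* prepare the data to be emitted */
    r->dir = 0xFFu;                     /* start driving the data bus */
    while ((r->in & RD_BIT) == 0) {
        /* busy-wait until the RD strobe goes inactive (high) */
    }
    r->dir = 0;                         /* remove the data from the bus */
}
```

The write case would look the same except that the data pins stay inputs and the sampled value is stored instead of emitted.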
The better part of the 1.5 usec left was still available for the LPC to do (lots of) other things. Also, the RD/WR interrupt was the only one on the system. All other semi-critical timing was done by polling the system timer from the main loop.
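Polling a timer from the main loop can be sketched like this in C, assuming a free-running 32-bit tick counter. The names `deadline_reached`, `periodic_task` and `task_due` are invented for the example, and how you read the counter depends on the chip.

```c
#include <stdbool.h>
#include <stdint.h>

/* True once `now` has reached or passed `deadline`. Unsigned subtraction
   keeps this correct across counter wraparound, provided the two values
   are less than 2^31 ticks apart. */
static inline bool deadline_reached(uint32_t now, uint32_t deadline)
{
    return (uint32_t)(now - deadline) < 0x80000000u;
}

/* Hypothetical periodic job that the main loop checks on every pass. */
struct periodic_task {
    uint32_t next;    /* tick at which the task is next due */
    uint32_t period;  /* ticks between runs */
};

/* Returns true when the task is due and schedules its next run. */
static inline bool task_due(struct periodic_task *t, uint32_t now)
{
    if (!deadline_reached(now, t->next)) {
        return false;
    }
    t->next += t->period;  /* advance from the old deadline to avoid drift */
    return true;
}
```

The main loop would then just read the counter once per pass and call `task_due` for each job, so nothing apart from the RD/WR handler needs an interrupt.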
It might be doable, but you won't have much room for fancy stuff. I would avoid using the stack or function calls (freeing up SP + LR as general purpose registers), and generally try to limit memory access. You should also keep your application footprint small enough to guarantee that the cache will always be hot; a single cache miss at this level could break it.
Yes, the cache might play tricks here. The size of the routine itself will be quite small: some GPIO reading, and a single memory fetch from RAM. The layout of the GPIO pins might be troublesome: (at least) 8 bits in a row would be ideal...
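To show why the pin layout matters, here is a small C comparison of the two cases. The shift value and the pin list are purely hypothetical, not a proposal for actual wiring; `levels` stands for one 32-bit read of the GPIO level register.

```c
#include <stdint.h>

/* Hypothetical: the 8 bus bits sit on consecutive GPIOs starting here. */
#define DATA_SHIFT 8u

/* Contiguous pins: one shift and one mask. */
static inline uint8_t bus_from_contiguous(uint32_t levels)
{
    return (uint8_t)((levels >> DATA_SHIFT) & 0xFFu);
}

/* Hypothetical scattered wiring: bit i of the byte comes from pin_map[i]. */
static const uint8_t pin_map[8] = { 4, 17, 18, 22, 23, 24, 25, 27 };

/* Scattered pins: every bit has to be picked out and moved individually. */
static inline uint8_t bus_from_scattered(uint32_t levels)
{
    uint8_t value = 0;
    for (unsigned i = 0; i < 8; ++i) {
        value |= (uint8_t)(((levels >> pin_map[i]) & 1u) << i);
    }
    return value;
}
```

With 8 bits in a row the extraction is a couple of instructions, while the scattered version costs a shift, mask and OR per bit, which eats into an already tight budget.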