When attaching an MCP2515 CAN bus controller to my RPI, I started investigating the performance of the implementation and had to realize that, for some reason, the RPI was unable to handle high load on the CAN bus (>100kHz CAN bus and duty cycle >50%). This resulted in loss of packets and errors showing on the CAN bus interface.
(See also: http://www.raspberrypi.org/phpBB3/viewtopic.php?f=44&t=7027&start=125)
Attaching a logic analyzer to the "relevant" lines on the RPI (Enable, MISO, MOSI, Clock, as well as the interrupt line, plus the signal on the CAN bus itself) showed that there were times when the RPI stopped sending SPI requests and the bus was idle, typically for 4ms. During such times the CAN bus (as it is a broadcast medium) keeps delivering further packets, which cannot get handled.
By the time the SPI transfers start again, 13-14 messages (when using 500kHz at 50% duty cycle) have been lost, as the MCP2515 only has buffer space for 2 messages.
Here is a "scaled down" screenshot from the logic analyzer, where you can see the gap in the top 4 graphs as well as in the bottommost one, which is the interrupt line of the MCP2515, indicating there are packets to fetch. The two lines above it are the CAN bus lines (RX and TX), which show that the packet flow continues during this gap.
I also observed that a lot of time is spent "unproductively":
- between ENABLE LO and CLOCK run - in the range of 0.003ms
- between CLOCK stop and ENABLE HI - in the range of 0.010ms
and those numbers slowly add up to quite an amount of time.
A search on the internet revealed several posts saying that the basic SPI interface of Linux has performance issues when faced with high SPI frequencies and transfer rates.
As an example see http://gumstix.8.n6.nabble.com/Howto-get-lower-latency-on-the-SPI-bus-td566254.html for a similar issue on an OMAP system.
And this one suggests changing the driver (again, for a different ARM board) to use real-time scheduling for its worker model: http://firstname.lastname@example.org/msg07619.html
Investigating the issue, I realized that the SPI driver for the RPI is also based on a workqueue model (which runs at normal priority, not with real-time scheduling) and is implemented interrupt-driven (so no DMA, but still better than polling).
Workqueues are essentially "normal" processes (with the exception that they run in kernel space and can modify kernel data) and are thus subject to OS scheduling - the 4ms gap is an indication that a different process has been scheduled and the workqueue has to wait...
In the meantime I also started to look upstream and realized that as of February 22nd there is a patch that "centralizes" some of the scheduling work-queue code in the SPI core component (after the basic implementation had been proven to improve the situation tremendously on the PL022 platform): http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=patch;h=ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0.
Unfortunately this patch was only included in the Linux 3.4 kernel, while the RPI kernel is still based on 3.2.
This patch also includes a new simplified interface that reduces the amount of code in the individual SPI drivers, which no longer need to have their own work-queue implementations.
At this moment only a few SPI-drivers in the latest Linux kernel have been moved to this newer interface.
To address this inefficiency I started back-porting the above patch and subsequently started to modify the spi-bcm2708 driver to use the new interface and also put the interface into real-time scheduling mode.
So far I have succeeded with having the driver run with a simple polling-implementation.
But the result is that the driver can already handle the load of 50% duty cycle at 500kHz on the CAN bus without losing a packet - the SPI bus keeps working without those 4ms interruptions and everything is fine (besides the high CPU load in such situations - up to 50%).
The next step is now to implement the interrupt-driven version, which should reduce the CPU needed by the SPI thread that polls for new data to get sent/received.
Also, the above-mentioned issue of a big delay between Enable low and Clock start (and vice versa) is gone with the polling driver - the trick is to schedule the first byte for SPI (by adding it to the HW FIFO) only a few cycles after you start the SPI HW. The current stock driver pushes data into the FIFO in the interrupt handler, which has to trigger first (with some OS overhead), and this seems to cause the delay observed.
So I believe that this back-port is already quite successful and would probably also benefit other use cases where high SPI throughput with low latencies is required.
I will share my patch (including the back-port) here as soon as I get the interrupt handler implementation running...
Later I may also give a DMA-implementation a try, as this would reduce the number of interrupts further...
This may also make the transfers a bit faster, as the current polling implementation produces a "gap" of 1-2 SPI clock cycles between bytes sent on the bus. We will see if/when we get there...
In the hope that this will help with latency issues in other SPI applications as well...