Hello maddin,
The problem you have is probably related more to the MCP2515 driver and its interrupt handling than to the SPI driver, since the SPI driver continues to work properly.
This seems close to the problem I had with the MCP251x drivers with the kernel 3.6. This seems to be related to the IRQF_ONESHOT flag that is needed with this kernel.
What I had was:
- kernel 3.2 + mcp251x was working
- kernel 3.6 + mcp251x worked for some time and then hung
- kernel 3.6 + mcp2515 @ 500kbps at full load with 8 bytes messages works
http://lxr.free-electrons.com/source/include/linux/interrupt.h?v=2.6.34#L31 wrote:
 * IRQF_ONESHOT - Interrupt is not reenabled after the hardirq handler finished.
 *                Used by threaded interrupts which need to keep the
 *                irq line disabled until the threaded handler has been run.
As I understand it, the ONESHOT flag does not allow an interrupt to be caught if it happens while the interrupt routine is still running. So using the mcp2515 driver along with the spi low-latency patch was enough to handle the messages fast enough at 500kbps to avoid this particular case.
However, here you have two drivers. That virtually doubles the message rate the SPI bus has to handle.
What may happen is that an IRQ is caught on the first mcp and, while it is being processed, an IRQ is caught on the second mcp. It takes some time to finish processing the first mcp; then the SPI switches to the second mcp and starts reading its registers. Because of the delay before this operation starts, a second message has already been received in the buffer.
If a third message is received between the end of the read and the release of the IRQ, the interrupt line of the mcp is asserted again, but the driver cannot see it because it is configured as one-shot. The interrupt line of that mcp then stays low, no further falling edge can be detected, and this bus appears to stop.
You can see that the interrupt count is really low for the driver that stops working, while the other one shows no problems afterwards: once it is alone on the SPI bus, it is far less likely to run out of time to handle one interrupt before the next one arrives.
What I suggest is to try one of these:
* test with kernel 3.2 + spi low-latency patch for 3.2 (available in the first pages of this thread) + mcp2515 driver
* put an oscilloscope on both "spi chip enable" lines and on both mcp2515 interrupt lines, to check what I assumed above (after the buffer is read, the interrupt line should go high while chip enable is still active, and go low again before chip enable is released)
* test with kernel 3.6 + spi low-latency patch + mcp2515 driver + change irq flags to
.irq_flags = IRQF_TRIGGER_LOW | IRQF_ONESHOT,
Putting the IRQ in level mode rather than edge mode should allow the handler to restart as soon as it exits, even if the second interrupt happened while inside the handler and was not caught because of the ONESHOT flag. I don't know if it will actually work, but in principle this should avoid blocking the bus, even if some messages are lost.
We should continue this discussion on the can converter thread.
Zeta