SPI driver latency and a possible solution


by msperl » Mon Oct 08, 2012 9:41 am
Hi!

When attaching an MCP2515 CAN bus controller to my RPi, I started investigating the performance of the implementation and had to realize that, for some reason, the RPi was unable to handle high load on the CAN bus (>100 kHz CAN bus and duty cycle >50%). This resulted in lost packets and errors showing up on the CAN bus interface.

(See also: http://www.raspberrypi.org/phpBB3/viewtopic.php?f=44&t=7027&start=125)

Attaching a logic analyzer to the "relevant" lines on the RPi (Enable, MISO, MOSI and Clock, as well as the interrupt line, plus the signal on the CAN bus itself) showed that there were times when the RPi stopped sending SPI requests and the bus was idle, typically for 4 ms. During such times the CAN bus (being a broadcast medium) keeps delivering further packets, which cannot be handled.
By the time the SPI transfers start again, 13-14 messages (at 500 kHz and 50% duty cycle) have been lost, as the MCP2515 only has buffers for 2 messages.
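The number of messages involved follows from simple arithmetic. The sketch below is a back-of-the-envelope estimate, not code from any driver. The frame length is an assumption: 76 bits is a standard 11-bit-ID data frame with 4 data bytes, without stuff bits or interframe space (the counters later in this thread show about 4 data bytes per frame, but the frame size in this particular test is not stated).

```c
#include <assert.h>

/* Estimated number of CAN frames that arrive while SPI is stalled.
 *   gap_s      : stall length in seconds (about 4 ms in the trace above)
 *   bitrate    : CAN bit rate in bit/s
 *   duty       : fraction of bus time occupied by frames (0.0 .. 1.0)
 *   frame_bits : bits per frame; an ASSUMPTION, e.g. 76 bits for a
 *                standard 11-bit-ID data frame with 4 data bytes,
 *                ignoring stuff bits and the interframe space */
double frames_during_gap(double gap_s, double bitrate, double duty,
                         double frame_bits)
{
    assert(frame_bits > 0.0);
    double busy_bits = gap_s * bitrate * duty;  /* bits actually on the bus */
    return busy_bits / frame_bits;
}
```

With the numbers from this post, frames_during_gap(0.004, 500000.0, 0.5, 76.0) comes to about 13.2 frames, in line with the 13-14 messages mentioned above, while the MCP2515 can hold only 2 of them.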

Here is a "scaled down" screenshot from the logic analyzer. The gap is visible in the top 4 traces as well as in the bottom-most one, which is the interrupt line of the MCP2515 and signals that there are packets to fetch. The 2 traces above it are the CAN bus lines (RX and TX), which show that the packet flow continues during this gap.
[Image: CAN2.png, SPI and CAN bus with latencies on SPI using the original SPI driver]


I also observed that a lot of time is spent unproductively:
  • between ENABLE going low and the clock starting: in the range of 0.003 ms
  • between the clock stopping and ENABLE going high: in the range of 0.010 ms

and these delays slowly add up to a considerable amount of time.
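To see why these gaps matter, compare them with the time actually spent clocking data. This is a rough sketch; the 10 MHz SPI clock and the 3-byte transaction in the usage note are assumptions, not values from the trace.

```c
#include <assert.h>

/* Fraction of one SPI transaction's duration that is chip-select
 * overhead rather than clocking.
 *   bytes   : payload bytes, 8 SPI clocks each
 *   spi_hz  : SPI clock; an ASSUMPTION, the post does not state it
 *   setup_s : ENABLE low -> clock start (about 3 us measured above)
 *   hold_s  : clock stop -> ENABLE high (about 10 us measured above) */
double spi_overhead_fraction(unsigned bytes, double spi_hz,
                             double setup_s, double hold_s)
{
    assert(spi_hz > 0.0);
    double clocking_s = bytes * 8.0 / spi_hz;
    double overhead_s = setup_s + hold_s;
    return overhead_s / (overhead_s + clocking_s);
}
```

Under these assumptions spi_overhead_fraction(3, 10e6, 3e-6, 10e-6) is about 0.84: roughly 84% of such a short transaction would be spent waiting around the chip select rather than moving data.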

Searching the internet turned up several posts saying that the basic SPI interface of Linux has performance issues at high SPI clock frequencies and transfer rates.
As an example, see http://gumstix.8.n6.nabble.com/Howto-get-lower-latency-on-the-SPI-bus-td566254.html for a similar issue on an OMAP system.
And this one describes changing the driver (again for a different ARM board) to use real-time scheduling for its worker thread: http://www.mail-archive.com/spi-devel-general@lists.sourceforge.net/msg07619.html

Investigating the issue, I realized that on the RPi the SPI driver is also based on a work-queue model (which runs at normal priority, not with real-time scheduling) and is interrupt driven (so no DMA, but still better than polling).

Work queues are essentially "normal" processes (except that they run in kernel space and can modify kernel data) and are therefore subject to OS scheduling. The 4 ms gap indicates that a different process has been scheduled and the work queue has to wait...

In the meantime I also looked upstream and realized that, as of 22 February, there is a patch that "centralizes" some of the work-queue scheduling code in the SPI core (after the basic implementation had been shown to improve the situation tremendously on the PL022 platform): http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=patch;h=ffbbdd21329f3e15eeca6df2d4bc11c04d9d91c0.

Unfortunately this patch was only included in the Linux 3.4 kernel, while the RPi kernel is still based on 3.2.

This patch also includes a new simplified interface that reduces the amount of code in the individual SPI drivers, which no longer need to have their own work-queue implementations.
At this moment only a few SPI-drivers in the latest Linux kernel have been moved to this newer interface.

To address this inefficiency I started back-porting the above patch, then modified the spi-bcm2708 driver to use the new interface and also switched the message pump to real-time scheduling.
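In the back-ported SPI core, real-time scheduling of the message pump is requested by the controller driver (as far as I can tell, via the `rt` member of `struct spi_master`; the dmesg output later in this thread shows "will run message pump with realtime priority"). Kernel code cannot run standalone, so the following is only a user-space analogue of the same idea, using the POSIX `sched_setscheduler()` call to move a thread into `SCHED_FIFO`; the function name is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* User-space analogue of running a worker with real-time priority:
 * move the caller (on Linux, pid 0 means the calling thread) to
 * SCHED_FIFO at the highest priority, so it preempts all normal
 * SCHED_OTHER tasks instead of waiting for its time slice.
 * Returns 0 on success or a negative errno value; -EPERM is typical
 * when the process lacks CAP_SYS_NICE. */
int make_current_thread_realtime(void)
{
    struct sched_param param;
    int prio = sched_get_priority_max(SCHED_FIFO);

    if (prio == -1)
        return -errno;
    param.sched_priority = prio;
    if (sched_setscheduler(0, SCHED_FIFO, &param) == -1)
        return -errno;
    return 0;
}
```

A SCHED_FIFO task is never preempted by normal tasks, which is exactly what removes the multi-millisecond scheduling gaps; the flip side is that a busy-polling real-time thread can starve everything else, which matches the high CPU load reported below.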

So far I have succeeded with having the driver run with a simple polling-implementation.

Even so, the driver can already handle a 50% duty cycle at 500 kHz on the CAN bus without losing a packet: the SPI bus keeps working without those 4 ms interruptions and everything is fine (apart from the high CPU load in such situations, up to 50%).

The next step is implementing the interrupt-driven version, which should reduce the CPU time the SPI thread spends polling until data has been sent/received.

Also, the big delay between Enable low and clock start (and vice versa) mentioned above is gone with the polling driver. The trick is to queue the first byte for SPI (by writing it to the HW FIFO) only a few cycles after starting the SPI HW. The current stock driver pushes data into the FIFO in the interrupt handler, which has to trigger first (with some OS overhead), and this seems to cause the observed delay.
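The FIFO-priming trick can be sketched against a mock register block. This is a hypothetical illustration, not the driver's code: `struct mock_spi`, `start_and_prime()` and the 64-byte FIFO depth are assumptions; only the position of the TA (Transfer Active) bit, bit 7 of the CS register, is taken from the BCM2835 peripheral documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SPI_CS_TA     (1u << 7)  /* Transfer Active bit, BCM2835 SPI CS register */
#define MOCK_FIFO_LEN 64         /* ASSUMED TX FIFO depth */

/* Hypothetical stand-in for the SPI block; real code would use
 * readl()/writel() on the CS and FIFO registers instead. */
struct mock_spi {
    uint32_t cs;
    uint8_t  fifo[MOCK_FIFO_LEN];
    size_t   fifo_used;
};

/* Set TA and immediately fill the TX FIFO from the calling context, so
 * the clock can start a few cycles later instead of after the first
 * interrupt. Returns how many bytes were queued; the remainder would be
 * pushed from the interrupt handler as the FIFO drains. */
size_t start_and_prime(struct mock_spi *spi, const uint8_t *tx, size_t len)
{
    size_t n = 0;

    assert(spi != NULL && (tx != NULL || len == 0));
    spi->cs |= SPI_CS_TA;                              /* start the transfer */
    while (n < len && spi->fifo_used < MOCK_FIFO_LEN)  /* models "TX FIFO has room" */
        spi->fifo[spi->fifo_used++] = tx[n++];
    return n;
}
```

Only the bytes that do not fit into the FIFO are left for the interrupt handler; for the small transfers the MCP2515 issues, that is typically none.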

So I believe that this back-port is already quite successful and would probably also benefit other use cases where high SPI throughput with low latency is required.

I will share my patch (including the back-port) here as soon as I get the interrupt handler implementation running...
Later I may also give a DMA-implementation a try, as this would reduce the number of interrupts further...

This may also make the transfers a bit faster, as the current polling implementation produces a gap of 1-2 SPI clock cycles between bytes sent on the bus. We will see if/when we get there...

In the hope that this will help with latency-issues with other SPI-applications as well...

Ciao,
Martin
Posts: 234
Joined: Thu Sep 20, 2012 3:40 pm
by msperl » Mon Oct 08, 2012 4:38 pm
Hi!

As a status update: I got the interrupt-driven version working; attached is the diff for my SPI patch...

My workload benchmark was a 1000 kbit/s CAN bus with about 26% duty cycle for 4101 packets.
The RPi sees higher CPU load, but it does not lose a single packet, even when compiling the kernel in the background!

Here are some measurements from that time:
Code:
root@raspberrypi:~# ip -s -d  link show can0; grep -E "mcp25|_spi" /proc/interrupts
8: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 10
    link/can
    can state ERROR-ACTIVE restart-ms 0
    bitrate 1000000 sample-point 0.750
    tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
    mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
    clock 8000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          0          0          0         
    RX: bytes  packets  errors  dropped overrun mcast   
    32800      8202     0       0       0       0     
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0     
 80:      61592   ARMCTRL  bcm2708_spi.0
195:      20505      GPIO  mcp251x
root@raspberrypi:~#

And here are the same counters after a load of 4101 packets over a timespan of 1.5 seconds:

root@raspberrypi:~# ip -s -d  link show can0; grep -E "mcp25|_spi" /proc/interrupts
8: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 10
    link/can
    can state ERROR-ACTIVE restart-ms 0
    bitrate 1000000 sample-point 0.750
    tq 125 prop-seg 2 phase-seg1 3 phase-seg2 2 sjw 1
    mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
    clock 8000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          0          0          0         
    RX: bytes  packets  errors  dropped overrun mcast   
    49200      12303    0       0       0       0     
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0     
 80:      73895   ARMCTRL  bcm2708_spi.0
195:      24606      GPIO  mcp251x


So you can see that the packet counter goes up by 4101, and so does the interrupt count for the mcp251x interrupt.
On top of that there are 12303 (=3*4101) interrupts on the SPI driver, which matches the 3 SPI commands issued by the mcp251x driver per interrupt (check which buffer contains data, read the buffer, check whether another buffer contains data).
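This bookkeeping can be checked mechanically. The values below are copied from the two outputs above; the struct and function names are just for illustration.

```c
#include <assert.h>

/* Counter values copied from the outputs above. */
struct snapshot {
    unsigned long spi_irqs;    /* bcm2708_spi.0 line in /proc/interrupts */
    unsigned long mcp_irqs;    /* mcp251x line in /proc/interrupts */
    unsigned long rx_packets;  /* can0 RX packets from "ip -s link" */
};

/* Average number of SPI interrupts per received CAN frame between two
 * snapshots (counters assumed not to wrap in between). */
double spi_irqs_per_frame(struct snapshot before, struct snapshot after)
{
    unsigned long frames = after.rx_packets - before.rx_packets;

    assert(frames > 0);
    return (double)(after.spi_irqs - before.spi_irqs) / (double)frames;
}
```

Both the packet counter and the mcp251x interrupt count rise by 4101 and the SPI interrupts by 12303, giving exactly 3.0 SPI interrupts per frame.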

All those numbers fall into place perfectly...

And as I said, that was while compiling the Linux kernel in the background, just to confirm that it works...

And that is a big improvement over the original SPI-driver.

Here is the CPU load (via vmstat 1) on an otherwise idle system while handling 40965 packets:
Code:
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0      0 152508  11352  39024    0    0   149     9 1158 1979  3  6 89  1
 0  0      0 152508  11352  39024    0    0     0     0  326   54  0  0 100  0
 0  0      0 152508  11352  39024    0    0     0     0  325   52  0  0 100  0
 0  0      0 152508  11352  39024    0    0     0     0  326   55  0  0 100  0
 0  0      0 152508  11352  39024    0    0     0     0  328   52  1  0 99  0
 0  0      0 152508  11352  39024    0    0     0     0  325   54  0  0 100  0
 0  0      0 152508  11352  39024    0    0     0     0  320   50  0  0 100  0
 0  0      0 152508  11352  39024    0    0     0     0 14109 41306  0 38 62  0
 0  0      0 152508  11352  39024    0    0     0     0 14323 41883  0 41 59  0
 0  0      0 152500  11352  39024    0    0     0     0 14321 41906  0 42 58  0
 0  0      0 152500  11352  39024    0    0     0     0 14292 41774  1 39 60  0
 0  0      0 152500  11352  39024    0    0     0     0 14293 41828  0 40 60  0
 0  0      0 152500  11352  39024    0    0     0     0 14330 41937  0 40 60  0
 0  0      0 152500  11352  39024    0    0     0     0 14329 41890  0 43 57  0
 0  0      0 152500  11352  39024    0    0     0     0 14329 41873  0 42 58  0
 0  0      0 152500  11352  39024    0    0     0     0 14328 41896  0 38 63  0
 0  0      0 152500  11352  39024    0    0     0     0 14330 41904  0 42 58  0
 0  0      0 152500  11352  39024    0    0     0     0 14315 41913  0 39 61  0
 0  0      0 152500  11352  39024    0    0     0     0 14337 41897  0 44 56  0
 0  0      0 152500  11352  39024    0    0     0     0 14327 41900  0 37 63  0
 0  0      0 152500  11352  39024    0    0     0     0 14315 41882  0 38 63  0
 0  0      0 152500  11352  39024    0    0     0     0 9950 28779  0 29 71  0
 0  0      0 152500  11352  39024    0    0     0     0  326   56  0  0 100  0
 0  0      0 152500  11352  39024    0    0     0     0  323   54  0  0 100  0
 0  0      0 152500  11352  39024    0    0     0    12  337   59  0  0 100  0
 0  0      0 152500  11352  39024    0    0     0     0  336   58  1  1 98  0


where we see that the kernel side reaches up to 44% CPU load (high system time)...
But this also includes the upper-level processing of the CAN network stack, not only the SPI part...

As I said: other high-bandwidth SPI devices should also see performance improvements and get close to real-time handling.

The spi-bcm2708 driver now also has an option to set the processing mode: polling, interrupt, dma or mixed. dma is not implemented yet, and mixed currently falls back to interrupt. I have read comments in drivers for other platforms that only set up DMA for very long transfers (such as writing to an SD card) and use polling for smaller transfers, which is the reason for this option.
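A "mixed" mode could pick the processing mode per transfer based on its length. The following is a hypothetical sketch of such a policy, not what the driver does: the enum values follow the `processmode` numbering given later in this thread (0 = polling, 1 = interrupt, 2 = DMA), while both thresholds are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Numbering as used by the "processmode" module parameter. */
enum process_mode { MODE_POLLING = 0, MODE_INTERRUPT = 1, MODE_DMA = 2 };

/* HYPOTHETICAL thresholds for a "mixed" policy. */
#define POLL_MAX_BYTES 8    /* assumed: short enough to busy-wait */
#define FIFO_BYTES     64   /* assumed SPI FIFO depth */

enum process_mode choose_mode(size_t len)
{
    if (len <= POLL_MAX_BYTES)
        return MODE_POLLING;     /* tiny transfer: avoid the IRQ round trip */
    if (len <= FIFO_BYTES)
        return MODE_INTERRUPT;   /* fits in the FIFO: one interrupt anyway */
    return MODE_DMA;             /* long transfer: worth the DMA setup cost */
}
```

The design idea is the one described in those other drivers: the fixed setup cost of DMA only pays off once a transfer no longer fits into the FIFO.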

Please test it to check that it works under all circumstances; there may be some cases that I have missed...
I will also contact Chris Boot (the author of the original version) so that he can include it upstream...

I will now start looking into the possibility of implementing the DMA mode to reduce the CPU load further...

Ciao,
Martin
Attachments
spi.diff.txt.bz2
SPI driver patch for higher SPI performance without delays...
(10.12 KiB)
by jbeale » Mon Oct 08, 2012 4:52 pm
This is really great work, thanks so much for your SPI driver improvements! (I have been planning to use the R-Pi's SPI with an ADC chip, but have not gotten to it yet.)
Posts: 2084
Joined: Tue Nov 22, 2011 11:51 pm
by bertr2d2 » Mon Oct 08, 2012 4:57 pm
Fantastic ! Great work Martin !

Regards

Gerd
Posts: 86
Joined: Wed Aug 08, 2012 10:12 pm
by lb » Mon Oct 08, 2012 5:00 pm
The patch is unreadable - you messed up formatting of many files. Can you make a proper patch?
Posts: 193
Joined: Sat Jan 28, 2012 8:07 pm
by msperl » Mon Oct 08, 2012 5:19 pm
@lb: I will get to it in time (once I have figured out the correct settings in emacs and nano, as I use both, to keep the indentation "correct"); I just wanted to get the first version out the door to share...
Please also note that I have really only touched spi-bcm2708.c; all the other files are just copied (back-ported) from Linus's tree...
by msperl » Mon Oct 08, 2012 5:39 pm
Attached is the updated version, with better source formatting based on the emacs macro in the Coding Style document...
Attachments
spi2.diff.txt.bz2
updated version of the patch that includes a "better" formatted source
(10.77 KiB)
by maddin1234 » Mon Oct 08, 2012 8:03 pm
Hi Martin,
I saw that the SPI-driver includes delay.h.
I read something that could be useful here:

https://groups.google.com/forum/?fromgroups=#!topic/bcm2835/sgj53rAH2YU

"I've also modified delayMicroseconds() to use nanosleep() for long waits,
and a busy wait on a high resolution timer for the rest. This is because
I've found that calling nanosleep() takes at least 100-200 us.
You need to link using '-lrt' using this version."
Posts: 68
Joined: Sat Aug 04, 2012 8:33 pm
by msperl » Mon Oct 08, 2012 8:44 pm
Hi maddin!

The driver does not really use delay anywhere; the includes came with the original driver, so I did not touch them. I believe we could remove a lot of those includes...

If I take out the include of delay.h, it compiles all the same... and the same goes for log2.h, sched.h and wait.h.

Ciao,
Martin
by rudiratlos » Wed Oct 10, 2012 2:41 pm
Super finding.
I think that is the reason why I can't move forward with my project. I'm using an RFM chip (SPI interface), which should receive 100-200 bytes at high speed in the 868 MHz wireless band. The chip has an integrated 64-byte buffer, and I always get a buffer overrun after reading the first 64 bytes, even though I'm polling the RFM.

How can I use the patch in Occidentalis v0.2?
I'm not an experienced kernel patcher.

Thanks.
Posts: 64
Joined: Tue May 01, 2012 8:47 am
by Arjan » Fri Oct 12, 2012 6:25 pm
Hi Martin,

Thanks for this great work!

This patch would be useful for the 6 channel D/A SPI board I am working on.

Just asking a favor: I have extended the two SPI CS lines with a third one by means of a 74HC139.
Therefore I need the additional /dev/spidev-0.2 (where both CS lines are asserted).
Would you be able to add this feature to your patch as well?
Thanks.

Regards, Arjan
Posts: 131
Joined: Sat Sep 08, 2012 1:59 pm
by msperl » Sat Oct 13, 2012 7:15 am
Hi RudiRatlos!

For how to compile the kernel yourself, please look at http://elinux.org/RPi_Kernel_Compilation (or search Google for other sites).

To apply a patch:
Before or after the "make menuconfig" step, run the command "patch -p0 < path/to/patchfile".
Then continue with the compile/link step ("make; make modules")...

Ciao, Martin
by msperl » Sat Oct 13, 2012 7:41 am
Hi Arjan!

The problem is that the hardware chip select of the bcm2708 can only handle the 3 pre-wired CS/SS lines, of which, it seems, only 2 are available on the RPi expansion header.

If you need extra chip selects, it can be done, but it would take quite a lot of work, as you would need to drive the CS lines from (driver) code...

The best (and most generic) solution I can think of is to use one of the CS pins as the enable input of the 74HC139 (pin 1 or 15) and to drive the address lines from GPIO pins outside the SPI driver (in your own driver). Still, all those shared devices would need to use the same clock...
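The decoding logic of that setup can be sketched as follows: the SPI chip-select line (active low) drives the decoder's active-low enable, and two GPIO pins driven by your own code select which output follows it. This models one half of the 74HC139 truth table only; driving the GPIOs is left out, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One half of a 74HC139 dual 2-to-4 decoder. Inputs: active-low enable
 * (here wired to the SPI chip select) and select lines A0 (LSB) and A1
 * (here wired to GPIOs). Returns the outputs /Y0../Y3 as bits 0..3;
 * they are active low, so the selected device sees the single 0 bit. */
uint8_t hc139_outputs(int enable_n, int a0, int a1)
{
    if (enable_n)
        return 0x0F;                         /* CS inactive: all outputs high */
    unsigned sel = ((a1 ? 1u : 0u) << 1) | (a0 ? 1u : 0u);
    return (uint8_t)(0x0F & ~(1u << sel));   /* pull only /Y<sel> low */
}
```

When the SPI controller asserts its chip select, exactly the output addressed by A0/A1 goes low, so the address GPIOs must be stable before the transfer starts.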

Also, I do not know at what speeds you want to (or can) drive your SPI device clock, but you may also run into "propagation delay issues" with the HC139, where the time between CS and clock start becomes too short...

Ciao,
Martin

P.S.: I did some calculations for a high-speed (400+ kSps) multi-ADC (4-8) board too, but realized that quite a lot of bandwidth would be needed for my idea to work (SPI clocks > 60 MHz).
Also, with such a setup you really need hard real-time to make things work, or the ability to allocate contiguous physical memory in chunks of 64 kB so that DMA can do most of the work (neither of which is really possible, easily and consistently, with Linux).
Also, in such a case any switch in SPI bus usage would mean losing data, so the trade-off of 2 CS lines on the RPi is good enough in those cases.

If you really do need to multiplex several slow SPI devices, then the 74HC139 as described above may work, but they all have to run at a single SPI clock speed...
by msperl » Sat Oct 13, 2012 10:59 am
Hi!

Attached is the patch that now also allows DMA transfers, as long as the spi_transfer frames have rx_dma and tx_dma filled in.

For all practical purposes, at least with an MCP2515 attached, it does not make a big difference whether DMA is used or not, because all the transfers the MCP2515 does are small.

This will be totally different for bigger transfers of 4 KiB (4096-byte) blocks or similar, because then there would be only 1 interrupt for the whole transfer instead of 64 interrupts for the same amount of data. This assumes that the SPI FIFO holds 64 bytes for reads and writes: the documentation does not say so explicitly, but the DC register with its 2 levels of "criticality" and their default values (0x30, 0x20) could indicate this (page 157); it could be even bigger.
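A rough interrupt count makes the trade-off concrete. This simplified model (assumed, not measured) counts one interrupt per FIFO-full in interrupt mode and one completion interrupt in DMA mode, with the 64-byte FIFO assumed above.

```c
#include <assert.h>
#include <stddef.h>

#define FIFO_BYTES 64   /* ASSUMED SPI FIFO depth */

/* Simplified model: in interrupt mode, one interrupt per FIFO-full. */
size_t irqs_interrupt_mode(size_t len)
{
    return (len + FIFO_BYTES - 1) / FIFO_BYTES;   /* ceil(len / FIFO_BYTES) */
}

/* Simplified model: in DMA mode, one completion interrupt per transfer. */
size_t irqs_dma_mode(size_t len)
{
    return len > 0 ? 1 : 0;
}
```

irqs_interrupt_mode(4096) is 64 while irqs_dma_mode(4096) is 1, but for anything up to 64 bytes both give 1, which is why the small MCP2515 transfers see no benefit from DMA.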

The code also contains a note that things could be improved further by "chaining" multiple 4k blocks per transfer, but as I have no HW to really test this with, I have not implemented it, avoiding the extra complexity...

Right now the DMA mode relies on DMA-allocated kernel pages, and otherwise falls back to interrupt mode.
But for all practical purposes (assuming there are no errata in the VideoCore DMA engine, and trusting that the documentation on DMA is correct) it should even be possible to do transfers from "normal" pages (even unaligned ones). But that would require an "official" API that can translate kernel addresses to VideoCore bus addresses (which are needed for DMA).
There is something like dma_to_pfn/pfn_to_dma/dma_to_virt/virt_to_dma in arch/arm/include/asm/dma-mapping.h, but these are "architecture" private and "must not be used by driver". Using them could probably work, but would probably block getting the driver into the mainline kernel...

From my measurements with the MCP2515 (54% duty cycle on 500 kHz CAN) I do not see much difference in performance between interrupt and DMA mode, due to the small size of the SPI messages.
The CPU load in this case is 22% on the (RT) IRQ handler for SPI and 21% on the IRQ handler thread of the mcp251x driver.

As there are still latencies, things could be improved by making the mcp251x driver use the asynchronous SPI interface instead of waiting, but that is another driver...

I assume that there will be bugs, so beware; at least for me, the mcp251x use case works as expected...

You can also force a different processing mode by setting the "processmode=N" argument when loading the module (N = 2 for DMA, the default; 1 for interrupt-driven; 0 for polling).

Ciao,
Martin

P.s: I think the lessons learned on the DMA may also help other people who need to create a driver using DMA...
Attachments
spi-bcm2708.diff.bz2
patch for SPI with DMA enabled
(13.84 KiB)
by psergiu » Sun Oct 14, 2012 3:45 pm
Hi msperl !
I successfully applied your latest patch and recompiled the kernel and the modules, but now the enc28j60.ko module (ENC28J60 SPI Ethernet) no longer detects the device. (see: viewtopic.php?f=44&t=18397 )

Do I need to make some tweaks to the driver source, or use the driver from the 3.4 kernel tree?

Code:
[    6.252432] bcm2708_spi bcm2708_spi.0: DMA channel 0 at address 0xc8808000 with irq 16
[    6.429609] bcm2708_spi bcm2708_spi.0: DMA channel 4 at address 0xc8808400 with irq 20
[    6.620387] spi_master spi0: will run message pump with realtime priority
[    6.767241] bcm2708_spi bcm2708_spi.0: SPI Controller at 0x20204000 (irq 80)
[    6.905549] bcm2708_spi bcm2708_spi.0: SPI Controller running in dma mode
[    9.212595] enc28j60 spi0.0: enc28j60 Ethernet driver 1.01 loaded
[    9.334089] enc28j60 spi0.0: enc28j60 chip not found
[    9.432387] enc28j60: probe of spi0.0 failed with error -5


I tried in DMA, IRQ & polling mode - same error.
Posts: 212
Joined: Mon Nov 07, 2011 8:36 am
Location: Bucharest, Romania
by msperl » Sun Oct 14, 2012 4:38 pm
Hi psergiu!

I do not fully understand where the enc28j60 driver really fails, but in the worst case there may be a "dependency" on the original SPI driver that the back-port does not satisfy.

Unfortunately I do not have an enc28j60 chip for testing (attaching a logic analyzer to the lines), so I cannot tell you where the issue may be...

I will contact you offline for some analysis and we can share the results here...

Martin
by msperl » Mon Oct 15, 2012 9:31 pm
Status update for all:
the enc28j60 device seems to behave strangely, so some more work needs to be done to drill down to the root cause. I have even ordered one to test it myself, so I may be able to test it next weekend (or whenever it arrives).

But in the meantime I have attached an SD card to my RPi using the mmc_spi driver, and this one works as expected: I can read the data without any issues.
As I have limited myself to a 4 MHz SPI bus speed (to allow analysis with my logic analyzer), the transfer rate is 370 kB/s, but higher speeds should work just as well... To put it into perspective, 370 kB/s means 2960 kbit/s, so I have to say the SPI bus is used quite efficiently...
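The efficiency can be quantified by dividing the payload rate by the raw SPI clock rate. A small helper, assuming 1 kB = 1000 bytes (with 1024 the result is slightly higher):

```c
#include <assert.h>

/* Share of the raw SPI clock that ends up as payload.
 *   kbytes_per_s : measured payload rate, 1 kB = 1000 bytes (assumed)
 *   spi_hz       : SPI clock in Hz, one bit per clock */
double spi_utilization(double kbytes_per_s, double spi_hz)
{
    assert(spi_hz > 0.0);
    return kbytes_per_s * 1000.0 * 8.0 / spi_hz;
}
```

spi_utilization(370.0, 4e6) is 0.74, so about 74% of the 4 MHz clock cycles carry payload.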

One thing that does not work with this mmc_spi driver is DMA, which in this case would REALLY help, as the transfers fall back to interrupt mode and there are a lot of interrupts (reading 64 MB from the SD card produced 1850860 interrupts, about 28 interrupts per kB). The mmc_spi driver does not take an argument to enable DMA but tries to infer it from the structure dev.dma_map (if it is set, the driver assumes it can do DMA).

Unfortunately it is not clear to me how this dma_map works or is used, so that the SPI driver could provide it (just setting it does not have the desired effect: the driver gets stuck at some point during initialization...). So if there is someone with knowledge of this, please step forward, so that we can improve things further...

Ciao,
Martin
by msperl » Fri Oct 19, 2012 10:20 am
I have received an enc28j60 module which I have used for debugging the issue.

The reason the previously posted patch did not work is that when a chunked SPI transfer was issued, the driver took the chip select line high between chunks, which, as seen from the device, finished the transfer and started a new one.

The following patch on top of the patch above will resolve the issue:
Code:
--- /usr/src/linux/drivers/spi/spi-bcm2708.c    2012-10-13 09:28:19.254688643 +0000
+++ drivers/spi/spi-bcm2708.c       2012-10-19 09:32:56.280497957 +0000
@@ -651,6 +651,8 @@
                }
                if (status)
                        goto exit;
+               /* keep Transfer active until we are triggering the last one */
+               if (!(flags&FLAGS_LAST_TRANSFER)) { state.cs|= SPI_CS_TA; }
                /* now send the message over SPI */
                switch (processmode) {
                case 0: /* polling */
@@ -681,6 +683,9 @@
                }
                if (status)
                        goto exit;
+               /* delay if given */
+               if (xfer->delay_usecs)
+                       udelay(xfer->delay_usecs);
                /* and add up the result */
                msg->actual_length += xfer->len;
        }

The patch above now also implements the missing delay feature of the SPI layer
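The chip-select rule behind the first hunk of this fix can be summarized in a few lines. This is an illustrative sketch, not the driver code: `cs_after_chunk()` is a hypothetical helper, TA is bit 7 of the BCM2835 CS register, and the `cs_change` flag of `struct spi_transfer` is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SPI_CS_TA (1u << 7)   /* Transfer Active bit, BCM2835 SPI CS register */

/* Hypothetical helper: CS register value to leave in place after one
 * spi_transfer (chunk) of a message has been clocked out. TA, and with
 * it the chip select, stays asserted until the last chunk, so the
 * device sees one continuous transaction. cs_change is not modelled. */
uint32_t cs_after_chunk(uint32_t cs, bool last_transfer)
{
    if (last_transfer)
        return cs & ~SPI_CS_TA;   /* end of message: release chip select */
    return cs | SPI_CS_TA;        /* more chunks follow: keep it asserted */
}
```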

Please test some other (exotic) devices to find any other issues...

I may try to improve the DMA mode; maybe there is a way to "auto-detect bus addresses" so that drivers do not have to allocate DMA addresses themselves. That way we could even do direct DMA to user space (assuming that the SPI driver and/or spidev allows that...)
Still, as far as I can tell, there is no advantage to using DMA as long as you transfer less than 64 bytes (the FIFO buffer size of the SPI controller).

Thanks,
Martin
by psergiu » Sat Oct 20, 2012 6:23 pm
My ENC28J60 works now ! Thanks.
No major speed improvement, unfortunately (~130 KB/s vs ~100 KB/s before).
I'll try wiring up a GPIO as an IRQ line.
by stedew » Sun Oct 21, 2012 9:55 am
Hello msperl
I have a problem applying the latest patch:
root@raspberrypi:/usr/src# cp /home/pi/spi2_B.diff .
root@raspberrypi:/usr/src# patch -p0 <spi2_B.diff
patching file /usr/src/linux/drivers/spi/spi-bcm2708.c
Hunk #1 FAILED at 651.
Hunk #2 FAILED at 681.
2 out of 2 hunks FAILED -- saving rejects to file /usr/src/linux/drivers/spi/spi-bcm2708.c.rej
Maybe I did something to the format of the file: I just copied and pasted it into NetBeans and then FTP'd it to the Pi's home dir (it's named spi2_B.diff), as you already noticed. Also, with the first patch I had to specify where the files were located in the directory tree, since patch started in /usr/src. If someone can clarify this for a Linux noob, thanks in advance.
Posts: 3
Joined: Sun Oct 21, 2012 9:37 am
by msperl » Sun Oct 21, 2012 10:29 am
Good to hear that it works for you now!

Using interrupts with the enc28j60 driver should help, because without an interrupt line the driver needs to poll the interface to detect new packets, which means latency (depending on how often the driver polls).
The driver also tries to use DMA in some way, but not via the tx_dma/rx_dma interface that spi_transfer structures provide.

As said, I may investigate making DMA the default (without the need for tx_dma/rx_dma in the driver).
Still, if the transfers over the SPI bus are smaller than 64 bytes, DMA does not give you a real advantage, as the same number of IRQs is triggered in DMA and interrupt-driven mode. DMA only has an advantage for bigger transfers, where you avoid the additional interrupts and thus the IRQ overhead and latencies; this also means lower CPU utilization.

DMA may also improve situations with "chunked" transfers (multiple spi_transfers in a single spi_message), but for that the driver would need further optimization to program the DMA engine to perform those transfers automatically in sequence...

Note that you can exploit these "chunked transfers" from user space with the spidev interface in ioctl mode, but unfortunately spidev does not support passing tx_dma/rx_dma (there is no interface for this in user space).
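A minimal user-space example of such a chunked transfer: a command and its reply are sent as two `struct spi_ioc_transfer` entries of one message via `SPI_IOC_MESSAGE(2)`. With `cs_change` left at 0, chip select should stay asserted between the two chunks. The function name and the device path used with it (for example /dev/spidev0.0) are assumptions; this is a sketch, not tested against the patched driver.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/spi/spidev.h>

/* Send a command and read the reply as two chunks of ONE spi_message.
 * cs_change stays 0, so chip select should remain asserted between the
 * chunks. Returns the ioctl result (total bytes transferred) or -1. */
int spi_cmd_then_read(const char *dev, const uint8_t *cmd, uint32_t cmd_len,
                      uint8_t *reply, uint32_t reply_len)
{
    struct spi_ioc_transfer xfer[2];
    int fd, ret;

    fd = open(dev, O_RDWR);
    if (fd < 0)
        return -1;

    memset(xfer, 0, sizeof(xfer));          /* zero all unused fields */
    xfer[0].tx_buf = (uintptr_t)cmd;        /* chunk 1: write the command */
    xfer[0].len    = cmd_len;
    xfer[1].rx_buf = (uintptr_t)reply;      /* chunk 2: read the reply */
    xfer[1].len    = reply_len;

    ret = ioctl(fd, SPI_IOC_MESSAGE(2), xfer);
    close(fd);
    return ret < 0 ? -1 : ret;
}
```

Each `struct spi_ioc_transfer` is 32 bytes, the per-chunk size used in the calculation below.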

So essentially it is not possible to do DMA transfers directly to user space unless the spi-bcm2708 driver can translate addresses to bus addresses (I may investigate whether this can be done...). But those pages would also need to be protected against being swapped out by pinning them in memory via mlockall (unless the SPI device driver makes sure of this), so that the OS cannot swap out a memory page while the transfer is in progress...
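The pinning part can be sketched in user space. This example locks a single page-aligned buffer with `mlock()` instead of locking the whole process with `mlockall()`; the helper names are hypothetical, and the DMA transfer itself is not shown, since it would need the address translation described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocate a page-aligned buffer and lock it into RAM so it cannot be
 * swapped out while a (hypothetical) transfer is in flight.
 * Returns NULL on failure, e.g. when RLIMIT_MEMLOCK is too small. */
void *alloc_pinned(size_t len)
{
    void *buf = NULL;
    long page = sysconf(_SC_PAGESIZE);

    assert(page > 0);
    if (posix_memalign(&buf, (size_t)page, len) != 0)
        return NULL;
    if (mlock(buf, len) != 0) {
        free(buf);
        return NULL;
    }
    return buf;
}

/* Unlock and release a buffer from alloc_pinned(); NULL is ignored. */
void free_pinned(void *buf, size_t len)
{
    if (buf) {
        munlock(buf, len);
        free(buf);
    }
}
```

Note that `mlock()` keeps the pages resident but does not make them physically contiguous, so a driver would still have to handle a buffer spread over scattered pages.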

With ioctl, one could theoretically trigger direct DMA of up to 32k transfers in one call (the ioctl limit of 1 MB data length divided by 32 bytes per spi_ioc_transfer chunk), each of which could be up to 64 KB in size (the maximum SPI transfer size in bytes for the BCM2708 HW).

Doing the math, one could theoretically transfer up to 2 GB with only one call and only one interrupt, and at a 32 MHz SPI bus rate that transfer would take 576 seconds of continuous streaming without needing a single CPU cycle... (There may be bigger issues with bus bandwidth that could hurt CPU performance in the meantime.)
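The arithmetic behind these figures, as a small helper. The 576 s figure is reproduced if each byte takes 9 SPI clocks (8 bits plus a one-clock gap, like the inter-byte gap mentioned at the start of the thread) and 32 MHz is taken as 32 × 2^20 Hz; with 8 clocks per byte at 32,000,000 Hz it comes to about 537 s. The chunk limits are taken at face value from this post. As far as I can tell from `<linux/spi/spidev.h>`, the `SPI_MSGSIZE()` macro keeps a single `SPI_IOC_MESSAGE()` argument below `1 << _IOC_SIZEBITS` bytes (16 KiB on ARM, i.e. at most 511 transfers), so the real per-call chunk count is much lower.

```c
#include <assert.h>
#include <stdint.h>

/* Seconds needed to clock out `bytes` at spi_hz when each byte takes
 * clocks_per_byte SPI clocks (8, or 9 with a one-clock inter-byte gap). */
double stream_seconds(uint64_t bytes, double spi_hz, double clocks_per_byte)
{
    assert(spi_hz > 0.0);
    return (double)bytes * clocks_per_byte / spi_hz;
}
```

With the post's limits, (1 MB / 32 B) chunks × 64 KB = 32768 × 65536 bytes = 2^31 bytes (2 GiB).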

We would have to wait for a few more revisions of the RPI to get to 2GB of memory for a single user process ;)

Again: the driver would need to support this automatic translation of ARM addresses to bus addresses...

Ciao,
Martin
by msperl » Sun Oct 21, 2012 10:39 am
The copy/paste may have turned tabs into spaces, which is one thing that makes patch "complain".
The other thing is that for patch to work correctly, you have to be in the directory patch expects to work from.
In my case I have the git checkout in /usr/src/linux.
Also, the patch starts with:
Code:
--- /usr/src/linux/drivers/spi/spi-bcm2708.c    2012-10-13 09:28:19.254688643 +0000
+++ drivers/spi/spi-bcm2708.c       2012-10-19 09:32:56.280497957 +0000

so the "+++" line says which file to patch, relative to the current directory.

If you need to "strip" directories (because they do not exist in the same way on your system), use the option -p<number>, where <number> is how many leading directories to strip from the file names in the patch.
So in the example above, if you are already in the directory drivers/spi, you would apply the patch like this:
Code:
patch -p2 < /tmp/patch.to.apply.txt
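The -p rule itself is simple: patch strips the smallest prefix of each file name containing that many slashes, counting a run of adjacent slashes as one. A minimal C sketch of that rule (the function name is hypothetical and this is not patch's own code):

```c
#include <assert.h>
#include <stddef.h>

/* Mimic patch's -pNUM rule: strip the smallest prefix containing NUM
 * slashes, where a run of adjacent slashes counts as one. Returns a
 * pointer into `path`, or NULL if the path has too few slashes. */
const char *strip_components(const char *path, int num)
{
    const char *p = path;

    while (num > 0) {
        while (*p && *p != '/')
            p++;                 /* skip the current component */
        if (!*p)
            return NULL;         /* not enough slashes */
        while (*p == '/')
            p++;                 /* adjacent slashes count as one */
        num--;
    }
    return p;
}
```

For drivers/spi/spi-bcm2708.c, -p2 leaves spi-bcm2708.c, matching the example above; for the absolute path on the "---" line, -p1 leaves usr/src/linux/drivers/spi/spi-bcm2708.c.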


Hope this helps.

To avoid "patching the patch", I will post a complete patch-set when I get back this evening...

Martin
by msperl » Sun Oct 21, 2012 9:54 pm
OK, here is the "final" version of the patch!

I have also contacted Chris Boot, the author of the original driver, but there has been no feedback so far...

So I assume we will have to wait for him to get the driver included in the "stock" RPi kernel...

Ciao,
Martin

P.S.: apply it in the git checkout directory like this:
Code:
patch -p1 < spi-lowlatency.patch
Attachments
spi-lowlatency.patch.bz2
the final version of the patch
(13.99 KiB)
by lb » Mon Oct 22, 2012 12:45 pm
At the moment, the proper way to get your stuff included seems to be Github pull requests. Worked for me, at least. ;)
by zia_7575 » Tue Oct 23, 2012 7:02 am
hey,

What is difference between bcm2708_transfer_one_message_dma and bcm2708_transfer_one_message_dma_interrupts?

Regards,
Zia
Posts: 11
Joined: Thu Oct 18, 2012 6:39 am