CAN controller


by msperl » Sun Oct 28, 2012 5:26 pm
Hi!

After experimenting with the 3.6.y branch and preparing a patch for the SPI latency issue for 3.6, I found that the mcp251x driver now emits the following messages:
Code: Select all
[  107.367820] genirq: Threaded irq requested with handler=NULL and !ONESHOT for irq 195
[  107.367905] mcp251x spi0.0: failed to acquire irq 195


What is needed to make it work with 3.6.y kernels (and probably any later versions) is to modify arch/arm/mach-bcm2708/bcm2708.c and change the interrupt config there like this:
Code: Select all
 static struct mcp251x_platform_data mcp251x_info = {
         .oscillator_frequency   = 16000000,
         .board_specific_setup   = NULL,
-         .irq_flags              = IRQF_TRIGGER_FALLING,
+         .irq_flags              = IRQF_TRIGGER_FALLING|IRQF_ONESHOT,
         .power_enable           = NULL,
         .transceiver_enable     = NULL,
 };


As far as I have read this is a new sanity check that is needed to make things work safely...

With this it works for me like a charm...
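For readers wondering why the extra flag matters: the first log line comes from a check in the generic IRQ code that rejects a threaded interrupt request which has no primary handler and does not set IRQF_ONESHOT. The following is only a user-space model of that rule, not kernel code; `model_request_threaded_irq` and `dummy_thread_fn` are hypothetical names made up for illustration, while the two flag values are the ones from include/linux/interrupt.h.

```c
#include <errno.h>
#include <stddef.h>

/* Flag values as defined in include/linux/interrupt.h */
#define IRQF_TRIGGER_FALLING 0x00000002
#define IRQF_ONESHOT         0x00002000

typedef int (*irq_handler_fn)(int irq, void *dev_id);

/*
 * Hypothetical user-space model of the sanity check (NOT kernel code):
 * a threaded IRQ that has no primary handler must set IRQF_ONESHOT,
 * otherwise the request is rejected with -EINVAL.
 */
int model_request_threaded_irq(irq_handler_fn handler,
                               irq_handler_fn thread_fn,
                               unsigned long flags)
{
    if (handler == NULL && thread_fn == NULL)
        return -EINVAL;          /* nothing to call at all */
    if (handler == NULL && !(flags & IRQF_ONESHOT))
        return -EINVAL;          /* "handler=NULL and !ONESHOT" case */
    return 0;
}

/* Dummy stand-in for a driver's interrupt thread function. */
int dummy_thread_fn(int irq, void *dev_id)
{
    (void)irq;
    (void)dev_id;
    return 0;
}
```

With only IRQF_TRIGGER_FALLING the model rejects the request, which corresponds to the "failed to acquire irq" message; with IRQF_TRIGGER_FALLING|IRQF_ONESHOT it is accepted.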

Ciao, Martin
Posts: 234
Joined: Thu Sep 20, 2012 3:40 pm
by skyfisch » Sun Dec 02, 2012 6:10 am
Hi maddin1234,

I was communicating with mcp2515 with Raspbian "wheezy"
2012-09-18-wheezy-raspbian ,
following the descriptions you posted on Aug 18.

But after building a new kernel (it took very long, about 6 hours), I couldn't find the image under /usr/src/linux/arch/arm/boot, so I couldn't replace the old kernel.
And after I rebooted the system, the keyboard no longer worked, although I can still operate it over SSH.

Have you met with these issues before?
What might be the problems?
Could you help me?

Does your mcp2515 work well now? Would you be so kind to write a new complete description to communicate with mcp2515 with the newest raspbian distribution? Since I am a beginner, I didn't quite understand step 7 and step 9 of your post. The complete description is quite helpful for me.

Thanks a lot.

skyfisch
Posts: 2
Joined: Wed Oct 31, 2012 2:02 pm
by maddin1234 » Wed Dec 05, 2012 8:00 pm
Hi,
I didn't change my system for a while
(never touch a running system ;) )

The only reason I can think of for your problem is
that something went wrong when compiling the kernel.
That happened to me once, when I mistyped a letter while editing
the bcm2708 file.

Try to compile the kernel again without changing anything.
All files that compiled fine will not be compiled again.
So you might find what went wrong.

Point 7 from my instruction describes patching the board-configuration
There were many improvements to this patch.

Point 9 is the command to copy the newly built kernel; it will fail if make
did not succeed in building a new kernel.

Greeting maddin1234
Posts: 68
Joined: Sat Aug 04, 2012 8:33 pm
by skyfisch » Fri Dec 07, 2012 12:11 pm
Hi, maddin1234.
Thank you for your instructions. My system works now.

The reason for my problem was that the partition I used to build the new kernel was not large enough.

skyfisch
Posts: 2
Joined: Wed Oct 31, 2012 2:02 pm
by muellie » Sun Dec 09, 2012 7:28 pm
Hello,

first of all I would like to thank all of you for the great help, especially bertr2d2 and maddin1234. I started three months ago without any basic knowledge of unix/debian/linux, had never worked in a terminal, and wanted to get SocketCAN integrated and running on my system (still working on it). I really gained a lot of knowledge and experience in the last weeks :)

I would like to share two things here:
I adapted the patch from http://lnxpps.de/rpie/ for the new Raspbian version (2012-10-28):
Code: Select all
--- bcm2708.c_org   2012-11-18 15:10:14.000000000 +0100
+++ bcm2708.c   2012-11-21 22:49:28.000000000 +0100
@@ -54,6 +54,12 @@
 #include <mach/vcio.h>
 #include <mach/system.h>
 
+#include <linux/can/platform/mcp251x.h>
+#include <linux/gpio.h>
+#include <linux/irq.h>
+
+#define MCP2515_CAN_INT_GPIO_PIN 25
+
 #include "bcm2708.h"
 #include "armctrl.h"
 #include "clock.h"
@@ -580,10 +586,20 @@
    .resource = bcm2708_spi_resources,
 };
 
+static struct mcp251x_platform_data mcp251x_info = {
+   .oscillator_frequency   = 16000000,
+   .board_specific_setup   = NULL,
+   .irq_flags              = IRQF_TRIGGER_FALLING|IRQF_ONESHOT,
+   .power_enable           = NULL,
+   .transceiver_enable     = NULL,
+};
+
 static struct spi_board_info bcm2708_spi_devices[] = {
    {
-      .modalias = "spidev",
-      .max_speed_hz = 500000,
+      .modalias = "mcp2515",
+      .max_speed_hz = 10000000,
+      .platform_data = &mcp251x_info,
+      /* .irq = unknown , defined later thru bcm2708_mcp251x_init */
       .bus_num = 0,
       .chip_select = 0,
       .mode = SPI_MODE_0,
@@ -596,6 +612,12 @@
    }
 };
 
+static void __init bcm2708_mcp251x_init(void) {
+   bcm2708_spi_devices[0].irq = gpio_to_irq(MCP2515_CAN_INT_GPIO_PIN);
+   printk(KERN_INFO " BCM2708 mcp251x_init:  got IRQ %d for MCP2515\n", bcm2708_spi_devices[0].irq);
+   return;
+};
+
 static struct resource bcm2708_bsc0_resources[] = {
    {
       .start = BSC0_BASE,
@@ -722,6 +744,7 @@
    system_serial_low = serial;
 
 #ifdef CONFIG_SPI
+   bcm2708_mcp251x_init();
    spi_register_board_info(bcm2708_spi_devices,
          ARRAY_SIZE(bcm2708_spi_devices));
 #endif


And I do have one hint:
Probably it is better to chain the kernel build commands with "&&" rather than ";", so that the build stops as soon as one command fails and errors from the first and second command stay visible on the screen.

Code: Select all
# make && make modules && make modules_install


And I have a few questions:
I also integrated the virtual CAN module. How can I set it up? I tried
Code: Select all
ip link add dev vcan0 type vcan

this operation was not possible (I did it in this way on Ubuntu). Do I need to configure something different in ip link ? Why was it not necessary to do this operation for the can bus with the MCP2515 (I directly tried to set it up, like in the explanation)?
I read something that there is a config file to predefine the buses, but that this was used in older SocketCAN versions only.

BTW, I used a SN65HVD230 as CAN transceiver; this one works with 3.3V. Right now I am writing a report on the whole project, and I would be very happy to share it here once it is finished. I think it can serve as a starter guide for people without any knowledge in this field (like me before).
I hope my new version of the board is finished in a week and then evaluated, I would be happy to share this information here as well.

Have a nice advent season :)

Best regards
Chris
Posts: 8
Joined: Fri Nov 23, 2012 11:27 am
by bertr2d2 » Sun Dec 09, 2012 10:29 pm
Hi Chris,
I also integrated the virtual CAN module. How can I set it up? I tried
Code: Select all
ip link add dev vcan0 type vcan

this operation was not possible (I did it in this way on Ubuntu). Do I need to configure something different in ip link ? Why was it not necessary to do this operation for the can bus with the MCP2515 (I directly tried to set it up, like in the explanation)?


You probably forgot to load the module:
Code: Select all
root@raspberrypi ~ # modprobe vcan
root@raspberrypi ~ # ip link add dev vcan0 type vcan
root@raspberrypi ~ # ip -s -d link show vcan0
4: vcan0: <NOARP> mtu 16 qdisc noop state DOWN mode DEFAULT
    link/can
    vcan
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0     
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0     

Your idea of sharing your experience setting up CAN on the RPi is really a good thing.
I'm looking forward to seeing this on a web site.

Regards,

Gerd
Posts: 80
Joined: Wed Aug 08, 2012 10:12 pm
by muellie » Mon Dec 10, 2012 8:19 pm
Hey Gerd,

thank you very much for your quick answer.

That is weird, though it looks like you might be right. I just don't get it: I integrated the module in the kernel, loaded it, and double-checked with lsmod. When trying to add the interface I got the answer
Code: Select all
RTNETLINK answers: Operation not supported

I never had the idea to check without loading the module, same result.

However, unfortunately I do not have the RPi available at the moment. I'm going to check at the weekend. By Friday I should also have my adapter board for testing.

The other question I answered myself: vcan devices need to be created via the kernel netlink interface, i.e. with the ip(8) tool. The CAN network device driver interface provides a generic interface to set up CAN network devices. Once the hardware driver is loaded, a network interface (e.g. can0) should appear automatically; it shows up in "ip link" rather than as a node under /dev, and can then be configured.

Thanks
Chris
Posts: 8
Joined: Fri Nov 23, 2012 11:27 am
by Zeta » Wed Dec 12, 2012 10:29 pm
Hello to all !

Thanks for your work on the MCP2515 controller.

I have made a small test board with the MCP2515/MCP2551 pair, cross-compiled a kernel (from the 3.6.y branch) with the standard MCP251x driver (configured at 20MHz), put everything on a Raspbian SD card, and made some quick tests.
I have basically the same config as : http://lnxpps.de/rpie/

So it works !

After a few seconds @125kbps (maybe 20% load?), it suddenly stopped receiving frames. Everything else was still working (I was connected to it through ssh), no error was output in /var/log/messages or dmesg, and there were still valid frames sent on the bus (no error on the other devices, and an LED on the line between the 2515 & 2551 continued to blink).

I tried "ifconfig can0 down" & "ifconfig can0 up", while disconnecting from the CAN bus the device sending most of the frames. It worked for some minutes (there were fewer than 10 frames per second), then I plugged in the device sending a lot of frames again, and it stopped again after 300 of them.

Below is the output of some useful commands, taken the third time it stopped (I had restarted it twice before with ifconfig down/up):
Code: Select all
*************************** MODULES    ********************************
Module                  Size  Used by
mcp251x                 7308  0
can_dev                 5436  1 mcp251x
spidev                  3908  0
spi_bcm2708             3696  0
can_bcm                 9388  0
can_raw                 4872  0
can                    18396  2 can_bcm,can_raw
snd_bcm2835             8860  0
snd_pcm                52768  1 snd_bcm2835
snd_page_alloc          2700  1 snd_pcm
snd_seq                36944  0
snd_seq_device          3608  1 snd_seq
snd_timer              14736  2 snd_pcm,snd_seq
snd                    35880  5 snd_bcm2835,snd_timer,snd_pcm,snd_seq,snd_seq_device
leds_gpio               1648  0
led_class               1788  1 leds_gpio
*************************** INTERRUPTS ********************************
           CPU0
  3:      27876   ARMCTRL  BCM2708 Timer Tick
 32:     506084   ARMCTRL  dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb1
 52:       1661   ARMCTRL  BCM2708 GPIO catchall handler
 65:          2   ARMCTRL  ARM Mailbox IRQ
 66:          1   ARMCTRL  VCHIQ doorbell
 75:          1   ARMCTRL
 77:       6017   ARMCTRL  bcm2708_sdhci (dma)
 80:      10250   ARMCTRL  bcm2708_spi.0
 83:         20   ARMCTRL  uart-pl011
 84:      11492   ARMCTRL  mmc0
195:       1661      GPIO  mcp251x
FIQ:              usb_fiq
Err:          0
*************************** INTERFACE  ********************************
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 10
    link/can
    can state ERROR-ACTIVE restart-ms 0
    bitrate 125000 sample-point 0.875
    tq 500 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
    mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
    clock 10000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          1          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    7353       1689     3       0       3       0
    TX: bytes  packets  errors  dropped carrier collsns
    2          1        1       1       0       0

We can see 3 overruns. In fact, each time it stopped, the overrun counter was incremented. We can also see that the driver is still in "ERROR-ACTIVE" mode, meaning it doesn't see problems on the line.
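As an aside on the INTERFACE listing above, the bit-timing line can be cross-checked by hand: one CAN bit consists of one sync quantum plus prop-seg, phase-seg1 and phase-seg2, and the sample point sits at the end of phase-seg1. A small sketch of that arithmetic follows; the function names are made up for illustration, and tq is taken in nanoseconds as printed by the ip tool.

```c
/* Number of time quanta in one CAN bit: sync (1) + prop + phase1 + phase2. */
unsigned int can_quanta_per_bit(unsigned int prop_seg,
                                unsigned int phase_seg1,
                                unsigned int phase_seg2)
{
    return 1 + prop_seg + phase_seg1 + phase_seg2;
}

/* Bit rate in bit/s for a time quantum given in nanoseconds. */
unsigned long can_bitrate(unsigned int tq_ns, unsigned int prop_seg,
                          unsigned int phase_seg1, unsigned int phase_seg2)
{
    unsigned int quanta = can_quanta_per_bit(prop_seg, phase_seg1, phase_seg2);

    return 1000000000UL / ((unsigned long)tq_ns * quanta);
}

/* Sample point in permille: it lies at the end of phase-seg1. */
unsigned int can_sample_point_permille(unsigned int prop_seg,
                                       unsigned int phase_seg1,
                                       unsigned int phase_seg2)
{
    unsigned int quanta = can_quanta_per_bit(prop_seg, phase_seg1, phase_seg2);

    return 1000 * (1 + prop_seg + phase_seg1) / quanta;
}
```

With tq 500, prop-seg 6, phase-seg1 7 and phase-seg2 2 this gives 16 quanta per bit, 125000 bit/s and a sample point of 875 permille, matching the "bitrate 125000 sample-point 0.875" line.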

So I assume that there is something wrong in my config that makes reception stop after an overrun. This is strange, as some people had a lot of overruns during their tests without the system stopping.

I read again this thread, and I have seen that people using the 3.6.y kernel branch used another flag, that I did not have:
Code: Select all
-         .irq_flags              = IRQF_TRIGGER_FALLING,
+         .irq_flags              = IRQF_TRIGGER_FALLING|IRQF_ONESHOT,

I don't know if this can be linked with my problem, as I don't have the errors in dmesg shown by the person who had the issue...

So next steps for me:
- try tomorrow with the additional IRQF_ONESHOT flag
- continue to search for other clues...
- when problem solved, investigate around the asynchronous MCP2515 and SPI changes to avoid losing frames @125kbps.

Thanks for your help, I'll keep you informed of my results. As soon as it works fine for me, I will try to add a page to the wiki.
Posts: 72
Joined: Wed Dec 12, 2012 9:51 pm
by msperl » Thu Dec 13, 2012 7:27 am
Hi!

I have seen the behaviour of the mcp2515 driver getting "non responsive" only with the 3.6.y kernel, where I had to use the ONE_SHOT to get it working in the first place - the 3.6.y kernel will not let you do it otherwise.

This happens most often (but not always) in situations where a lot of packets get sent in a short time (say 3 packets sent in bulk one after the other at 500kHz).

What I have seen in such situations is that the INT line of the MCP2515 is low, but the driver does not do anything.

So my interpretation is that:
* with ONE_SHOT_MODE the IRQ handler only gets called once for a level interrupt (so it is essentially an edge interrupt with a "minor" advantage for the initial state)
* this works well in "low message rate" situations
* but when the driver is requesting one frame from the MCP2515, the situation may occur that right during that time a new packet may arrive.
* this is a case the mcp2515 driver does not expect (and check for), so the INT line stays low, but the driver makes the implicit assumption that it goes high - it does not check if new messages have arrived in between.
* with the 3.2 kernel without the need for ONE_SHOT_MODE the kernel was again calling the IRQ handler which woke up the driver again and it continued to work...
* so it is essentially a race condition in the driver that only broke with the use of ONE_SHOT_MODE.
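The stall described in the list above can be reproduced with a toy model. This is only a sketch under simplifying assumptions and not the mcp251x driver: the chip has two receive buffers, the INT line is low while any buffer is full, the handler runs only on a falling edge, and at most one new frame arrives while a frame is being serviced. All names are hypothetical.

```c
#include <stdbool.h>

/* Toy model of the chip: INT is low while frames are pending. */
struct mcp_model {
    unsigned int rx_pending;   /* frames waiting in the chip (max 2 buffers) */
    unsigned int overruns;     /* frames lost because both buffers were full */
    unsigned int delivered;    /* frames read out by the handler */
};

static bool int_line_low(const struct mcp_model *m)
{
    return m->rx_pending != 0;
}

static void frame_arrives(struct mcp_model *m)
{
    if (m->rx_pending < 2)
        m->rx_pending++;
    else
        m->overruns++;
}

/* Read one frame; 'during' frames arrive while the read is in progress. */
static void service_one(struct mcp_model *m, unsigned int during)
{
    unsigned int i;

    m->rx_pending--;
    m->delivered++;
    for (i = 0; i < during; i++)
        frame_arrives(m);
}

/*
 * Feed a burst of frames into the model and return how many get delivered.
 * drain == false: handler reads one frame and assumes INT went high again.
 * drain == true:  handler keeps reading until INT really is high.
 */
unsigned int run_burst(bool drain, unsigned int burst)
{
    struct mcp_model m = { 0, 0, 0 };
    bool prev_low = false;
    unsigned int remaining = burst;

    while (remaining) {
        frame_arrives(&m);
        remaining--;
        if (int_line_low(&m) && !prev_low) {       /* falling edge only */
            unsigned int during = remaining ? 1 : 0;

            remaining -= during;
            service_one(&m, during);
            while (drain && int_line_low(&m))      /* re-check before leaving */
                service_one(&m, 0);
        }
        prev_low = int_line_low(&m);
    }
    return m.delivered;
}
```

In this model a burst of three frames delivers only one frame with the single-read handler (INT stays low afterwards and no further edge ever occurs), while the draining handler delivers all three.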

Seems as if some modification to the driver is needed - or a different Interrupt mode that does allow repeated triggering of the irq...

As a workaround I only found that removing the mcp251x module (rmmod) and then reloading and reconfiguring it was resolving the issue for some time...

Martin

P.s: there is somewhere an experimental mcp2515 driver that may not exhibit this behavior - it may also be faster, as it is not running all the processing in a threaded interrupt...
Posts: 234
Joined: Thu Sep 20, 2012 3:40 pm
by Zeta » Thu Dec 13, 2012 8:11 pm
Hello Martin, and thanks for your answer. It helps me a lot !
msperl wrote:I have seen the behaviour of the mcp2515 driver getting "non responsive" only with the 3.6.y kernel, where I had to use the ONE_SHOT to get it working in the first place - the 3.6.y kernel will not let you do it otherwise.

I made a mistake yesterday. When looking at the code to change it, I found I already had the ONE_SHOT option.
However it's interesting to see that it seems to be a problem with this kernel version, and so doesn't come from my setup.

msperl wrote:This happens most often (but not always) in situations where a lot of packets get sent in a short time (say 3 packets sent in bulk one after the other at 500kHz).

The conditions seem to be the same for me, except that I was running at only 125kbps. However, the device I tested with sends several frames in succession after its CAN task is executed. Most of the time it's OK, but after several hundred frames there must be a burst that is not processed in time.
Your analysis makes sense. I will have to dig into that, especially the effects of the ONE_SHOT flag.

msperl wrote:As a workaround I only found that removing the mcp251x module (rmmod) and then reloading and reconfiguring it was resolving the issue for some time...

I got it to restart without unloading the module; just doing ifconfig down/up seems enough to restart it correctly.

msperl wrote:P.s: there is somewhere an experimental mcp2515 driver that may not exhibit this behavior - it may also be faster, as it is not running all the processing in a threaded interrupt...

I assume you are talking of this one : http://clientes.netvisao.pt/anbadeol/mcp2515.html
I will try to compile it now, and will test it tomorrow if I succeed.

I'll keep you informed. Thanks again !
Posts: 72
Joined: Wed Dec 12, 2012 9:51 pm
by msperl » Thu Dec 13, 2012 10:48 pm
for kernel 3.2 the one-shot is not necessary...
try without that one, in case you use the 3.2 kernel...
Posts: 234
Joined: Thu Sep 20, 2012 3:40 pm
by Zeta » Fri Dec 14, 2012 7:23 pm
Hello,

I did not have a lot of time today, but quickly tested the "asynchronous MCP2515" driver (http://clientes.netvisao.pt/anbadeol/mcp2515.html), with the 3.6.y kernel.
I launched it, connected it to the same system as last time, and, seeing that it seemed to work, let it run for some time to see if it would end up hanging.

It didn't hang in 3 hours. I checked the bus load, and it was only 3 or 4%, but the bursts of frames sent in a row that were blocking the system with the mcp251x driver were still present.

Here are the relevant outputs :
*************************** UPTIME     ********************************
 03:50:07 up 3:11, 2 users, load average: 0.00, 0.04, 0.05
*************************** MODULES    ********************************
Module                  Size  Used by
mcp2515                 4416  0
can_dev                 5436  1 mcp2515
*************************** INTERRUPTS ********************************
 52:     405311   ARMCTRL  BCM2708 GPIO catchall handler
 80:    2432130   ARMCTRL  bcm2708_spi.0
195:     405311      GPIO  can0
*************************** INTERFACE  ********************************
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 10
    link/can
    can state STOPPED restart-ms 0
    bitrate 125000 sample-point 0.875
    tq 500 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
    mcp2515: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
    clock 10000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    2650858    405316   0       0       5       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0
So it worked correctly for 3 hours, receiving 400k frames (about 40 frames per second @ 125kbps).
5 overruns were logged, but the system didn't stop.
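The 3 or 4% bus load quoted above is roughly consistent with the frame rate. A rough sketch of the estimate, assuming standard 11-bit IDs, a payload of 8 bytes per frame (an assumption; the byte counters in the listing suggest the average payload was somewhat shorter) and ignoring stuff bits; the function names are made up for illustration:

```c
/*
 * Bits on the wire for a standard (11-bit ID) data frame, without stuff bits:
 * SOF 1 + ID 11 + RTR 1 + IDE 1 + r0 1 + DLC 4 + CRC 15 + CRC delimiter 1
 * + ACK 1 + ACK delimiter 1 + EOF 7 = 44, plus 3 bits of interframe space,
 * plus 8 bits per data byte.
 */
unsigned int can_std_frame_bits(unsigned int data_bytes)
{
    return 47 + 8 * data_bytes;
}

/* Estimated bus load in percent for a given frame rate and bit rate. */
double can_bus_load_percent(double frames_per_s, unsigned int data_bytes,
                            unsigned long bitrate)
{
    return 100.0 * frames_per_s * can_std_frame_bits(data_bytes) / (double)bitrate;
}
```

An 8-byte standard frame takes 111 bit times, so 40 frames per second at 125kbps works out to about 3.6% load; stuff bits would push the real figure slightly higher.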

Seeing these good results, I will try next week to make a real performance test (sending a thousand frames at around 100% bus load). Depending on the results, I may test also with the 3.2 kernel (I have already compiled both drivers for it).
Posts: 72
Joined: Wed Dec 12, 2012 9:51 pm
by bertr2d2 » Sat Dec 15, 2012 5:26 pm
Zeta,

do you use the modified SPI module from Martin ?:

viewtopic.php?f=44&t=19489

Regards

Gerd
Posts: 80
Joined: Wed Aug 08, 2012 10:12 pm
by Zeta » Sun Dec 16, 2012 1:11 am
bertr2d2 wrote:do you use the modified SPI module from Martin ?

hello gerd,
I'm not using it yet, but it is on my todo list. I'm trying to get things working step by step, so that it's easier to find where the problems are, if any. So far, the toolchain, the 3.6.y kernel and the asynchronous mcp2515 driver seem OK. I will test its limits, then add the patch for the SPI.

Thanks for the tip anyway. I will present my results here once I have tried it.

Zeta
Posts: 72
Joined: Wed Dec 12, 2012 9:51 pm
by bertr2d2 » Sun Dec 16, 2012 1:38 am
Hi Zeta,

I'm quite sure that the modified SPI module will boost the CAN performance significantly.
Most lost CAN frames are caused by the original (slow) SPI module IMHO.

Regards

Gerd
Posts: 80
Joined: Wed Aug 08, 2012 10:12 pm
by msperl » Sun Dec 16, 2012 9:54 am
Hi Gerd!

The problem with the way that one_shot works is that you only get one interrupt for a level IRQ, not repeated ones while the IRQ line is low, which would be "normal" LEVEL_IRQ behaviour.

The driver essentially does the following:
* the mcp251x driver checks the interrupt status register in the irq handler
* it verifies that there is no more event to handle (the IRQ line is also high at that time)
* if that is the case, it returns from the interrupt handler
* on top of that there is more processing to be done by the IRQ framework.
* so only when all this is done is the level IRQ marked as handled and a new interrupt can occur.

Well this opens us up to a race condition:
* while doing "check-status" the IRQ line is high and the status returns no flags for things to do
* starting from there - until you get to the point where the (threaded) interrupt framework has marked the irq as handled - you have a window of opportunity where the IRQ line can go low and no interrupt will get triggered.
* this means that the interrupt handler will never wake up again.

And as you can see we have seen such situations...

To solve the issue, we can
* either return to the 3.2 kernel that does not have the requirement for IRQF_ONESHOT
* or get the 3.6 kernel to allow irqs to work without the one-shot option (revert this specific patch)
* or modify the mcp251x driver to also handle this situation correctly (probably by moving away from the threaded IRQ and shifting to a separate worker-thread (with RT priority) and handling the edge IRQ synchronously)
* or use the alternative driver that does not do all the processing in the IRQ handler context but in the SPI context, which has no implications for the IRQ context...

Ciao,
Martin
Posts: 234
Joined: Thu Sep 20, 2012 3:40 pm
by msperl » Sun Dec 16, 2012 2:05 pm
Found out that this patch probably triggers all the issues:
http://git.kernel.org/?p=linux/kernel/g ... 2015927dc3

The question is now: how to solve the issue - seems as if changes to the mcp2515 driver are needed to register the interrupt "correctly"...
Posts: 234
Joined: Thu Sep 20, 2012 3:40 pm
by Zeta » Mon Dec 17, 2012 3:09 pm
Martin, Gerd,

Here are the results of my last tests, where I sent n*1000 frames at maximum speed (so bus load close to 100%), with CPU idle (nothing more than SSH and candump) :

- Kernel 3.6.y, Asynchronous MCP2515 driver, standard SPI driver
* @125kbps : no frame lost, no overrun
* @500kbps : only 2342 frames received out of 3000 (2355 the second time)
Code: Select all
5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 10
    link/can
    can state STOPPED restart-ms 0
    bitrate 500000 sample-point 0.850
    tq 100 prop-seg 8 phase-seg1 8 phase-seg2 3 sjw 1
    mcp2515: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
    clock 10000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    18736      2342     0       0       383     0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0


- Kernel 3.6.y, Asynchronous MCP2515 driver, low_latency SPI driver (thanks to Martin's patch for 3.6.y)
* @500kbps : no frame lost, no overrun, after a batch of 1000, 3000 and 5000 frames (total 9000)
Code: Select all
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 10
    link/can
    can state STOPPED restart-ms 0
    bitrate 500000 sample-point 0.850
    tq 100 prop-seg 8 phase-seg1 8 phase-seg2 3 sjw 1
    mcp2515: tseg1 2..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
    clock 10000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          0          0          0
    RX: bytes  packets  errors  dropped overrun mcast
    72000      9000     0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0


msperl wrote:Found out that this patch probably triggers all the issues:
http://git.kernel.org/?p=linux/kernel/g ... 2015927dc3

The question is now: how to solve the issue - seems as if changes to the mcp2515 driver are needed to register the interrupt "correctly"...

Good find. Now, it seems to work OK with the low latency SPI and the asynchronous MCP2515 driver (at least at 500kbps, I didn't try at 1Mbps), so I don't know if this is still required?

What is the config you are currently using to compare ?
* Kernel 3.2 or 3.6.y ?
* MCP251x or asynchronous MCP2515 driver ?
* standard SPI or low latency SPI ?
* MCP2515 quartz frequency ? (I use 20MHz)
* SPI bus frequency ? (I use 10MHz)

Thanks,
Posts: 72
Joined: Wed Dec 12, 2012 9:51 pm
by msperl » Mon Dec 17, 2012 4:05 pm
my config:
* 3.6.y Kernel
* spi-latency patch
* mcp251x
* 16MHz CAN
* 10MHz SPI configured (typically results in 7.8MHz (=250MHz/2^5) when not overclocking the RPI)
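The 7.8MHz figure in the last item follows if the SPI clock divider is rounded up to the next power of two. That rounding rule is inferred here from the 2^5 in the post and is an assumption about the driver, not something taken from its source; the function names are hypothetical.

```c
/*
 * Smallest power-of-two divider for which core_hz / divider does not
 * exceed requested_hz (capped at 65536 so the loop always terminates).
 */
unsigned int spi_pow2_divider(unsigned long core_hz, unsigned long requested_hz)
{
    unsigned int div = 2;

    while (div < 65536 && core_hz / div > requested_hz)
        div *= 2;
    return div;
}

/* SPI clock that actually results from the rounded divider. */
unsigned long spi_effective_hz(unsigned long core_hz, unsigned long requested_hz)
{
    return core_hz / spi_pow2_divider(core_hz, requested_hz);
}
```

For a 250MHz core clock and a requested 10MHz, the exact ratio of 25 is rounded up to 32, which gives 7812500 Hz, i.e. the roughly 7.8MHz mentioned above.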

I did not experience the issue with 3.2 kernel (with spi-latency patch) and mcp251x either.

In the 3.2 kernel the mcp2515 driver did not give any real advantage, so I stopped using it and went for the in tree version.
But it seems as if it does work better in the 3.6.y case...

Ciao,
Martin

P.s: I hope to get my rig back up sometime this week for producing 100% bus usage and pounding the RPI - then I will look at the mcp2515 driver to see how to improve it and get it merged back into the kernel...
Posts: 234
Joined: Thu Sep 20, 2012 3:40 pm
by msperl » Mon Dec 17, 2012 6:23 pm
Removing this one patch in 3.6.y does not resolve the issue.
There seems to be a bigger issue with other changes to the interrupt infrastructure...
This will possibly mean a modification of the mcp251x driver is really needed...

Martin
Posts: 234
Joined: Thu Sep 20, 2012 3:40 pm
by Zeta » Mon Dec 17, 2012 10:50 pm
msperl wrote:Removing this one patch in 3.6.y does not resolve the issue.
There seems to be a bigger issue with other changes to the interrupt infrastructure...
This will possibly mean a modification of the mcp251x driver is really needed...

Martin

I don't have your experience with Linux drivers. I'm still at the level of configuring and compiling them, but I'm used to bare-metal systems like microcontrollers, so I know how peripherals work.
With Christmas coming, I will not have a lot of time, but it is something I would like to dig a bit deeper into in January, to understand the internals of the Linux driver model a bit better. It is really interesting.

Also if you need some help, for testing a patch or anything else, I will do my best to help you.

Thanks again for all the work you have done so far !

Zeta
Posts: 72
Joined: Wed Dec 12, 2012 9:51 pm
by msperl » Wed Dec 19, 2012 10:52 pm
Well - I did another set of tests and right now the alternative mcp2515 driver is able to handle receiving 12700 CAN messages/s (standard frame and 0 bytes of data at 500kbit).

OK - I see a few frame errors (<4/s), but I believe it may be acceptable at that data rate...

But I have to report that at that rate the RPI becomes less responsive than it usually is - it is handling about 32000 interrupts/s with <10% idle (and almost 90% in System)...

If I transmit instead 8 bytes with extended IDs then my RPI can receive 3480 messages/second without any framing errors and about 17600 interrupts/second and 60-70% idle (30% System)
At that rate the RPI is much more responsive...

So for some of you it may be worth looking into that alternative driver...
I will still investigate how we could improve the existing driver and get a patched version upstream
(I have already tried to contact the last Author of the MCP251x driver to get his feedback)

Ciao,
Martin
Posts: 234
Joined: Thu Sep 20, 2012 3:40 pm
by Zeta » Wed Dec 19, 2012 11:36 pm
Hello Martin,

You have interesting results here. It makes me realize I did not specify that my results were with standard CAN IDs (11 bits) and a message length of 8 bytes of data.

About the driver itself, and getting it into the kernel, you may be interested in the following message (and the whole thread) on the socketcan mailing list archive, from the author of the MCP2515 driver (Andre B. Oliveira) in 2010 :
http://www.mail-archive.com/socketcan-c ... 01694.html

He explains there that replacing the MCP251x driver was not his goal, as his driver cannot handle the MCP2510 by design. There are some other interesting remarks in the full thread. However, I could not find out whether the discussion about this driver continued after this thread.

One thing that I found out using this MCP2515 driver is that the ip command reports the component as "STOPPED", although it is running :
Code: Select all
5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 10
    link/can
    can state STOPPED restart-ms 0
 ...

Whereas the MCP251x was displaying the correct state of the component :
Code: Select all
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 10
    link/can
    can state ERROR-ACTIVE restart-ms 0

Or "ERROR-PASSIVE" if there is something wrong on the bus, as expected.

You can find in this message :
http://www.mail-archive.com/socketcan-c ... 01902.html
> The driver still lacks CAN error handling, though.
Ok, I'll try to implement that.

So it seems related. I don't know if something has been done on this subject since then, and if so, where the up-to-date source code is.

Regards.
Posts: 72
Joined: Wed Dec 12, 2012 9:51 pm
by msperl » Thu Dec 20, 2012 7:27 am
Hi Zeta!

Now that I (again) can replicate the issue I hopefully will find the time to patch the mcp251x driver - to make it easier for inclusion with upstream...

The way I think of it, I may also update the DMA part of the bcm2708-spi driver to make best use of the system while minimizing the number of interrupts - with DMA and asynchronous SPI we can do several transfers with just one interrupt. Theoretically we should be able to handle up to 3 Transmits and 2 Receives with 2 interrupts and 4 process wakeups, which is better than the mcp2515 driver.

But all that still does not reduce the system load by more than 40%.

So if one really needs a high number of packets/s and still wants the RPI mostly idle, then one will probably need to move to a different controller or use an AVR/PIC as a buffer system in front of the mcp2515 to aggregate messages from the CAN bus and just send an interrupt after some time or when the buffers start to exceed a threshold.

But on the other side this may not really improve the CPU utilization on the RPI, as there is still per packet overhead on the can-network stack, which may be consuming most of the cycles...

But let's not get ahead of ourselves: let's start by optimizing the direct mcp251x driver right now and then take further steps if we still see some need and the potential...
(Maybe over Christmas...)

Ciao, Martin
Posts: 234
Joined: Thu Sep 20, 2012 3:40 pm
by muellie » Fri Dec 21, 2012 11:09 pm
Hello all,

in the last two weeks I tried to set up CAN with the MCP2515 on my board; so far I have been partly successful. BTW, vcan is running. I did everything as I did initially, so I don't have a clue what I did wrong before.
Right now I am confused about several things, and I am quite sure not everything will sound usual to you either.

I am using 2012-10-28-wheezy-raspbian, kernel 3.2.27, and the bcm2708 patch as posted above by me but without oneshot (Gerd's patch, modified for the newer Raspbian version). I installed some more CAN modules during the kernel build, which shouldn't affect anything. I have two versions of the hardware, both are attached. During troubleshooting I modified version 2 to make sure there is an automatic full reset at power-up. I put a resistor (10k) at the reset pin as well as a capacitor (100nF) between reset and ground, as shown in the data sheet.

The following is for the version 2 HW:
I tried several times to set up CAN and always got the following:
kernel buffer (searched for mcp only, directly after booting, no modules loaded)
Code: Select all
[    0.075801]  BCM2708 mcp251x_init:  got IRQ 195 for MCP2515
[   11.464291] mcp251x spi0.0: MCP251x didn't enter in conf mode after reset
[   11.481056] mcp251x spi0.0: CANSTAT 0x00 CANCTRL 0x00
[   11.481084] mcp251x spi0.0: Probe failed
[   11.494615] mcp251x spi0.0: probe failed

Today I tried to get CAN running with a projector on the RCA video output instead of the HDMI output, and it seemed like something was working. Unfortunately I am not able to reproduce it (I do not have this equipment at home), but I got the following without connecting any screen at all:
Code: Select all
[    0.075745]  BCM2708 mcp251x_init:  got IRQ 195 for MCP2515
[    5.008309] bcm2708_spi bcm2708_spi.0: SPI Controller at 0x20204000 (irq 80)
[    7.054187] mcp251x spi0.0: CANSTAT 0x06 CANCTRL 0x01
[    7.054211] mcp251x spi0.0: Probe failed
[    7.060858] mcp251x spi0.0: probe failed
[  100.920990] can: controller area network core (rev 20090105 abi 8)
[  107.083085] can: raw protocol (rev 20090105)
[  109.584950] can: broadcast manager protocol (rev 20090105 t)

CANCTRL (st2) is not the same every time. I looked in the mcp driver (mcp251x) at how st1 and st2 are defined and why the first "Probe failed" is shown. Below is the relevant part of the driver:
Code: Select all
// Inserted by me: In line 626 to 645
static int mcp251x_hw_probe(struct spi_device *spi)
{
   int st1, st2;

   mcp251x_hw_reset(spi);

   /*
    * Please note that these are "magic values" based on after
    * reset defaults taken from data sheet which allows us to see
    * if we really have a chip on the bus (we avoid common all
    * zeroes or all ones situations)
    */
   st1 = mcp251x_read_reg(spi, CANSTAT) & 0xEE;
   st2 = mcp251x_read_reg(spi, CANCTRL) & 0x17;

   dev_dbg(&spi->dev, "CANSTAT 0x%02x CANCTRL 0x%02x\n", st1, st2);

   /* Check for power up default values */
   return (st1 == 0x80 && st2 == 0x07) ? 1 : 0;
}



// Inserted by me: In line 1019 to 1022
   if (!mcp251x_hw_probe(spi)) {
      dev_info(&spi->dev, "Probe failed\n");
      goto error_probe;
   }
   

// Inserted by me: In line 1028 to 1032   
   ret = register_candev(net);
   if (!ret) {
      dev_info(&spi->dev, "probed\n");
      return ret;
   }   
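
To compare the logged values against the check in mcp251x_hw_probe, here is a small stand-alone sketch (user-space C, not driver code). The masks and expected values are the ones from the excerpt above; the function names, the enum and the classification into "all zeroes" / "all ones" are my own assumptions for illustration. Note that the driver logs st1 and st2 after masking, so the sketch classifies the masked values as they appear in the kernel buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Masks and power-up defaults as used in mcp251x_hw_probe above. */
#define CANSTAT_MASK   0xEE
#define CANCTRL_MASK   0x17
#define CANSTAT_EXPECT 0x80
#define CANCTRL_EXPECT 0x07

/* Sanity check: the expected values must survive their own masks. */
static_assert((CANSTAT_EXPECT & CANSTAT_MASK) == CANSTAT_EXPECT, "bad CANSTAT mask");
static_assert((CANCTRL_EXPECT & CANCTRL_MASK) == CANCTRL_EXPECT, "bad CANCTRL mask");

/* Hypothetical classification, not part of the driver. */
enum probe_result {
    PROBE_OK,          /* power-up defaults seen, chip answered */
    PROBE_ALL_ZERO,    /* both registers read back as zero */
    PROBE_ALL_ONES,    /* both registers read back as all ones (after masking) */
    PROBE_UNEXPECTED   /* something answered, but not with the reset defaults */
};

/* Apply the same masks the driver applies to the raw register reads. */
uint8_t probe_mask_canstat(uint8_t raw) { return raw & CANSTAT_MASK; }
uint8_t probe_mask_canctrl(uint8_t raw) { return raw & CANCTRL_MASK; }

/* Classify the masked st1/st2 values exactly as they appear in the log. */
enum probe_result probe_classify(uint8_t st1, uint8_t st2)
{
    if (st1 == CANSTAT_EXPECT && st2 == CANCTRL_EXPECT)
        return PROBE_OK;
    if (st1 == 0x00 && st2 == 0x00)
        return PROBE_ALL_ZERO;
    if (st1 == CANSTAT_MASK && st2 == CANCTRL_MASK)
        return PROBE_ALL_ONES;
    return PROBE_UNEXPECTED;
}
```

With this, the first log (CANSTAT 0x00 CANCTRL 0x00) falls into the all-zeroes case that the driver comment mentions, while the second log (CANSTAT 0x06 CANCTRL 0x01) is neither all zeroes nor the reset default, so the chip, or something else on the bus, returned data that does not look like a freshly reset MCP2515.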


So, my question is: why does the HDMI connection affect the reset state of the MCP? At first I thought it was caused by the time needed to initialise the HDMI output, but I no longer think so. Fact: the reset is called later (by around 3 s) when an HDMI monitor is plugged in. Did anybody modify something in the driver itself? I was thinking about extending the waiting time (mdelay(10), line 614), but I do not think that is the way to go.

And now the weird part: HW version 1. I tried this one a while ago and it never worked (hence I built version 2). Today I thought I would give it a shot because of the screen thing. By accident I swapped plus and ground of the adapter board and damaged it. The transceiver is certainly dead; I do not know more about the damage so far. Afterwards I tried it anyway, and it worked, even with an HDMI screen plugged in. I really do not get it. And when I DO NOT POWER this board (no 3V3 or ground connected) I get the same messages as if it were powered (no failures in the kernel buffer, correct st1 and st2, etc.), and I am able to set up can0. Afterwards I tried the other board (V2) again, and also with nothing connected, and still get the message "MCP251x didn't enter in conf mode after reset".
Board version1, not powered and powered:
kernel buffer (mcp,spi,can):
Code: Select all
[    0.075819]  BCM2708 mcp251x_init:  got IRQ 195 for MCP2515
[    9.745031] bcm2708_spi bcm2708_spi.0: SPI Controller at 0x20204000 (irq 80)
[   11.854439] CAN device driver interface
[   11.905886] mcp251x spi0.0: CANSTAT 0x80 CANCTRL 0x07
[   11.907823] mcp251x spi0.0: probed
[   27.616026] mcp251x spi0.0: bit-timing not yet defined
[   27.616058] mcp251x spi0.0: unable to set initial baudrate!
[   27.616436] mcp251x spi0.0: bit-timing not yet defined
[   27.616460] mcp251x spi0.0: unable to set initial baudrate!
[   27.616506] mcp251x spi0.0: bit-timing not yet defined
[   27.616521] mcp251x spi0.0: unable to set initial baudrate!
[   27.616566] mcp251x spi0.0: bit-timing not yet defined
[   27.616582] mcp251x spi0.0: unable to set initial baudrate!
[   27.616623] mcp251x spi0.0: bit-timing not yet defined
[   27.616638] mcp251x spi0.0: unable to set initial baudrate!
[   28.718680] mcp251x spi0.0: bit-timing not yet defined
[   28.718712] mcp251x spi0.0: unable to set initial baudrate!
[   29.719939] mcp251x spi0.0: bit-timing not yet defined
[   29.719971] mcp251x spi0.0: unable to set initial baudrate!
[   30.721197] mcp251x spi0.0: bit-timing not yet defined
[   30.721223] mcp251x spi0.0: unable to set initial baudrate!
[   31.722415] mcp251x spi0.0: bit-timing not yet defined
[   31.722447] mcp251x spi0.0: unable to set initial baudrate!
[   32.723646] mcp251x spi0.0: bit-timing not yet defined
[   32.723678] mcp251x spi0.0: unable to set initial baudrate!
[   33.724898] mcp251x spi0.0: bit-timing not yet defined
[   33.724931] mcp251x spi0.0: unable to set initial baudrate!
[   34.726123] mcp251x spi0.0: bit-timing not yet defined
[   34.726154] mcp251x spi0.0: unable to set initial baudrate!
[   35.727346] mcp251x spi0.0: bit-timing not yet defined
[   35.727377] mcp251x spi0.0: unable to set initial baudrate!
[   36.728570] mcp251x spi0.0: bit-timing not yet defined
[   36.728605] mcp251x spi0.0: unable to set initial baudrate!
[   37.729792] mcp251x spi0.0: bit-timing not yet defined
[   37.729822] mcp251x spi0.0: unable to set initial baudrate!
[   38.731044] mcp251x spi0.0: bit-timing not yet defined
[   38.731078] mcp251x spi0.0: unable to set initial baudrate!
[   39.732353] mcp251x spi0.0: bit-timing not yet defined
[   39.732381] mcp251x spi0.0: unable to set initial baudrate!
[   40.733601] mcp251x spi0.0: bit-timing not yet defined
[   40.733631] mcp251x spi0.0: unable to set initial baudrate!
[   41.734859] mcp251x spi0.0: bit-timing not yet defined
[   41.734889] mcp251x spi0.0: unable to set initial baudrate!
[   42.736078] mcp251x spi0.0: bit-timing not yet defined
[   42.736109] mcp251x spi0.0: unable to set initial baudrate!
[   43.737294] mcp251x spi0.0: bit-timing not yet defined
[   43.737321] mcp251x spi0.0: unable to set initial baudrate!
[   44.738501] mcp251x spi0.0: bit-timing not yet defined
[   44.738530] mcp251x spi0.0: unable to set initial baudrate!
[   45.739708] mcp251x spi0.0: bit-timing not yet defined
[   45.739735] mcp251x spi0.0: unable to set initial baudrate!
[   46.740912] mcp251x spi0.0: bit-timing not yet defined
[   46.740940] mcp251x spi0.0: unable to set initial baudrate!
[   47.742123] mcp251x spi0.0: bit-timing not yet defined
[   47.742157] mcp251x spi0.0: unable to set initial baudrate!
[   48.743417] mcp251x spi0.0: bit-timing not yet defined
[   48.743448] mcp251x spi0.0: unable to set initial baudrate!
[   49.744643] mcp251x spi0.0: bit-timing not yet defined
[   49.744675] mcp251x spi0.0: unable to set initial baudrate!
[   50.745894] mcp251x spi0.0: bit-timing not yet defined
[   50.745927] mcp251x spi0.0: unable to set initial baudrate!
[   51.747106] mcp251x spi0.0: bit-timing not yet defined
[   51.747137] mcp251x spi0.0: unable to set initial baudrate!
[   52.748316] mcp251x spi0.0: bit-timing not yet defined
[   52.748347] mcp251x spi0.0: unable to set initial baudrate!
[   53.749538] mcp251x spi0.0: bit-timing not yet defined
[   53.749565] mcp251x spi0.0: unable to set initial baudrate!
[   54.750747] mcp251x spi0.0: bit-timing not yet defined
[   54.750778] mcp251x spi0.0: unable to set initial baudrate!
[   55.751945] mcp251x spi0.0: bit-timing not yet defined
[   55.751976] mcp251x spi0.0: unable to set initial baudrate!
[   56.753144] mcp251x spi0.0: bit-timing not yet defined
[   56.753173] mcp251x spi0.0: unable to set initial baudrate!
[   57.754351] mcp251x spi0.0: bit-timing not yet defined
[   57.754378] mcp251x spi0.0: unable to set initial baudrate!
[   58.755558] mcp251x spi0.0: bit-timing not yet defined
[   58.755586] mcp251x spi0.0: unable to set initial baudrate!
[   59.756778] mcp251x spi0.0: bit-timing not yet defined
[   59.756807] mcp251x spi0.0: unable to set initial baudrate!
[   60.757987] mcp251x spi0.0: bit-timing not yet defined
[   60.758019] mcp251x spi0.0: unable to set initial baudrate!
[   61.759188] mcp251x spi0.0: bit-timing not yet defined
[   61.759215] mcp251x spi0.0: unable to set initial baudrate!
[   62.760428] mcp251x spi0.0: bit-timing not yet defined
[   62.760457] mcp251x spi0.0: unable to set initial baudrate!
[   63.761645] mcp251x spi0.0: bit-timing not yet defined
[   63.761675] mcp251x spi0.0: unable to set initial baudrate!
[   64.762877] mcp251x spi0.0: bit-timing not yet defined
[   64.762908] mcp251x spi0.0: unable to set initial baudrate!
[   65.764091] mcp251x spi0.0: bit-timing not yet defined
[   65.764119] mcp251x spi0.0: unable to set initial baudrate!
[   66.765305] mcp251x spi0.0: bit-timing not yet defined
[   66.765333] mcp251x spi0.0: unable to set initial baudrate!
[   66.904472] can: controller area network core (rev 20090105 abi 8)
[   67.766511] mcp251x spi0.0: bit-timing not yet defined
[   67.766541] mcp251x spi0.0: unable to set initial baudrate!
[   68.767728] mcp251x spi0.0: bit-timing not yet defined
[   68.767791] mcp251x spi0.0: unable to set initial baudrate!
[   69.768998] mcp251x spi0.0: bit-timing not yet defined
[   69.769028] mcp251x spi0.0: unable to set initial baudrate!
[   70.770212] mcp251x spi0.0: bit-timing not yet defined
[   70.770246] mcp251x spi0.0: unable to set initial baudrate!
[   71.771471] mcp251x spi0.0: bit-timing not yet defined
[   71.771500] mcp251x spi0.0: unable to set initial baudrate!
[   71.979901] can: raw protocol (rev 20090105)
[   72.772687] mcp251x spi0.0: bit-timing not yet defined
[   72.772715] mcp251x spi0.0: unable to set initial baudrate!
[   73.773898] mcp251x spi0.0: bit-timing not yet defined
[   73.773925] mcp251x spi0.0: unable to set initial baudrate!
[   74.159007] can: broadcast manager protocol (rev 20090105 t)
[   74.775119] mcp251x spi0.0: bit-timing not yet defined
[   74.775149] mcp251x spi0.0: unable to set initial baudrate!
[   75.776339] mcp251x spi0.0: bit-timing not yet defined
[   75.776366] mcp251x spi0.0: unable to set initial baudrate!
[   76.777548] mcp251x spi0.0: bit-timing not yet defined
[   76.777576] mcp251x spi0.0: unable to set initial baudrate!
[   77.778767] mcp251x spi0.0: bit-timing not yet defined
[   77.778796] mcp251x spi0.0: unable to set initial baudrate!
[   78.779976] mcp251x spi0.0: bit-timing not yet defined
[   78.780004] mcp251x spi0.0: unable to set initial baudrate!
[   79.781193] mcp251x spi0.0: bit-timing not yet defined
[   79.781221] mcp251x spi0.0: unable to set initial baudrate!
[   80.782393] mcp251x spi0.0: bit-timing not yet defined
[   80.782424] mcp251x spi0.0: unable to set initial baudrate!
[   81.783606] mcp251x spi0.0: bit-timing not yet defined
[   81.783633] mcp251x spi0.0: unable to set initial baudrate!
[   82.784805] mcp251x spi0.0: bit-timing not yet defined
[   82.784831] mcp251x spi0.0: unable to set initial baudrate!
[   83.786020] mcp251x spi0.0: bit-timing not yet defined
[   83.786047] mcp251x spi0.0: unable to set initial baudrate!
[   84.787243] mcp251x spi0.0: bit-timing not yet defined
[   84.787271] mcp251x spi0.0: unable to set initial baudrate!
[   85.788444] mcp251x spi0.0: bit-timing not yet defined
[   85.788475] mcp251x spi0.0: unable to set initial baudrate!
[   86.789651] mcp251x spi0.0: bit-timing not yet defined
[   86.789679] mcp251x spi0.0: unable to set initial baudrate!
[   87.790850] mcp251x spi0.0: bit-timing not yet defined
[   87.790880] mcp251x spi0.0: unable to set initial baudrate!
[   88.792057] mcp251x spi0.0: bit-timing not yet defined
[   88.792085] mcp251x spi0.0: unable to set initial baudrate!
[   89.793285] mcp251x spi0.0: bit-timing not yet defined
[   89.793315] mcp251x spi0.0: unable to set initial baudrate!
[   90.025098] mcp251x spi0.0: bit-timing not yet defined
[   90.025131] mcp251x spi0.0: unable to set initial baudrate!
[   90.794515] mcp251x spi0.0: bit-timing not yet defined
[   90.794544] mcp251x spi0.0: unable to set initial baudrate!
[   91.795732] mcp251x spi0.0: bit-timing not yet defined
[   91.795761] mcp251x spi0.0: unable to set initial baudrate!
[   92.796937] mcp251x spi0.0: bit-timing not yet defined
[   92.797001] mcp251x spi0.0: unable to set initial baudrate!
[   93.798180] mcp251x spi0.0: bit-timing not yet defined
[   93.798208] mcp251x spi0.0: unable to set initial baudrate!
[   94.799392] mcp251x spi0.0: bit-timing not yet defined
[   94.799420] mcp251x spi0.0: unable to set initial baudrate!
[   95.800609] mcp251x spi0.0: bit-timing not yet defined
[   95.800638] mcp251x spi0.0: unable to set initial baudrate!
[   96.283218] mcp251x spi0.0: CNF: 0x00 0xb5 0x01

Interrupt protocol:
Code: Select all
           CPU0       
  3:       5351   ARMCTRL  BCM2708 Timer Tick
 32:     125974   ARMCTRL  dwc_otg, dwc_otg_pcd, dwc_otg_hcd:usb1
 52:          0   ARMCTRL  BCM2708 GPIO catchall handler
 65:        309   ARMCTRL  ARM Mailbox IRQ
 66:          1   ARMCTRL  VCHIQ doorbell
 75:          1   ARMCTRL
 77:       6871   ARMCTRL  bcm2708_sdhci (dma)
 80:         36   ARMCTRL  bcm2708_spi.0
 83:         20   ARMCTRL  uart-pl011
 84:       9197   ARMCTRL  mmc0
195:          0      GPIO  mcp251x
FIQ:              usb_fiq
Err:          0

show CAN:
Code: Select all
3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT qlen 10
    link/can
    can state ERROR-ACTIVE restart-ms 0
    bitrate 500000 sample-point 0.875
    tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
    mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp-inc 1
    clock 8000000
    re-started bus-errors arbit-lost error-warn error-pass bus-off
    0          0          0          0          0          0         
    RX: bytes  packets  errors  dropped overrun mcast   
    0          0        0       0       0       0     
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0     


I really do appreciate any help very much. Right now I am really confused. At first I thought maybe the bulk capacitor is too big, but it partly works without HDMI. Version 1 is damaged, and yet everything on the MCP2515 side seems to be working fine (except the CAN bus itself, which shows ERROR-ACTIVE).

I wish you all nice Christmas holidays! Thank you very much for the help in advance.

Take care
Chris
Attachments
SchematicCAN.png (14.91 KiB): Schematic Version 1
Schematic.png (61.14 KiB): Schematic Version 2
Posts: 8
Joined: Fri Nov 23, 2012 11:27 am