USB/serial converters (not very) broken on Pi


163 posts   Page 1 of 7   1, 2, 3, 4, 5 ... 7
by M33P » Sun Sep 02, 2012 1:48 pm
Hello

After several days toying with my Pi, I have come up against a very broken result when attempting to use a USB/Serial converter (PL2303 or FTDI FT232R).

The PL2303 is part of a GPS receiver, the FT232R is part of the Bus Pirate v3 http://dangerousprototypes.com/docs/Bus_Pirate

Current setup
Hardware:
Powered by 2 off 5V 1A USB charging adapters, spliced cable to power remote hub via coax jack, RPI via micro-usb. Voltages both healthy and verified stable
Sandisk 4GB SDHC x4 sd card
7-port powered hub - modified for extra power capacitor decoupling and added blocking diode to prevent power to downstream ports backfeeding the Pi

LSUSB:
Code: Select all
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 0424:9512 Standard Microsystems Corp.
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp.
Bus 001 Device 004: ID 1a40:0201 Terminus Technology Inc. FE 2.1 7-port Hub
Bus 001 Device 005: ID 046d:c283 Logitech, Inc. WingMan Force 3D
Bus 001 Device 006: ID 067b:2303 Prolific Technology, Inc. PL2303 Serial Port
Bus 001 Device 007: ID 05dc:a761 Lexar Media, Inc.
Bus 001 Device 008: ID 0ace:1215 ZyDAS ZD1211B 802.11g


Software (disk image is Wheezy):
Code: Select all
pi@raspberrypi:~$ uname -a && /opt/vc/bin/vcgencmd version
Linux raspberrypi 3.2.27+ #102 PREEMPT Sat Sep 1 01:00:50 BST 2012 armv6l GNU/Linux
Aug 25 2012 01:12:00
Copyright (c) 2012 Broadcom
version 333349 (release)


cmdline.txt:
Code: Select all
pi@raspberrypi:~$ cat /boot/cmdline.txt
dwc_otg.lpm_enable=0 dwc_otg.fiq_fix_enable=1 sdhci-bcm2708.enable_llm=1 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait


I can reliably break the entire USB system on the Pi by either
- Connecting the FT232R USB/serial device and doing anything with it - a couple of keypresses
- Connecting the PL2303 device and letting gpsmon run for several hours

After this the USB will break in the same way in both cases. To allow debugging I connect directly to the GPIO UART which has a console attached to ttyAMA0.

There is nothing in dmesg regarding the breakage, only a flood of subsequent errors from the eth0 device as it cannot talk to the SMSC ethernet controller.

Code: Select all
Sep  1 22:14:32 raspberrypi kernel: [25613.883345] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Sep  1 22:14:32 raspberrypi kernel: [25613.883377] smsc95xx 1-1.1:1.0: eth0: MII is busy in smsc95xx_mdio_read
Sep  1 22:14:37 raspberrypi ifplugd(eth0)[1009]: Link beat lost.
Sep  1 22:14:37 raspberrypi kernel: [25618.883492] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Sep  1 22:14:37 raspberrypi kernel: [25618.883524] smsc95xx 1-1.1:1.0: eth0: MII is busy in smsc95xx_mdio_read
Sep  1 22:14:43 raspberrypi kernel: [25624.983658] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Sep  1 22:14:48 raspberrypi kernel: [25629.983774] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
Sep  1 22:14:53 raspberrypi kernel: [25634.983905] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Sep  1 22:14:58 raspberrypi kernel: [25639.984048] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000118
Sep  1 22:15:03 raspberrypi kernel: [25644.984187] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Sep  1 22:15:08 raspberrypi kernel: [25649.984307] smsc95xx 1-1.1:1.0: eth0: Failed to write register index 0x00000114
Sep  1 22:15:13 raspberrypi kernel: [25654.984432] smsc95xx 1-1.1:1.0: eth0: Failed to read register index 0x00000114
Sep  1 22:15:18 raspberrypi ifplugd(eth0)[1009]: Executing '/etc/ifplugd/ifplugd.action eth0 down'.

The errors then repeat in a similar vein.

As a naive test, I attempted USB removal via sysfs and proceeded to OOPS the kernel (note that via the GPIO UART, the kernel debugger starts up and overrides the usual dmesg output)
Code: Select all
root@raspberrypi:/sys/bus/usb/devices/1-1# echo 1 > remove
[68446.534722] Unable to handle kernel NULL pointer dereference at virtual address 0000006c
[68446.552515] pgd = c95a8000
[68446.564976] [0000006c] *pgd=08061831, *pte=00000000, *ppte=00000000
[68446.581226] Internal error: Oops: 17 [#1] PREEMPT

Entering kdb (current=0xc9754f20, pid 2643) Oops: (null)
due to oops @ 0xc02504b0

Pid: 2643, comm:                 bash
CPU: 0    Not tainted  (3.2.27+ #102)
PC is at smsc95xx_async_cmd_callback+0x30/0x44
LR is at usb_hcd_giveback_urb+0x60/0xf4
pc : [<c02504b0>]    lr : [<c025ae08>]    psr: a0000093
sp : c94fbd60  ip : c00198a0  fp : 00f122c8
r10: 00000000  r9 : cb914470  r8 : ffffff94
r7 : cb95b180  r6 : ffffff94  r5 : c9544300  r4 : cbbf8bc0
r3 : 00000000  r2 : ffffff94  r1 : 00000000  r0 : cbbf8bc0
Flags: NzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment user
Control: 00c5387d  Table: 095a8008  DAC: 00000015
[<c0013f30>] (unwind_backtrace+0x0/0xf0) from [<c037460c>] (kdb_dumpregs+0x28/0x50)
[<c037460c>] (kdb_dumpregs+0x28/0x50) from [<c0071d60>] (kdb_main_loop+0x504/0x758)
[<c0071d60>] (kdb_main_loop+0x504/0x758) from [<c0074344>] (kdb_stub+0x13c/0x368)
[<c0074344>] (kdb_stub+0x13c/0x368) from [<c006b330>] (kgdb_handle_exception+0x1fc/0x69c)
[<c006b330>] (kgdb_handle_exception+0x1fc/0x69c) from [<c00136b0>] (kgdb_notify+0x24/0x40)
[<c00136b0>] (kgdb_notify+0x24/0x40) from [<c037b200>] (notifier_call_chain+0x44/0x84)
[<c037b200>] (notifier_call_chain+0x44/0x84) from [<c037b278>] (__atomic_notifier_call_chain+0x38/0x4c)
[<c037b278>] (__atomic_notifier_call_chain+0x38/0x4c) from [<c037b2a4>] (atomic_notifier_call_chain+0x18/0x20)
[<c037b2a4>] (atomic_notifier_call_chain+0x18/0x20) from [<c037b2e4>] (notify_die+0x38/0x44)
[<c037b2e4>] (notify_die+0x38/0x44) from [<c0011578>] (die+0xb0/0x37c)
[<c0011578>] (die+0xb0/0x37c) from [<c0373f30>] (__do_kernel_fault.part.3+0x54/0x74)
[<c0373f30>] (__do_kernel_fault.part.3+0x54/0x74) from [<c037af4c>] (do_page_fault+0x1a4/0x36c)
[<c037af4c>] (do_page_fault+0x1a4/0x36c) from [<c0008388>] (do_DataAbort+0x34/0x98)
[<c0008388>] (do_DataAbort+0x34/0x98) from [<c0379798>] (__dabt_svc+0x38/0x60)
Exception stack(0xc94fbd18 to 0xc94fbd60)
bd00:                                                       cbbf8bc0 00000000
bd20: ffffff94 00000000 cbbf8bc0 c9544300 ffffff94 cb95b180 ffffff94 cb914470
bd40: 00000000 00f122c8 c00198a0 c94fbd60 c025ae08 c02504b0 a0000093 ffffffff
[<c0379798>] (__dabt_svc+0x38/0x60) from [<c02504b0>] (smsc95xx_async_cmd_callback+0x30/0x44)
[<c02504b0>] (smsc95xx_async_cmd_callback+0x30/0x44) from [<c025ae08>] (usb_hcd_giveback_urb+0x60/0xf4)
[<c025ae08>] (usb_hcd_giveback_urb+0x60/0xf4) from [<c0280a78>] (dwc_otg_urb_dequeue+0x88/0xa4)
[<c0280a78>] (dwc_otg_urb_dequeue+0x88/0xa4) from [<c025bb98>] (unlink1+0x2c/0x10c)
[<c025bb98>] (unlink1+0x2c/0x10c) from [<c025c5c8>] (usb_hcd_flush_endpoint+0x128/0x158)
[<c025c5c8>] (usb_hcd_flush_endpoint+0x128/0x158) from [<c025ea7c>] (usb_disable_endpoint+0x4c/0x80)
[<c025ea7c>] (usb_disable_endpoint+0x4c/0x80) from [<c025ec8c>] (usb_disable_device+0x188/0x1b8)
[<c025ec8c>] (usb_disable_device+0x188/0x1b8) from [<c02580ec>] (usb_disconnect+0x70/0x118)
[<c02580ec>] (usb_disconnect+0x70/0x118) from [<c02581f0>] (hub_quiesce+0x5c/0xa0)
[<c02581f0>] (hub_quiesce+0x5c/0xa0) from [<c0258380>] (hub_disconnect+0x8c/0x104)
[<c0258380>] (hub_disconnect+0x8c/0x104) from [<c02602f8>] (usb_unbind_interface+0x3c/0xe4)
[<c02602f8>] (usb_unbind_interface+0x3c/0xe4) from [<c02270a8>] (__device_release_driver+0x58/0xa8)
[<c02270a8>] (__device_release_driver+0x58/0xa8) from [<c0227114>] (device_release_driver+0x1c/0x28)
[<c0227114>] (device_release_driver+0x1c/0x28) from [<c0226ca4>] (bus_remove_device+0x68/0x88)

[<c0226ca4>] (bus_remove_device+0x68/0x88) from [<c0224dd4>] (device_del+0xf8/0x174)
[<c0224dd4>] (device_del+0xf8/0x174) from [<c025eba4>] (usb_disable_device+0xa0/0x1b8)
[<c025eba4>] (usb_disable_device+0xa0/0x1b8) from [<c025f330>] (usb_set_configuration+0x170/0x83c)
[<c025f330>] (usb_set_configuration+0x170/0x83c) from [<c026312c>] (usb_remove_store+0x48/0x5c)
[<c026312c>] (usb_remove_store+0x48/0x5c) from [<c0223df0>] (dev_attr_store+0x18/0x24)
[<c0223df0>] (dev_attr_store+0x18/0x24) from [<c0115d44>] (sysfs_write_file+0x168/0x19c)
[<c0115d44>] (sysfs_write_file+0x168/0x19c) from [<c00b92a8>] (vfs_write+0x9c/0x178)
[<c00b92a8>] (vfs_write+0x9c/0x178) from [<c00b9598>] (sys_write+0x38/0x70)
[<c00b9598>] (sys_write+0x38/0x70) from [<c000da20>] (ret_fast_syscall+0x0/0x30)

kdb>

The same command does not result in an OOPS when the USB is working.

The only way I can restore USB function is to power cycle the board.

Suggestions on how I can use a USB/serial converter reliably?

Edit: title updated due to events later on the thread
Last edited by M33P on Sat Mar 09, 2013 6:00 pm, edited 2 times in total.
Posts: 199
Joined: Sun Sep 02, 2012 1:14 pm
by drgeoff » Sun Sep 02, 2012 2:42 pm
How long did you spend searching this forum before posting?

There are many posts on this topic.

eg viewtopic.php?p=96290#p96290

Just search for PL2203 or FTDI. The search link is on the raspberrypi.org front page, or use Google adding raspberry to the above terms.
Posts: 6650
Joined: Wed Jan 25, 2012 6:39 pm
by M33P » Sun Sep 02, 2012 3:10 pm
Many posts do indeed exist but none as yet provide a solution. Many confuse other issues in with the core problem.

All of the reported instances of "hangs" previously have not been investigated by other users as they have simply power cycled the board because the entire USB/network fails leading to inability to input commands either via ssh or USB keyboards.

As I have the ability to interface with the GPIO serial console, I can still query the Pi and issue commands. This at least allows debugging of the problem, if I knew what to debug.
Posts: 199
Joined: Sun Sep 02, 2012 1:14 pm
by samsamsam » Sun Sep 02, 2012 3:44 pm
Try last kernel and use this kernel command line (/boot/cmdline.txt):
Code: Select all
dwc_otg.microframe_schedule=1 smsc95xx.turbo_mode=N dwc_otg.fiq_fix_enable=1 sdhci-bcm2708.sync_after_dma=0 dwc_otg.lpm_enable=0 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 rootwait dwc_otg.speed=1
Posts: 36
Joined: Fri Aug 17, 2012 11:36 am
by M33P » Sun Sep 02, 2012 10:15 pm
Kernel and firmware are latest, as described in the first post.

Addendum to first post: I can now use the FT232 for more than "a couple of keypresses". I have no idea what's changed (other than a simple reboot with the same settings) but now I get similar useability to the PL2303.

First test, reproduction and repetition.

Code: Select all
cmdline.txt = sdhci-bcm2708.enable_llm=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait


Managed to get repeatable crashes with
stty -F /dev/ttyUSB0 && stty -F /dev/ttyUSB0 && stty -F /dev/ttyUSB0 && stty -F /dev/ttyUSB0 && stty -F /dev/ttyUSB0 && stty -F /dev/ttyUSB0 &&

Crashes reliably after a few tries with both PL2303 and FT232.
But: GPIO console now completely fails to respond after the Pi hangs. No panic, no nothing. There is also no output after the usual bootup messages on tty1. The cursor stops blinking which implies that the system does actually hard lock.

Test 2: try it directly plugged into the pi and not the hub
FT232 - could get a crash after spamming the command much more than previously
PL2303 - could only get a crash after spamming the command for about 2 minutes

It would appear that adding the hub increases the likelihood of crashing.

Test 3: more USB load?
Adding extra USB load (read/write flash drive, network traffic) doesn't appear to increase or decrease the likelihood of crashing.

Test 4: add the watchdog, see if that reboots the pi
Hardware watchdog set to 10sec timeout, /sbin/watchdog will kick it every 5 seconds

Retest, yes now the watchdog reboots the pi on crash. Which means the realtime watchdog process got halted.

Test 5:
Code: Select all
cmdline.txt = dwc_otg.fiq_fix_enable=1 dwc_otg.lpm_enable=0 smsc95xx.turbo_mode=N sdhci-bcm2708.sync_after_dma=0 sdhci-bcm2708.enable_llm=1 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait


PL2303 and FT232 plugged into hub.

PL2303 doesn't crash even after spamming command
FT232: yes! crash but could only get a crash after spamming command much more than previously

Test 6:
Code: Select all
cmdline.txt = dwc_otg.speed=1 dwc_otg.microframe_schedule=1 dwc_otg.fiq_fix_enable=1 dwc_otg.lpm_enable=0 smsc95xx.turbo_mode=N sdhci-bcm2708.sync_after_dma=0 sdhci-bcm2708.enable_llm=1 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait


No crashes with PL2303.
No crashes with FT232.

These options are a sledgehammer to the speed of the attached USB (reverts to full speed not highspeed), resulting in unacceptable performance:
Code: Select all
[    3.514599] usb 1-1.2: new full-speed USB device number 4 using dwc_otg
[    3.645638] usb 1-1.2: not running at top speed; connect to a high speed hub
[    3.682657] usb 1-1.2: New USB device found, idVendor=1a40, idProduct=0201
[    3.712871] usb 1-1.2: New USB device strings: Mfr=0, Product=1, SerialNumber=0
[    3.748228] usb 1-1.2: Product: USB 2.0 Hub [MTT]
[    3.775292] hub 1-1.2:1.0: USB hub found
[    3.793491] hub 1-1.2:1.0: 7 ports detected


iperf -s (on remote machine):
Code: Select all
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:   853 KByte (default)
------------------------------------------------------------
[  4] local 192.168.100.3 port 5001 connected with 192.168.100.47 port 39329
[  4]  0.0-10.3 sec  9.62 MBytes  7.85 Mbits/sec


Pendrive speeds:
Code: Select all
hdparm -tT /dev/sda:
/dev/sda:
 Timing cached reads:     2 MB in  2.03 seconds = 1009.01 kB/sec
 Timing buffered disk reads:   4 MB in  4.04 seconds = 1013.92 kB/sec
 

Test 7:
Code: Select all
 cmdline.txt = dwc_otg.microframe_schedule=1 dwc_otg.fiq_fix_enable=1 dwc_otg.lpm_enable=0 smsc95xx.turbo_mode=N sdhci-bcm2708.sync_after_dma=0 sdhci-bcm2708.enable_llm=1 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait

FT232: crashes after a few tries
PL2303: crashes after spamming the command a few times

Seems that dwc_otg.microframe_schedule increases the chance of crashing.
Posts: 199
Joined: Sun Sep 02, 2012 1:14 pm
by samsamsam » Sun Sep 02, 2012 11:42 pm
As you can see You can get RPi to works stable with USB<->RS232 converters :P With speed = 1. I think when you use full-speed hub (USB 1.1) for USB<->RS232 system will be stable without forcing full-speed for all USB devices.

If you can get USB hub 1.1 you can check this theory ;)
Posts: 36
Joined: Fri Aug 17, 2012 11:36 am
by M33P » Mon Sep 03, 2012 10:24 pm
The Pi isn't a time machine so I can't go back to 2003 and get a USB1.1 hub

As I now have a maximally-broken setup with cmdline.txt I need to revert to the "wait and see" which results in USB being unserviceable after x minutes. It is a much lesser form of breakage, though I suspect it is along the same lines as the complete hang.

I will add another device to the mix tomorrow: a Moxa Uport 1110.
I need to figure out how to crosscompile on my other machine to get a kernel full of more debug info and possibly a way to break the hardlock without rebooting.
Posts: 199
Joined: Sun Sep 02, 2012 1:14 pm
by M33P » Sun Sep 09, 2012 3:17 pm
Found the cause

The pi isn't hanging, it's just looping forever in a kernel function. Guess where: the dwc_otg driver.

Setup:
After a while futzing around, I enabled debugging in the dwc_otg driver by tying the Makefile -DDEBUG option to CONFIG_USB_DEBUG=y in the overall kernel config.

The synopsys driver doesn't use ANY standard method of debugging or support dynamic debugging, it has a single parameter (that has to be set at boot time for a compiled-in driver) that determines the messages logged. It then proceeds to wrap debug calls via a preprocessor macro in dwc_otg_dbg.h which takes this parameter and determines which lines are spat out, rather than use the fully-featured and configurable dynamic debug options in Linux (dynamic debug can turn specific pr_debug and dev_dbg printk's on and off on a line-by-line basis).

Code: Select all
cmdline.txt = profile=2 loglevel=7 dwc_otg.debug=14 dwc_otg.microframe_schedule=1 dwc_otg.fiq_fix_enable=1 dwc_otg.lpm_enable=0 smsc95xx.turbo_mode=N sdhci-bcm2708.sync_after_dma=0 sdhci-bcm2708.enable_llm=1 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 root=/dev/mmcblk0p2 console=tty1 rootfstype=ext4 elevator=deadline rootwait

uname -a = Linux raspberrypi 3.2.27derpy+ #6 PREEMPT Sun Sep 9 12:54:38 BST 2012 armv6l GNU/Linux


TO the juicy part:
In order to capture every printk and debug message, I did

Code: Select all
#:echo 7 > /proc/sys/kernel/printk

Which then enabled debug messages to ttyAMA0 after the kernel had booted/init'd/whatever

Then I did stty -F /dev/ttyUSB0 (PL2303 attached to this) which generated plenty spam in the serial console, logged via putty

Code: Select all
=~=~=~=~=~=~=~=~=~=~=~= PuTTY log 2012.09.09 15:53:04 =~=~=~=~=~=~=~=~=~=~=~=
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Linux version 3.2.27derpy+ (jonathan@Bactrian) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #6 PREEMPT Sun Sep 9 12:54:38 BST 2012
[    0.000000] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), cr=00c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
[    0.000000] Machine: BCM2708
[    0.000000] Memory policy: ECC disabled, Data cache writeback
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 48768
[    0.000000] Kernel command line: dma.dmachans=0x3c bcm2708_fb.fbwidth=1184 bcm2708_fb.fbheight=624 bcm2708.boardrev=0x2 bcm2708.serial=0x80165ce9 smsc95xx.macaddr=B8:27:EB:16:5C:E9 profile=2 loglevel=7 dwc_otg.debug=14 dwc_otg.microframe_schedule=1 dwc_otg.fiq_fix_enable=1 dwc_otg.lpm_enable=0 smsc95xx.turbo_mode=N sdhci-bcm2708.sync_after_dma=0 sdhci-bcm2708.enable_llm=1 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 root=/dev/mmcblk0p2 console=tty1 rootfstype=ext4 elevator=deadline rootwait
[    0.000000] kernel profiling enabled (shift: 2)

** more bootup spam goes here **

[   22.872779] smsc95xx 1-1.1:1.0: eth0: link up, 100Mbps, full-duplex, lpa 0xCDE1
[   24.590650] DWC_otg: DWC OTG HCD QH Initialized

** stty command issued **

[   82.823678] DWC_otg: QH init: EP 3: TT found at hub addr 4, for port 6
[   82.834311] DWC_otg: DWC OTG HCD QH Initialized
[   82.845904] DWC_otg: QH init: EP 2: TT found at hub addr 4, for port 6
[   82.857535] DWC_otg: QH init: EP 1: TT found at hub addr 4, for port 6
[   82.869164] DWC_otg: DWC OTG HCD QH Initialized
[   82.878489] DWC_otg: DWC OTG HCD QH Initialized
[   83.202387] DWC_otg: DWC OTG HCD URB Dequeue
[   83.210688] DWC_otg: DWC OTG HCD URB Dequeue - delete Queue handler
[   83.221399] DWC_otg: DWC OTG HCD URB Dequeue OK

** repeat 10 times **

[   93.463634] DWC_otg: DWC OTG HCD URB Dequeue - delete Queue handler
[   93.472296] DWC_otg: DWC OTG HCD URB Dequeue OK
[   93.802473] DWC_otg: DWC OTG HCD URB Dequeue
[   93.809044] DWC_otg: DWC OTG HCD URB Dequeue failed - rc -43
[   93.817124] DWC_otg: DWC OTG HCD URB Dequeue
[   93.823649] DWC_otg: DWC OTG HCD URB Dequeue - delete Queue handler
[   93.832305] DWC_otg: DWC OTG HCD URB Dequeue OK
[   94.183149] DWC_otg: DWC OTG HCD URB Dequeue
[   94.189823] DWC_otg: DWC OTG HCD URB Dequeue failed - rc -43
[   94.197955] DWC_otg: DWC OTG HCD URB Dequeue
[   94.204457] DWC_otg: DWC OTG HCD URB Dequeue - delete Queue handler
[   94.213118] DWC_otg: DWC OTG HCD URB Dequeue OK

** repeat more **

[  106.641837] DWC_otg: DWC OTG HCD URB Dequeue - delete Queue handler
[  106.650303] DWC_otg: DWC OTG HCD URB Dequeue OK
[  106.852882] DWC_otg: DWC OTG HCD URB Dequeue
[  106.859652] DWC_otg: DWC OTG HCD URB Dequeue failed - rc -43
[  106.867752] DWC_otg: DWC OTG HCD URB Dequeue
[  106.874730] usb 1-1.2.6: pl2303_read_int_callback - usb_submit_urb failed with result -1
[  106.885571] DWC_otg: DWC OTG HCD URB Dequeue failed - rc -43
[  107.063264] DWC_otg: DWC OTG HCD URB Dequeue
[  107.070032] DWC_otg: DWC OTG HCD URB Dequeue - delete Queue handler
[  107.078886] DWC_otg: DWC OTG HCD URB Dequeue OK

** repeat more (no pl2303 errors) **

[  107.260134] DWC_otg: DWC OTG HCD URB Dequeue - delete Queue handler
[  107.269028] DWC_otg: DWC OTG HCD URB Dequeue OK
[  107.276114] DWC_otg: DWC OTG HCD URB Dequeue

** repeat more **

[  114.054730] DWC_otg: DWC OTG HCD URB Dequeue - delete Queue handler
[  114.063395] DWC_otg: DWC OTG HCD URB Dequeue OK
[  114.094343] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600001
[  114.094364]
[  114.119592] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600021
[  114.119614]
[  114.146260] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600001
[  114.146281]
[  114.174274] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600001
[  114.174296]
[  114.203440] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600001
[  114.203461]
[  114.234062] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600021
[  114.234083]
[  114.266136] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600001
[  114.266158]
[  114.299538] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600009
[  114.299560]
[  114.334054] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600001
[  114.334075]
[  114.369887] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600001
[  114.369910]
[  114.407021] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600001
[  114.407043]
[  114.445533] ERROR::handle_hc_chhltd_intr_dma:2113: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000402, intsts 0x06600021
[  114.445554]

** pi is broken, infinite loop of messages to console **


Why does it get stuck in a function that according to the comments does this:
dwc_otg_hcd_intr.c:1986
/**
* Handles a host Channel Halted interrupt in DMA mode. This handler
* determines the reason the channel halted and proceeds accordingly.
*/

So now that we've identified the fault condition and the location of the fault, why is it occurring?

Edit: looking at the code, in all other cases the interrupt is handled by handing control off to another function which deals with specific interrupts. In this case, the function simply returns without handling - which is probably why it breaks the entire kernel.
Posts: 199
Joined: Sun Sep 02, 2012 1:14 pm
by obcd » Mon Sep 10, 2012 7:59 am
Thanks for posting such detailed information about your findings M33P
I am sure it will help others examining the same issue.
I assume if you have an interrupt, and you don't do anything to handle it, it will generate the interrupt again right after you jump out of the interrupt handler. Maybe it's a hardware problem and the status register isn't properly showing what caused the interrupt, so that the driver doesn't know how to handle it.
Seems like they never expected it to happen, so they didn't bother to write code in the driver for it.
If I may believe the foundation input on this, Gordon (gsh on this forums) is examining the same issues at the moment. It might be worth a try to send him a PM. He seems to have access to even the synoptic hardware verilog description files. He won't be able to share them, but he could perhaps tell you what the status is you see at the moment the system is hanging. I am unsure if this is described in the broadcom soc datasheet.
It also seems that connecting and using an usb2serial adapter to the Pi makes the number of interrupts dramatically increase again. I can understand an interrupt rate up to 8000 / sec (an interrupt every usb microframe), but I don't get it when people say they get up to 40000 / sec. This would also point into the direction that some interrupts aren't handeled upon their first occurence. Maybe if the routine is called again, the status register finally shows the correct status and the interrupt is handled.
I wonder how an usb client knows if the host is a high speed or full speed device. I think it tries to enumerate at high speed, and if that fails, it tries full speed. Maybe it's possible to force specific devices to fullspeed. (Like the usb2serial dongles) It's not a solution, but it could be a workaround untill a better solution is found.
I think this the first technical underlined article I find about the issue. We are making progress.
Posts: 908
Joined: Sun Jul 29, 2012 9:06 pm
by gsh » Mon Sep 10, 2012 8:12 am
So,

Basically as I mentioned elsewhere (probably in the github bug tracker which is the proper place for this...)

The serial dongles are ignoring IN requests for long periods and returning NAKs (every 16us or so) and each NAK requires an interrupt to the processor... It may be that your code is just locked up trying to service the serial dongle but is unable to do so.

Why do you think it is locked up permanently in that function? Do you have any evidence that it isn't leaving the function. Currently I believe that it takes of the order of 20us to process the NAKs so if one occurs every 16us then it really would look like it was completely locked up...

It would be useful if you could put in some prints (or even early_printk's so it goes directly to the serial hardware...) which output the number of different types of interrupt once per second... There was some code in there to do that (added by Naren with the SOF debug #define)

Yes I do have access to the dwg_otg datasheet and even the verilog (as well as a very expensive USB analyser) but not so much access to the time...

Gordon
--
Gordon Hollingworth PhD
Raspberry Pi - Director of Software Engineering
Raspberry Pi Engineer & Forum Moderator
Raspberry Pi Engineer & Forum Moderator
Posts: 1143
Joined: Sat Sep 10, 2011 11:43 am
by obcd » Mon Sep 10, 2012 8:31 am
Thanks for the reply Gordon.
I understand this is a spare time occupation, but is there a way to combine the efforts of people examining the issues?
I mean, do you dev's have an irc channel you use to exchange thoughts?
I tried #raspberrypi and #raspberrypi-dev yesterday, but are unsure if it are the correct ones to monitor.
github is great, but it only shows changes after they are fully worked out.

Is there a logical explanation for the usb2serial devices generating NAK's so frequently?
I noticed that the newer ftdi chipsets work better, and the older ones don't. I know that's not much to work with.

Could it be that adding the debug prints slows down things to much to allow normal behavour? Would that mean that the issues would appear less frequent on an overclocked Pi? (voiding warranty in the interest of science..)
Posts: 908
Joined: Sun Jul 29, 2012 9:06 pm
by dennyfmn » Thu Sep 27, 2012 1:06 pm
Following suggestions above I added these parameters to the beginning of my cmdline.txt

dwc_otg.speed=1 dwc_otg.microframe_schedule=1 dwc_otg.fiq_fix_enable=1 sdhci-bcm2708.sync_after_dma=0

I am running kernel dated Mon Sep 17, from the Raspberry Pi foundation that supports overclocking without voiding your warranty. My unit runs fine at 950 MHz.

Checked the nak_holdoff_enable parameter

/sys/module/dwc_otg/parameters/nak_holdoff_enable = Y

My looping test with a Smarthome Insteon PLM (Power Line Modem) with embedded Future Technology Devices International, Ltd FT232 USB-Serial (UART) hung after 4.72 hr. The test cycles an ApplicanceLinc on/off every 3 seconds. This is an improvement, but did not solve the problem.

The dwc_otg.speed=1 parameter really clobbered the network throughput. Testing with iperf it went from 93.5 Mbits/sec before the change to 7.84 Mbits/sec after adding dwc_otg.speed=1.

Thanks for all the work on trying to fix this problem.
Posts: 26
Joined: Thu Aug 16, 2012 1:36 pm
by M33P » Fri Sep 28, 2012 12:04 pm
Update:
I updated to the kernel that includes the Nak_holdoff fix - much improved behaviour, many thanks to investigation by others.

cmdline.txt
Code: Select all
profile=2 loglevel=7 dwc_otg.microframe_schedule=1 dwc_otg.fiq_fix_enable=1 dwc_otg.lpm_enable=0  sdhci-bcm2708.sync_after_dma=0 sdhci-bcm2708.enable_llm=1 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 root=/dev/mmcblk0p2 console=tty1 rootfstype=ext4 elevator=deadline rootwait


Code: Select all
uname -a: Linux raspberrypi 3.2.27+ #174 PREEMPT Wed Sep 26 14:09:47 BST 2012 armv6l GNU/linux


I can now reliably spam stty -F /dev/ttyUSBx on both Prolific PL2303 and FT232R serial converters and (so far) has not crashed or hung the pi. I have still had the odd error message from the pl2303 driver where a URB submit resulted in an error.

Additionally, vmstat shows a drastically reduced interrupt count - 464 when using the serial port, ~320 otherwise.

So the only things wrong with my Pi are now:
- Sidewinder X4 keyboard is still unusable
- input polyfuse has decided it will drop 180mV across it in normal operation
Posts: 199
Joined: Sun Sep 02, 2012 1:14 pm
by dennyfmn » Sun Sep 30, 2012 10:24 pm
I got the latest firmware from GitHub and installed it. The system is running at 900MHz.

Code: Select all
root@raspberrypi:~# uname -a
Linux raspberrypi 3.2.27+ #174 PREEMPT Wed Sep 26 14:09:47 BST 2012 armv6l GNU/Linux

This kernel did not resolve my hang problem with the Smarthome Insteon PLM (Power Line Modem) with embedded Future Technology Devices International, Ltd FT232 USB-Serial (UART).

With standard cmdline.txt
Code: Select all
dwc_otg.lpm_enable=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait

The system hung in 22.4 min.

With extra params in cmdline.txt
Code: Select all
dwc_otg.microframe_schedule=1 dwc_otg.fiq_fix_enable=1 sdhci-bcm2708.sync_after_dma=0 dwc_otg.lpm_enable=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait

The system hung in 4.5 min.

I have noticed that the ftdi driver is newer on the Raspberry Pi distributions than on standard i386 Debian squeeze. My looping test has run for over 36 hours without hanging on an i386 Debian squeeze system before I stopped it with ^C. I'm wondering if something was changed in the newer driver that broke the older FTDI implementation. I have not been able to find the source for 1.6.0 on the FTDI web site, the source for 1.5.0 is there.

Debian squeeze i386
ftdi_sio: v1.5.0:USB FTDI Serial Converters Driver

Raspberry Pi
ftdi_sio: v1.6.0:USB FTDI Serial Converters Driver
Posts: 26
Joined: Thu Aug 16, 2012 1:36 pm
by albundy » Mon Oct 01, 2012 6:17 pm
I have 2 Pi's. Both have the 1.6.0 FTDI driver. My primairy Pi is very unstable when the FTDI USB to serial adapter is connected. This Pi has been updated with apt-get update and apt-get upgrade but also with rpi-update. Initially this Pi was rock solid stable but after a few updates it became very unstable. eht0 disappeared every few hours.

The other Pi uses an older build but also the 1.6.0 FTDI driver and has not been updated for a while. This Pi is rock solid stable with the FTDI USB to serial adapter attached.

So I don't think it has something to do with the FTDI driver.
Posts: 31
Joined: Tue Jul 17, 2012 4:56 pm
by iant1945 » Fri Oct 05, 2012 9:31 am
I'm a newbie. I've written a Windows program on my PC to interrogate my security system board via its USB cable, which works fine. I've now ported it to Python on my RPI (revision 2 board) and I'm experiencing the USB lock-up. I've tried dwc_otg.speed=1 in cmdline.txt and that worked a few times. Then I tried re-booting and the USB locked up and I couldn't use my keyboard to log in. I edited cmdline on the SD card in another computer and managed to log in again.

The USB cable is plugged directly into the RPi's USB socket. Would it help to buy a powered hub ?

Is it a good idea to change the USB speed inside of my program, then reverting it after communicating with the alarm ? If so, how can I do it ? Or do I just have to wait until the USB driver gets fixed ?

Ian
Posts: 18
Joined: Fri Oct 05, 2012 9:14 am
by itimpi » Fri Oct 05, 2012 3:53 pm
You do not mention what your power adapter is capable of delivering, and also what amperage the USB device requires. It could just be that there is insufficient power getting to the output side in which case a powered hub is likely to help. If it is not a power problem and really is a problem at the USB driver level then I guess you will just have to wait for a fix.
Posts: 1059
Joined: Sun Sep 25, 2011 11:44 am
Location: Potters Bar, United Kingdom
by iant1945 » Fri Oct 05, 2012 5:12 pm
< You do not mention what your power adapter is capable of delivering, and also what amperage the USB device requires. >

The power supply is the standard 5w plug supplied by Farnell. The alarm company tell me that the USB connection is powered from their board and doesn't ask for anything from RPi. It's only a data link.

I booted with only the keyboard (50mA) connected, then removed it and plugged in the alarm connection, so there shouldn't be any load on the USB. I control the software remotely from my PC via ssh.
Posts: 18
Joined: Fri Oct 05, 2012 9:14 am
by vladimirc » Thu Oct 11, 2012 5:22 pm
Hi

I am not such an expert in this issues, but I was also struggling with a PL2203 prolific. The code was running, but it was ridiculously slow. I tried to upgrade but still it was not improving. The problem was solved simply reinstalling the updated version 2012-09-18 from the download site. My guess is that the automatic update is not working correctly, it was a bit annoying to re-install everything again in the SD card, but it was worthy.

Just in case that somebody still hasn't tried that

Regards
Posts: 1
Joined: Thu Oct 11, 2012 5:14 pm
by mauzer » Mon Oct 15, 2012 8:24 pm
Hi,

I had the same problem with a 2012-09-18-wheezy-raspbian, but never had with a 2012-08-16-wheezy-raspbian...
Posts: 1
Joined: Mon Oct 15, 2012 8:18 pm
by iant1945 » Thu Oct 18, 2012 6:04 pm
< The problem was solved simply reinstalling the updated version 2012-09-18 from the download site. >

Not for me. It still locks up with a clean install onto a new SD card.
Posts: 18
Joined: Fri Oct 05, 2012 9:14 am
by Darth » Sun Oct 21, 2012 9:45 am
Exactly the same problem with me, stock image (2012-9-18) with only one program compiled (WVIEW) and TighhtVNCServer installed. I've spent two days trying to tie the issue down and it looks like the Pi does not like my USB/Serial adapter (RS232), the system runs for about 6 hours then locks up. I'm running the unit headless and have the USB/Serial adapter plugged into a weather station head unit, the Pi is used as a data repository and HTTP's data to Wunderground. When it locks out both the Ethernet and USB fail, once rebooted it works fine for another six hours then locks again; frustrating. Now rolled back to 2012-8-16 Raspbian and that for me has been stable for the last month. Hope it gets fixed soon as the latest version of Raspbian is much faster then its predecessors.
Posts: 21
Joined: Sat Aug 04, 2012 10:18 am
Location: Wisbech
by winstonma » Fri Oct 26, 2012 3:05 am
I got the same issue.

When I run a python program I got the handle_hc_chhltd_intr_dma error. Kernel update and modification of cmdline.txt won't improve the stability.
Posts: 11
Joined: Tue Jun 05, 2012 10:01 am
by davesedar » Wed Oct 31, 2012 11:33 am
I will add another device to the mix tomorrow: a Moxa Uport 1110.


Did you ever get the Moxa Uport 1110 driver working?
Posts: 31
Joined: Mon Mar 05, 2012 3:41 pm
by M33P » Wed Oct 31, 2012 4:03 pm
davesedar wrote:
I will add another device to the mix tomorrow: a Moxa Uport 1110.


Did you ever get the Moxa Uport 1110 driver working?


Yes - but currently compiled against the 3.6.y kernel. As vendor drivers go, this one is the least scummy I have encountered - this one just didn't assume kernels >3.4 existed so required a bit of macro alteration in the driver.

I should note that no further issues have been experienced by me with either the most recent 3.2.27 kernels or 3.6.y (built by myself from github). For a common base to see if the bug "reappeared" with a version of the FTDI driver I shall retest with the latest raspbian - with all 3 converters.
Posts: 199
Joined: Sun Sep 02, 2012 1:14 pm