MrEngman
Posts: 3961
Joined: Fri Feb 03, 2012 2:17 pm
Location: Southampton, UK

ERROR::handle_hc_chhltd_intr_dma:2537:

Sat Jul 13, 2013 4:36 pm

After updating with rpi-update to #496, the latest revision, I found my Pi was getting these types of errors.

Code: Select all

[  157.365717] ERROR::handle_hc_chhltd_intr_dma:2537: handle_hc_chhltd_intr_dma: Channel 7, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x04400001
[  157.365717]
[  173.713839] ERROR::handle_hc_chhltd_intr_dma:2537: handle_hc_chhltd_intr_dma: Channel 6, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x04400001
[  173.713839]
[  188.311524] ERROR::handle_hc_chhltd_intr_dma:2537: handle_hc_chhltd_intr_dma: Channel 4, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x04400001
[  188.311524]
[  203.287140] ERROR::handle_hc_chhltd_intr_dma:2537: handle_hc_chhltd_intr_dma: Channel 3, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x04400001
[  203.287140]
[  216.992828] ERROR::handle_hc_chhltd_intr_dma:2537: handle_hc_chhltd_intr_dma: Channel 1, DMA Mode -- ChHltd set, but reason for halting is unknown, hcint 0x00000002, intsts 0x04400001
[  216.992828]
I've gone back through the various revisions and found this first appears to happen with

Code: Select all

Linux raspberrypi 3.6.11+ #488 PREEMPT Tue Jul 2 16:37:47 BST 2013 armv6l GNU/Linux
I didn't notice it until updating to #496 when my Pi started to behave oddly and I checked dmesg.

It appears that others have come across this as well http://www.raspberrypi.org/phpBB3/viewt ... 23&start=0

Can anyone explain what this is and why it may be happening?


MrEngman
Simplicity is a prerequisite for reliability. Edsger W. Dijkstra

Please post ALL technical questions on the forum. Please Do Not send private messages.

User avatar
joan
Posts: 14762
Joined: Thu Jul 05, 2012 5:09 pm
Location: UK

Re: ERROR::handle_hc_chhltd_intr_dma:2537:

Sat Jul 13, 2013 4:52 pm

It appears to have stopped some wireless dongles working as well.

http://www.raspberrypi.org/phpBB3/viewt ... 52#p383452

User avatar
jackokring
Posts: 816
Joined: Tue Jul 31, 2012 8:27 am
Location: London, UK
Contact: ICQ

Re: ERROR::handle_hc_chhltd_intr_dma:2537:

Sat Jul 13, 2013 5:22 pm

Just did the update, will let you know how it goes. #496
Pi[NFA]=B256R0USB CL4SD8GB Raspbian Stock.
Pi[Work]=A+256 CL4SD8GB Raspbian Stock.
My favourite constant 1.65056745028

MrEngman
Posts: 3961
Joined: Fri Feb 03, 2012 2:17 pm
Location: Southampton, UK

Re: ERROR::handle_hc_chhltd_intr_dma:2537:

Sat Jul 13, 2013 5:55 pm

joan wrote:It appears to have stopped some wireless dongles working as well.

http://www.raspberrypi.org/phpBB3/viewt ... 52#p383452
Certainly creates problems with the wifi if nothing else. I've just run through the resent updates and it appears to have started with #488 - .firmware_revision 44dc4f093c9ec075e6704965665311a94f6371aa

I've found the errors are easy to generate by just plugging in a wifi adaptor. Doing this I've seen several of these errors appear in quick succession.


MrEngman
Simplicity is a prerequisite for reliability. Edsger W. Dijkstra

Please post ALL technical questions on the forum. Please Do Not send private messages.

jdb
Raspberry Pi Engineer & Forum Moderator
Raspberry Pi Engineer & Forum Moderator
Posts: 2239
Joined: Thu Jul 11, 2013 2:37 pm

Re: ERROR::handle_hc_chhltd_intr_dma:2537:

Sat Jul 13, 2013 6:12 pm

Funny, that's just what I spent the last week working on.

Patch is imminent.

For those that want to know the gory details, the FIQ will now respond to Host Channel interrupts as well as SOFs. This is to steal split transaction processing away from the large and cumbersome IRQ handlers in the main driver (and trust me, it's a hell of a lot faster). Any interrupts not handled by the FIQ are passed through to the main IRQ driver.

Unfortunately due to the way the hardware works, the FIQ is now faster than the time between interrupts and so one of two things happens:

1) the FIQ triggers the IRQ with a "you handle this one, it's not one of mine" but only tells the IRQ half the story, because the other half hasn't happened yet - if we get a subsequent interrupt telling the final half of the story then there's a subtle bug with the communication between the two that results in previous information being lost.
2) the FIQ will thrash because the hardware generates multiple (thousands) of interrupts per millisecond, NAK in particular for USB2.0 endpoints, that are unmasked if you previously had some sort of transaction error going to the device. The FIQ will respond in a time measured in fractions of a microsecond, and guess what - these interrupts are also generated in fractions of a microsecond. This, while not generating any visible error, will cause the IRQ that has to deal with the resulting storm plenty of delay which will impact the timeliness of servicing other more critical interrupts.
Rockets are loud.
https://astro-pi.org

User avatar
jackokring
Posts: 816
Joined: Tue Jul 31, 2012 8:27 am
Location: London, UK
Contact: ICQ

Re: ERROR::handle_hc_chhltd_intr_dma:2537:

Sat Jul 13, 2013 6:44 pm

Code: Select all

[ 1962.341959] usb 1-1.3.2: USB disconnect, device number 6
[ 1963.138759] ------------[ cut here ]------------
[ 1963.138836] WARNING: at fs/proc/generic.c:849 remove_proc_entry+0x280/0x2a0()
[ 1963.138877] remove_proc_entry: removing non-empty directory 'rtl819xC/wlan0', leaking at least 'ampdu_enable'
[ 1963.138890] Modules linked in: rfcomm bnep binfmt_misc snd_bcm2835 snd_pcm snd_page_alloc snd_seq snd_seq_device snd_timer snd btusb bluetooth evdev rfkill joydev 8192cu leds_gpio led_class
[ 1963.139017] [<c0013a7c>] (unwind_backtrace+0x0/0xf0) from [<c001e2ec>] (warn_slowpath_common+0x4c/0x64)
[ 1963.139050] [<c001e2ec>] (warn_slowpath_common+0x4c/0x64) from [<c001e398>] (warn_slowpath_fmt+0x30/0x40)
[ 1963.139099] [<c001e398>] (warn_slowpath_fmt+0x30/0x40) from [<c0118264>] (remove_proc_entry+0x280/0x2a0)
[ 1963.139564] [<c0118264>] (remove_proc_entry+0x280/0x2a0) from [<bf044748>] (rtw_proc_remove_one+0xf4/0x188 [8192cu])
[ 1963.140192] [<bf044748>] (rtw_proc_remove_one+0xf4/0x188 [8192cu]) from [<bf0456bc>] (rtw_dev_remove+0x94/0x134 [8192cu])
[ 1963.140560] [<bf0456bc>] (rtw_dev_remove+0x94/0x134 [8192cu]) from [<c0278054>] (usb_unbind_interface+0x48/0x108)
[ 1963.140638] [<c0278054>] (usb_unbind_interface+0x48/0x108) from [<c0237670>] (__device_release_driver+0x58/0xb0)
[ 1963.140668] [<c0237670>] (__device_release_driver+0x58/0xb0) from [<c02376e4>] (device_release_driver+0x1c/0x28)
[ 1963.140710] [<c02376e4>] (device_release_driver+0x1c/0x28) from [<c02371ec>] (bus_remove_device+0xc4/0xe4)
[ 1963.140747] [<c02371ec>] (bus_remove_device+0xc4/0xe4) from [<c023538c>] (device_del+0xf8/0x17c)
[ 1963.140773] [<c023538c>] (device_del+0xf8/0x17c) from [<c0276730>] (usb_disable_device+0xa0/0x1c8)
[ 1963.140813] [<c0276730>] (usb_disable_device+0xa0/0x1c8) from [<c026f9d0>] (usb_disconnect+0x7c/0x124)
[ 1963.140865] [<c026f9d0>] (usb_disconnect+0x7c/0x124) from [<c027088c>] (hub_thread+0x31c/0x10d4)
[ 1963.140905] [<c027088c>] (hub_thread+0x31c/0x10d4) from [<c003a7ec>] (kthread+0x88/0x94)
[ 1963.140945] [<c003a7ec>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)
[ 1963.140975] ---[ end trace 33ac63bf6371b35b ]---
[ 1963.378860] usb 1-1.3.2: new high-speed USB device number 10 using dwc_otg
[ 1963.481028] usb 1-1.3.2: New USB device found, idVendor=7392, idProduct=7811
[ 1963.481058] usb 1-1.3.2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ 1963.481076] usb 1-1.3.2: Product: 802.11n WLAN Adapter
[ 1963.481089] usb 1-1.3.2: Manufacturer: Realtek
[ 1963.481120] usb 1-1.3.2: SerialNumber: 00e04c000001
Pi[NFA]=B256R0USB CL4SD8GB Raspbian Stock.
Pi[Work]=A+256 CL4SD8GB Raspbian Stock.
My favourite constant 1.65056745028

MrEngman
Posts: 3961
Joined: Fri Feb 03, 2012 2:17 pm
Location: Southampton, UK

Re: ERROR::handle_hc_chhltd_intr_dma:2537:

Sat Jul 13, 2013 7:20 pm

jdb wrote:Funny, that's just what I spent the last week working on.

Patch is imminent.

For those that want to know the gory details, the FIQ will now respond to Host Channel interrupts as well as SOFs.
I'm glad you find it funny... No just kidding :D

Can't wait to try it. Just one question. What exactly does FIQ and SOF stand for?

If you need a guinea pig...


MrEngman
Simplicity is a prerequisite for reliability. Edsger W. Dijkstra

Please post ALL technical questions on the forum. Please Do Not send private messages.

jdb
Raspberry Pi Engineer & Forum Moderator
Raspberry Pi Engineer & Forum Moderator
Posts: 2239
Joined: Thu Jul 11, 2013 2:37 pm

Re: ERROR::handle_hc_chhltd_intr_dma:2537:

Sat Jul 13, 2013 8:35 pm

Glossary menagerie time:

Interrupts on pretty much all processors (caused by one of several things, but in this case a piece of hardware going "please sir, can I have some more?") are referred to as Interrupt Requests.

To fix some of the fundamental limitations of the USB hardware and driver, we use a peculiarity of the ARM architecture called the Fast interrupt request. On the ARM CPU, there are two defined layers or priorities of interrupt - normal IRQ and the FIQ. ARM also does some rather more peculiar (but extremely handy) things when in a FIQ, outside the scope of discussion. Go dig through the code for what happens when we use the FIQ as an example. You will note the use of magic assembler instructions because of the way that a FIQ is entered by the processor - a normal C compiler will not produce a FIQ-handler-compatible code.

The FIQ mechanism was explicitly designed to provide realtime performance for critical applications - think of an ARM CPU used as part of an audio processing or other realtime application - you synchronize your audio algorithm to some clock that results from a FIQ interrupt source.

FIQs are fast but dangerous, especially as far as Linux is concerned. You can pretty much do anything in a FIQ without any regard for concurrency or resource exclusivity. You can also only enable one source of interrupts to be a FIQ.

In our case, using the FIQ as the direct handler for all USB interrupts, the USB Start-of-Frame interrupt was the one triggered most often. 8000 times per second, in fact. USB frames are boundaries for determining "sets" of transactions or the scheduling of segments of lower-speed transactions. The sole purpose of this interrupt in the normal driver was to kick the transaction scheduler - "is there anything that needs doing in this frame?" was the question that was asked. The vast majority of the time, the answer to this question was "no, go back to sleep" but the normal driver took upwards of 20 microseconds of a 125 microsecond interval to answer that. The FIQ uses a precomputed state to (very, very quickly) decide if there is anything to be done for this frame.

One of the more recent USB patches puts split-transaction (talking to simple USB1.1 devices such as keyboards or mice) processing into the FIQ - another time-critical application because of the funky behaviour of most USB hubs. Hubs are very simple creatures that will panic at the first sign of delay or unexpected behaviour from the host - usually resulting in resetting their internal state and pretending to know nothing about what happened in the last 3 frames. This makes keypresses on your keyboard miss or repeatttttttttttttttttttttt. By using the FIQ to very precisely schedule when a split-transaction takes place, we can avoid this problem.

Unfortunately, the more complicated we make the FIQ, the more likely it is that we end up with weird and wonderful things happening. Such as that error message.
Rockets are loud.
https://astro-pi.org

MrEngman
Posts: 3961
Joined: Fri Feb 03, 2012 2:17 pm
Location: Southampton, UK

Re: ERROR::handle_hc_chhltd_intr_dma:2537:

Sun Jul 14, 2013 8:59 am

Hi jdb,

Thanks for that. Not sure I completely understood it all as I'm more used to coding for Zilog Z80 in assembler although I did some work on one of the Linux SCSI drivers back in the mid 1990's but haven't touched Linux since then until getting my Pi.

Anyway don't waste time trying to explain any more. Fixing the bug is rather more important.

Thanks


MrEngman
Simplicity is a prerequisite for reliability. Edsger W. Dijkstra

Please post ALL technical questions on the forum. Please Do Not send private messages.

User avatar
jackokring
Posts: 816
Joined: Tue Jul 31, 2012 8:27 am
Location: London, UK
Contact: ICQ

Re: ERROR::handle_hc_chhltd_intr_dma:2537:

Sun Jul 14, 2013 1:27 pm

All been stable since, but then I was doing a large

Code: Select all

sudo apt-get install lazarus fp-docs
to test the WiFi when I errored, the WiFi restarted as a different device ID, so this is a form of memory leak. I have a reup script to ifdown and ifup my wlan, as I needed it before the "power save don't work fix."

I think the FIQ mode has a whole register set almost for processing, and considering how top and bottom end drivers for block devices used to be written, I assume it's not a simple job to get this working without some testing.
Pi[NFA]=B256R0USB CL4SD8GB Raspbian Stock.
Pi[Work]=A+256 CL4SD8GB Raspbian Stock.
My favourite constant 1.65056745028

User avatar
jackokring
Posts: 816
Joined: Tue Jul 31, 2012 8:27 am
Location: London, UK
Contact: ICQ

Re: ERROR::handle_hc_chhltd_intr_dma:2537:

Mon Jul 15, 2013 12:01 am

So I've unplugged and replugged the WiFi, and sudo reboot. Now the errors seem to have stopped. Maybe it's a power cycling of the WiFi dongle after the rpi-update that fixed it. I'll let you know.
Pi[NFA]=B256R0USB CL4SD8GB Raspbian Stock.
Pi[Work]=A+256 CL4SD8GB Raspbian Stock.
My favourite constant 1.65056745028

Return to “Troubleshooting”