omx audio_decoder hangs on EmptyThisBuffer() in kernel

General programming chat and advice for beginners

11 posts
by stupid_boy » Tue Jan 01, 2013 8:40 pm
Hello

I am working on xbmc code. Idea is to provide audio output from both analog and HDMI at same time.
To do this, i add audio_splitter, which feeds 2 audio_render's. Actually, i can't find any information is this possible or not, so i decide to give it a try. Currently all of my code work without any OMX errors, all components are in OMX_EXECUTING state, all tunnels are established, all ports are opened. Problem is that decoder hangs on one of calls, usually rendering entire system to hang after some time. After some investigations I found following logs in /var/log/syslog, which i believe are consequences from some error in OMX:
Code: Select all
Jan  1 13:22:35 raspberrypi kernel: [  361.184073] INFO: task kworker/0:3:3007 blocked for more than 120 seconds.
Jan  1 13:22:35 raspberrypi kernel: [  361.184082] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  1 13:22:35 raspberrypi kernel: [  361.184091] kworker/0:3     D c038072c     0  3007      2 0x00000000
Jan  1 13:22:35 raspberrypi kernel: [  361.184120] [<c038072c>] (__schedule+0x2bc/0x568) from [<c0381c48>] (__mutex_lock_slowpath+0xb8/0x154)
Jan  1 13:22:35 raspberrypi kernel: [  361.184159] [<c0381c48>] (__mutex_lock_slowpath+0xb8/0x154) from [<c001a218>] (bcm_mailbox_write+0x34/0xb8)
Jan  1 13:22:35 raspberrypi kernel: [  361.184182] [<c001a218>] (bcm_mailbox_write+0x34/0xb8) from [<c001a390>] (bcm_mailbox_property+0x5c/0xe4)
Jan  1 13:22:35 raspberrypi kernel: [  361.184208] [<c001a390>] (bcm_mailbox_property+0x5c/0xe4) from [<c02a1b80>] (bcm2835_cpufreq_driver_target+0x84/0x114)
Jan  1 13:22:35 raspberrypi kernel: [  361.184229] [<c02a1b80>] (bcm2835_cpufreq_driver_target+0x84/0x114) from [<c029e1f8>] (__cpufreq_driver_target+0x28/0x38)
Jan  1 13:22:35 raspberrypi kernel: [  361.184248] [<c029e1f8>] (__cpufreq_driver_target+0x28/0x38) from [<c02a0708>] (do_dbs_timer+0x274/0x394)
Jan  1 13:22:35 raspberrypi kernel: [  361.184278] [<c02a0708>] (do_dbs_timer+0x274/0x394) from [<c003da30>] (process_one_work+0x134/0x38c)
Jan  1 13:22:35 raspberrypi kernel: [  361.184299] [<c003da30>] (process_one_work+0x134/0x38c) from [<c003e4a4>] (worker_thread+0x1a0/0x354)
Jan  1 13:22:35 raspberrypi kernel: [  361.184319] [<c003e4a4>] (worker_thread+0x1a0/0x354) from [<c0042e3c>] (kthread+0x84/0x8c)
Jan  1 13:22:35 raspberrypi kernel: [  361.184339] [<c0042e3c>] (kthread+0x84/0x8c) from [<c000e930>] (kernel_thread_exit+0x0/0x8)

and
Code: Select all
Jan  1 13:24:35 raspberrypi kernel: [  481.172642] INFO: task kworker/0:2:34 blocked for more than 120 seconds.
Jan  1 13:24:35 raspberrypi kernel: [  481.172665] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  1 13:24:35 raspberrypi kernel: [  481.172677] kworker/0:2     D c038072c     0    34      2 0x00000000
Jan  1 13:24:35 raspberrypi kernel: [  481.172724] [<c038072c>] (__schedule+0x2bc/0x568) from [<c03813d0>] (schedule_timeout+0x198/0x230)
Jan  1 13:24:35 raspberrypi kernel: [  481.172748] [<c03813d0>] (schedule_timeout+0x198/0x230) from [<c03820c4>] (__down+0x88/0xc0)
Jan  1 13:24:35 raspberrypi kernel: [  481.172770] [<c03820c4>] (__down+0x88/0xc0) from [<c004809c>] (down+0x60/0x64)
Jan  1 13:24:35 raspberrypi kernel: [  481.172808] [<c004809c>] (down+0x60/0x64) from [<c001a2fc>] (bcm_mailbox_read+0x60/0x98)
Jan  1 13:24:35 raspberrypi kernel: [  481.172830] [<c001a2fc>] (bcm_mailbox_read+0x60/0x98) from [<c001a3d4>] (bcm_mailbox_property+0xa0/0xe4)
Jan  1 13:24:35 raspberrypi kernel: [  481.172852] [<c001a3d4>] (bcm_mailbox_property+0xa0/0xe4) from [<c029e1a8>] (bcm2835_get_temp+0x3c/0x64)
Jan  1 13:24:35 raspberrypi kernel: [  481.172881] [<c029e1a8>] (bcm2835_get_temp+0x3c/0x64) from [<c029d3c8>] (thermal_zone_device_update+0x2c/0x240)
Jan  1 13:24:35 raspberrypi kernel: [  481.172912] [<c029d3c8>] (thermal_zone_device_update+0x2c/0x240) from [<c003da30>] (process_one_work+0x134/0x38c)
Jan  1 13:24:35 raspberrypi kernel: [  481.172941] [<c003da30>] (process_one_work+0x134/0x38c) from [<c003e4a4>] (worker_thread+0x1a0/0x354)
Jan  1 13:24:35 raspberrypi kernel: [  481.172964] [<c003e4a4>] (worker_thread+0x1a0/0x354) from [<c0042e3c>] (kthread+0x84/0x8c)
Jan  1 13:24:35 raspberrypi kernel: [  481.172990] [<c0042e3c>] (kthread+0x84/0x8c) from [<c000e930>] (kernel_thread_exit+0x0/0x8)

There are nothing related to my actual OMX call in logs, and i can get these errors only if i was able to kill xbmc before it be able to hang entire system. Some times system hangs later, probably due to other locks ( I didn't know for /var/log/syslog at this time ).
My OMX chain is as follow:
Code: Select all
decoder -> mixer -> splitter |-> render ( for local)
                             |-> render ( for HDMI )

In this situation vcgencmd version wait forever.

If i don't establish tunnel to one of renders, but leave everything else untouched ( ie, still enabling all ports ), decoder accepts EmptyThisBuffer without errors or hangs, until it fill all internal buffer, but still not decoding. I thing it is normal, as chain is not fully populated.

My question is: is this even possible?
It seems that both outputs use distinct hardware paths, so i thing it is possible.
If it is not, some component must return error and not cause entire system to die.

I can provide source code, reproducing this behavior at 100%. As I am windows developer and this is my first ever touch with Linux, i do not know if any other information can help.

pi@raspberrypi ~ $ uname -a
Linux raspberrypi 3.2.27+ #307 PREEMPT Mon Nov 26 23:22:29 GMT 2012 armv6l GNU/Linux

pi@raspberrypi ~ $ vcgencmd version
Nov 22 2012 18:12:01
Copyright (c) 2012 Broadcom
version 352766 (release)

Regards
Andrey
Posts: 64
Joined: Tue Jan 01, 2013 7:53 pm
by stupid_boy » Sat Jan 05, 2013 6:24 pm
Something from Broadcom?
Posts: 64
Joined: Tue Jan 01, 2013 7:53 pm
by stupid_boy » Tue Jan 08, 2013 1:29 pm
build from Mon Jan 7 21:51:11 GMT 2013 only changes locked places
Code: Select all
Jan  8 15:06:47 raspberrypi kernel: [  721.131668] INFO: task kworker/0:1:11 blocked for more than 120 seconds.
Jan  8 15:06:47 raspberrypi kernel: [  721.131684] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 15:06:47 raspberrypi kernel: [  721.131692] kworker/0:1     D c03981a0     0    11      2 0x00000000
Jan  8 15:06:47 raspberrypi kernel: [  721.131746] [<c03981a0>] (__schedule+0x2c4/0x5b0) from [<c0398b84>] (schedule_preempt_disabled+0x24/0x34)
Jan  8 15:06:47 raspberrypi kernel: [  721.131782] [<c0398b84>] (schedule_preempt_disabled+0x24/0x34) from [<c0397744>] (__mutex_lock_slowpath+0xac/0x13c)
Jan  8 15:06:47 raspberrypi kernel: [  721.131801] [<c0397744>] (__mutex_lock_slowpath+0xac/0x13c) from [<c001a700>] (bcm_mailbox_property+0x30/0x15c)
Jan  8 15:06:47 raspberrypi kernel: [  721.131826] [<c001a700>] (bcm_mailbox_property+0x30/0x15c) from [<c02b1110>] (bcm2835_get_temp+0x3c/0x64)
Jan  8 15:06:47 raspberrypi kernel: [  721.131843] [<c02b1110>] (bcm2835_get_temp+0x3c/0x64) from [<c02b00a8>] (thermal_zone_device_update+0x2c/0x240)
Jan  8 15:06:47 raspberrypi kernel: [  721.131860] [<c02b00a8>] (thermal_zone_device_update+0x2c/0x240) from [<c0035024>] (process_one_work+0x150/0x40c)
Jan  8 15:06:47 raspberrypi kernel: [  721.131876] [<c0035024>] (process_one_work+0x150/0x40c) from [<c0035604>] (worker_thread+0x150/0x498)
Jan  8 15:06:47 raspberrypi kernel: [  721.131914] [<c0035604>] (worker_thread+0x150/0x498) from [<c003a7c8>] (kthread+0x88/0x94)
Jan  8 15:06:47 raspberrypi kernel: [  721.131938] [<c003a7c8>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)


and

Code: Select all
Jan  8 15:06:47 raspberrypi kernel: [  721.131957] INFO: task kworker/0:2:39 blocked for more than 120 seconds.
Jan  8 15:06:47 raspberrypi kernel: [  721.131962] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan  8 15:06:47 raspberrypi kernel: [  721.131968] kworker/0:2     D c03981a0     0    39      2 0x00000000
Jan  8 15:06:47 raspberrypi kernel: [  721.131999] [<c03981a0>] (__schedule+0x2c4/0x5b0) from [<c0396ee0>] (schedule_timeout+0x158/0x1e0)
Jan  8 15:06:47 raspberrypi kernel: [  721.132027] [<c0396ee0>] (schedule_timeout+0x158/0x1e0) from [<c0397bd0>] (__down+0x88/0xc0)
Jan  8 15:06:47 raspberrypi kernel: [  721.132051] [<c0397bd0>] (__down+0x88/0xc0) from [<c00400d4>] (down+0x60/0x64)
Jan  8 15:06:47 raspberrypi kernel: [  721.132080] [<c00400d4>] (down+0x60/0x64) from [<c001a698>] (bcm_mailbox_read+0x60/0x98)
Jan  8 15:06:47 raspberrypi kernel: [  721.132096] [<c001a698>] (bcm_mailbox_read+0x60/0x98) from [<c001a794>] (bcm_mailbox_property+0xc4/0x15c)
Jan  8 15:06:47 raspberrypi kernel: [  721.132119] [<c001a794>] (bcm_mailbox_property+0xc4/0x15c) from [<c02b4ae4>] (bcm2835_cpufreq_driver_target+0x84/0x114)
Jan  8 15:06:47 raspberrypi kernel: [  721.132162] [<c02b4ae4>] (bcm2835_cpufreq_driver_target+0x84/0x114) from [<c02b1170>] (__cpufreq_driver_target+0x38/0x54)
Jan  8 15:06:47 raspberrypi kernel: [  721.132182] [<c02b1170>] (__cpufreq_driver_target+0x38/0x54) from [<c02b39c0>] (do_dbs_timer+0x274/0x394)
Jan  8 15:06:47 raspberrypi kernel: [  721.132200] [<c02b39c0>] (do_dbs_timer+0x274/0x394) from [<c0035024>] (process_one_work+0x150/0x40c)
Jan  8 15:06:47 raspberrypi kernel: [  721.132214] [<c0035024>] (process_one_work+0x150/0x40c) from [<c0035604>] (worker_thread+0x150/0x498)
Jan  8 15:06:47 raspberrypi kernel: [  721.132230] [<c0035604>] (worker_thread+0x150/0x498) from [<c003a7c8>] (kthread+0x88/0x94)
Jan  8 15:06:47 raspberrypi kernel: [  721.132247] [<c003a7c8>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)


and even worse, sudo reboot now hands on "[Info] Will now restart" !!!!!

pi@raspberrypi ~ $ uname -a
Linux raspberrypi 3.6.11+ #350 PREEMPT Mon Jan 7 21:51:11 GMT 2013 armv6l GNU/Linux

pi@raspberrypi ~ $ vcgencmd version
Jan 7 2013 22:02:10
Copyright (c) 2012 Broadcom
version 360961 (release)
Posts: 64
Joined: Tue Jan 01, 2013 7:53 pm
by M33P » Tue Jan 08, 2013 3:08 pm
What's happening both before and after patch is that cpufreq and the thermal_zone driver are both requesting videocore mailbox access at the same time. In the other thread this was easily replicated by simply running multiple concurrent cat commands on the sysfs node responsible for reading the temperature.

cpufreq is probably changing core frequencies as a result of usermode activities (i.e. playing an audio)
thermal_zone is periodically polling the system temperature to check for overheating. These two are kernel tasks which is why they show up in the kernel log.

It is probably also the case that the videocore is substantially more "busy" with mailbox access due to playing audio which makes a hang more likely than with the very simple example posted in the other thread.
Posts: 199
Joined: Sun Sep 02, 2012 1:14 pm
by stupid_boy » Tue Jan 08, 2013 3:33 pm
cpufreq is busy because i use 'ondemand' governor. it's work is to pull/change constantly. XBMC makes really big cpu load, so governor rises cpu clock for it. may be later he didn't succeed to lower it. i suspect that problem can be reproduced without this big cpu load.
thermal zone is clear.
I thing they do not lock at same time, just watchdog detect them at same time. They are just post symptoms of main problem, side (d)effects. Most likely they work great. I use them as eventual sign what can be going under the cover.

Also, XBMC uses very extensively videocore at all.

I thing actual problem is in renders, but it faces itself at call to decoder. With one render and without splitter, all works ok, but second render add another ( concurrent ) stream to videocore. without it, video and audio streams are somehow not so concurrent. Because both renders are managed by common clock, their streams are absolutely parallel each with other. May be there is a problem. Yet another syncronisation in videocore calls/transitions.
Posts: 64
Joined: Tue Jan 01, 2013 7:53 pm
by jamesh » Mon Jan 14, 2013 2:54 pm
I checked with one of Brcm's OMX experts on this, this is his reply..

That all sounds sane. Certainly
audio_decode -> audio_splitter -> audio_render(hdmi)
|> audio_render(analog)
Is one of the IL Tests IIRC…. Actually not quite, it’s audio_capture -> splitter -> audio_encode and audio_render.
Not quite sure why he wants the mixer in there.

If he can provide the code that he is using to set stuff up, I can have a look at it and see if there is anything obviously wrong.

I’m not that familiar with HDMI audio. If he’s really after dual audio output, he may also need to worry about different rendering latencies between the two paths. Also if they’re doing video decode, then only one audio_render ought to be tunnelled to the clock, otherwise they will both attempt to act as clock master.

Really daft idea, but he has set config OMX_IndexConfigBrcmAudioDestination on one of the audio_renders to point it at the HDMI output? Audioplus should cope without, but might be a little quirky.
Soon to be employed engineer - Hurrah! Volunteer at the Raspberry Pi Foundation, helper at PiAcademy September 2014.
Raspberry Pi Engineer & Forum Moderator
Raspberry Pi Engineer & Forum Moderator
Posts: 11959
Joined: Sat Jul 30, 2011 7:41 pm
by stupid_boy » Mon Jan 14, 2013 3:31 pm
Thanks for your cooperation.

Link to my code is at bottom of Issue 134, I will add it here too:
https://github.com/stupid-boy/xbmc/comm ... both_audio
I tihng audio_mixer in XBMC is used mainly because it is cappable of automaticaly 'beautify' PCM. In documentation we read:
If necessary, input data is converted to match the PCM configuration of the output port at the mixdown stage (see the supported PCM formats). The mixer will perform sampling rate conversions, bitrate scaling and channel downmixing.

it is not used as real mixer, as only one of its inputs are connected. Mixer is used only when no passthru is selected.

About the clock, i hope both renders are tuneled as slaves.

There are some problem, explained in this thread: https://github.com/xbianonpi/xbian/issues/172
Currently RPI XBMC lack any way to output to external sound card. By this limitation, only way to allow these people to use this litle device is to allow both audio outputs. Personaly i will never use both outputs, but this is not a reason to decline support for these people. I understand, there will be some time delay between 2 streams, but i thing it is acceptable. It is clear that HDMI will be used as video output.
Also, in other topics here i see statements that output to both devices at same time is (intented to be) supported.

I don't exclude posibility that my code is buggy. Without clear documentation, i have to guess a lot of things. Also, i am not sure that my code in git is cleaned from some strange attempts to get it working.

Regards
Posts: 64
Joined: Tue Jan 01, 2013 7:53 pm
by stupid_boy » Mon Jan 14, 2013 9:08 pm
Sorry, when i write "output to external sound card" in my previous post, actually i mean "output to ALSA". By allowing ALSA output, such users can made the bridge they ask for, eventually using external sound card if ALSA have similar problem with both internal devices.

A long wanted alsasink component...., not only for that purpose. I too dream for this component, to use optical output of my TurtleBeach card with RPI.

Unfortunately, Bellagio alsaink seems to be not (directly) compatible with Broadcom implementation, so it can't be used for this.

This is not directly related to problem in this topic, I just made my previous post more clear.
Posts: 64
Joined: Tue Jan 01, 2013 7:53 pm
by jamesh » Wed Jan 16, 2013 2:25 pm
I have some more information from the OMX expert.
The code looks reasonable based on an initial read through with no prior knowledge of how this fits into the XBMC framework.

The only quirk I can see based on looking at the VC component code is that audio_render will try opening the specified audio destination first, and then drop back to “local” if that fails. That could result in the second audio_render failing to transition (it should raise an OMX_ErrorInsufficientResources error) as “local” was already open. If this substitution happens, it does update OMX_IndexConfigBrcmAudioDestination, so it might be worth reading that back just as a confirmation.

Audio_splitter is a very simple implementation in that it will hold on to an input buffer until it has managed to copy all the data contained in that buffer to all active output ports. If we have had one of the audio_renders fail to transition, then I could see it stalling the pipe after one buffer had been decoded and passed to it.
The original comment stated:
If i don't establish tunnel to one of renders, but leave everything else untouched ( ie, still enabling all ports ), decoder accepts EmptyThisBuffer without errors or hangs, until it fill all internal buffer, but still not decoding. I thing it is normal, as chain is not fully populated.

That will fail for this very reason – the disconnected port on audio_splitter is enabled so we are expecting to be able to send the data, but because no tunnel is established there are no buffers to fill. Could you try the test with that port disabled, and confirm if the audio pipe works, and does the audio come out the correct device?

If I get an hour to play, I’ll create a test to mimic this setup and run it with our Videocore debugger to see what is going on.
Soon to be employed engineer - Hurrah! Volunteer at the Raspberry Pi Foundation, helper at PiAcademy September 2014.
Raspberry Pi Engineer & Forum Moderator
Raspberry Pi Engineer & Forum Moderator
Posts: 11959
Joined: Sat Jul 30, 2011 7:41 pm
by stupid_boy » Wed Jan 16, 2013 3:09 pm
again, great thanks

I do not know how Broadcom implements OMX, currently i am looking at Bellagio code. In it, if some component fail, entire chain will fail, but not hang, and more important, not hang kernel ( or videocore in particular ). That is main problem in this thread, to prevent entire system hangs. If simultaneous output is not possible, i thing it is ok, but system hang is not acceptable. Some error is welcome in such condition.

About errors: When i enable OMX event tracing in XBMC, if some component fail assyncronously, i expect to get async error event. there is no such thing in this concrete situation. actualy with both outputs enabled there are no any errors, even sync or async. As i remember, at some stages i get timeouts for event wait, but not error events. I am not sure at what stage i get them ( if current code in git will reproduce them ). My feelings are that i get them at some early stages of code.

I was thinking to dissable port of spliter for test purposes, but i didn't do that. I will try it when i have some time and will report back here, but don't expect to be done in short. Actualy this will test different aspects and because of this i stoped without this test. Full buffers of decoder are reasonable good sign for me at this stage ( as i say, it was expected and reasonable result ). When i eat all buffers, i get reasonable error and nothing hangs. Please understand my point, i am building in 256M RPI and build for every test case takes a looong time, so i prefer to skip some of them.

For me it is some deadlock, or missed release of mutex or whatever in some error handler, or similar condition. Most likely the later is true, otherwise i will see these deadlocked threads in gdb at later stage. Bad news is that i thing orphaned mutex is some at global system level, as it remain locked and continue to block all videocore calls even after i kill XBMC.

Regards
Posts: 64
Joined: Tue Jan 01, 2013 7:53 pm
by stupid_boy » Fri Feb 08, 2013 6:06 pm
fixed at commit from Feb 07.
issue closed.

thanks a lot!
Posts: 64
Joined: Tue Jan 01, 2013 7:53 pm