MarkDarcy
Posts: 25
Joined: Thu Sep 20, 2018 8:23 am

Inconsistencies/bugs in Pi OMX IL implementation cause Deadlock (test harness attached)

Tue Mar 03, 2020 11:34 am

Hi,

I must apologise in advance for the long post. It is aimed at those familiar with multithreaded programming in OMX IL, be they fellow OMX programmers or software developers at "Pi Towers" who are intimately familiar with the Pi's OMX IL layer. This is not a question about IL Client, as IL Client is not being used here. If you are not in one of the aforementioned categories then this post may not be relevant to you, and I apologise for having wasted your time thus far.

In the first instance, this is a "Request For Confirmation". I have hacked and conjured up workarounds to the issue described below and have a "working" solution. However, these workarounds are not only bad style, they are not portable amongst other OpenMAX implementations, and they also hamper the performance and significantly impact the reliability of my application. So, it would be nice to get a confirmation as to what I'm experiencing.

This may just be as simple as me not interpreting the OpenMAX specification correctly, or it may be something more. A fully-functioning (and very short) test harness is available to demonstrate the problem (see the next post). Please feel free to use the harness in your own environment to either confirm or deny the problem I'm seeing. If you do try it, I would be grateful if you could share your findings here. It may be used as the basis for a bug report if a bug is confirmed (a BSD-style license applies).

Background

Section 3.2.2.2 (page 82) of the OpenMAX specification (1st September, 2008, Version 1.1.2.0) documents the OMX_SendCommand macro. It states that:
The component normally executes the command outside the context of the call, though a solution without threading may elect to execute it in context. In either case, the component uses an event callback to notify the IL client of the results of the command once completed.
Let's consider an example call to OMX_SendCommand() with OMX_CommandStateSet and OMX_StateIdle to move a component to the IDLE state. In a single-threaded execution environment, my interpretation of the above paragraph in section 3.2.2.2 yields the following call sequence:

Code: Select all

Thread ID |       Action
----------+------------------------------
    A     | call to OMX_SendCommand(homxComp, OMX_CommandStateSet, OMX_StateIdle, NULL)
    A     | OMX_CALLBACKTYPE.EventHandler() notification callback entered
    A     | OMX_CALLBACKTYPE.EventHandler() notification callback exits
    A     | call to OMX_SendCommand(homxComp, OMX_CommandStateSet, OMX_StateIdle, NULL) exits
In a multi-threaded execution environment, my interpretation of the above paragraph in section 3.2.2.2 yields the following call sequence:

Code: Select all

Thread ID |       Action
----------+------------------------------
    A     | call to OMX_SendCommand(homxComp, OMX_CommandStateSet, OMX_StateIdle, NULL)
    A     | call to OMX_SendCommand(homxComp, OMX_CommandStateSet, OMX_StateIdle, NULL) exits with OMX_ErrorNone
    B     | OMX_CALLBACKTYPE.EventHandler() notification callback entered
    B     | OMX_CALLBACKTYPE.EventHandler() notification callback exits
However, the paragraph above clearly states that "The component normally executes the command outside the context of the call". In a multi-threaded environment this means that, with respect to the execution of threads A and B, any arbitrary interleaving is possible. For example,

Code: Select all

Thread ID |       Action
----------+------------------------------
    A     | call to OMX_SendCommand(homxComp, OMX_CommandStateSet, OMX_StateIdle, NULL)
    B     | OMX_CALLBACKTYPE.EventHandler() notification callback entered
    A     | call to OMX_SendCommand(homxComp, OMX_CommandStateSet, OMX_StateIdle, NULL) exits with OMX_ErrorNone
    B     | OMX_CALLBACKTYPE.EventHandler() notification callback exits
is also possible. This being the case, a recursive mutex and a condition variable can be used to implement the classic paradigm for serialising asynchronous execution. Taking our two threads A and B and some C11-based pseudo-code, thread A would behave as follows:

Code: Select all

Thread | Step |       Action
-------+------+------------------------------
   A   |  1   | OMX_GetHandle(&homxComp, "???", pDataPtr, &s_omxCallbacks);
   A   |  2   | mtx_lock(&pDataPtr -> mtxLock);
   A   |  3   | pDataPtr -> nStatus = -1;
   A   |  4   | nOmxError = OMX_SendCommand(homxComp, OMX_CommandStateSet, OMX_StateIdle, NULL);
   A   |  5   | while (pDataPtr -> nStatus == -1) { cnd_wait(&pDataPtr -> cndSignal, &pDataPtr -> mtxLock); }
   A   |  6   | mtx_unlock(&pDataPtr -> mtxLock);
while thread B would behave as follows:

Code: Select all

Thread | Step |       Action
-------+------+------------------------------
   B   |  1   | OMX_CALLBACKTYPE.EventHandler(homxComp, pAppData, eEvent, nData1, nData2, pEventData) called
   B   |  2   | pDataPtr = (PMYTYPE) pAppData;
   B   |  3   | mtx_lock(&pDataPtr -> mtxLock);
   B   |  4   | pDataPtr -> nStatus = nData1;
   B   |  5   | cnd_signal(&pDataPtr -> cndSignal);
   B   |  6   | mtx_unlock(&pDataPtr -> mtxLock);
   B   |  7   | OMX_CALLBACKTYPE.EventHandler(homxComp, pAppData, eEvent, nData1, nData2, pEventData) exits
Given a recursive mutex, the above paradigm for serialisation works in both single-threaded and multi-threaded execution environments. For single-threaded execution (i.e., A == B), acquisition of the mutex at thread B step 3 (B3) is guaranteed to succeed as thread B already owns the mutex from step A2 (remember, A == B). For multi-threaded execution, acquisition of the mutex at B3 should also be guaranteed to succeed as, according to section 3.2.2.2 of OpenMAX:
The OMX_SendCommand macro will invoke a command on the component. This is a non-blocking call that should, at a minimum, validate command parameters but return within five milliseconds.
So, the mutex is released by A at A5, within around 5 ms of calling OMX_SendCommand(), and is only re-acquired by A when B, having signalled the condition variable at B5, releases it at B6.
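
To make the paradigm concrete, here is a minimal C11 sketch of threads A and B. The MYTYPE structure and its fields follow the pseudo-code above; everything else (initialisation, error handling) is illustrative, so treat it as the shape of the code rather than an extract from the test harness:

Code: Select all

/* Minimal sketch of the serialisation paradigm above, using C11 <threads.h>.
 * MYTYPE and its fields follow the pseudo-code; everything else is
 * illustrative. The mutex must have been created with
 * mtx_init(&mtxLock, mtx_plain | mtx_recursive) so that the
 * single-threaded (A == B) case also works. */
#include <threads.h>
#include <IL/OMX_Core.h>

typedef struct {
    mtx_t mtxLock;    /* recursive mutex guarding nStatus       */
    cnd_t cndSignal;  /* signalled by the event handler         */
    int   nStatus;    /* -1 = command pending, otherwise nData1 */
} MYTYPE, *PMYTYPE;

/* Thread B: the notification callback (steps B1-B7). */
static OMX_ERRORTYPE EventHandler(OMX_HANDLETYPE hComponent, OMX_PTR pAppData,
                                  OMX_EVENTTYPE eEvent, OMX_U32 nData1,
                                  OMX_U32 nData2, OMX_PTR pEventData)
{
    PMYTYPE pDataPtr = (PMYTYPE) pAppData;
    (void) hComponent; (void) eEvent; (void) nData2; (void) pEventData;

    mtx_lock(&pDataPtr -> mtxLock);      /* B3: where the Pi deadlocks */
    pDataPtr -> nStatus = (int) nData1;  /* B4 */
    cnd_signal(&pDataPtr -> cndSignal);  /* B5 */
    mtx_unlock(&pDataPtr -> mtxLock);    /* B6 */
    return OMX_ErrorNone;
}

/* Thread A: issue a command, then wait for its completion (steps A2-A6). */
static OMX_ERRORTYPE SendCommandAndWait(OMX_HANDLETYPE homxComp, PMYTYPE pDataPtr)
{
    OMX_ERRORTYPE nOmxError;

    mtx_lock(&pDataPtr -> mtxLock);                               /* A2 */
    pDataPtr -> nStatus = -1;                                     /* A3 */
    nOmxError = OMX_SendCommand(homxComp, OMX_CommandStateSet,
                                OMX_StateIdle, NULL);             /* A4 */
    while (pDataPtr -> nStatus == -1)                             /* A5 */
        cnd_wait(&pDataPtr -> cndSignal, &pDataPtr -> mtxLock);
    mtx_unlock(&pDataPtr -> mtxLock);                             /* A6 */
    return nOmxError;
}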

The discussion so far should not hold any surprises.

The Problem

Even though the OMX implementation on the Raspberry Pi uses multiple threads, the above paradigm doesn't work. In fact, using the above paradigm causes deadlock. The reason appears to be a simple one. The command issued via OMX_SendCommand() does indeed execute in a separate thread, B. However, execution of the calling thread A is blocked at A4 until OMX_SendCommand() returns, which it never does, because acquisition of the mutex at B3 deadlocks: A has not had a chance to release the lock (that would normally occur at A5). In effect, the following is happening:

Code: Select all

Thread | Step |       Action
-------+------+------------------------------
   A   |  2   | mtx_lock(&pDataPtr -> mtxLock);
   A   |  3   | pDataPtr -> nStatus = -1;
   A   |  4   | nOmxError = OMX_SendCommand(homxComp, OMX_CommandStateSet, OMX_StateIdle, NULL);
   B   |  1   | OMX_CALLBACKTYPE.EventHandler(homxComp, pAppData, eEvent, nData1, nData2, pEventData) called
   B   |  2   | pDataPtr = (PMYTYPE) pAppData;
   B   |  3   | mtx_lock(&pDataPtr -> mtxLock);
...HANG...
This appears to be a fundamental multi-threaded programming error. Why do calls to OMX_SendCommand() block the caller while the OMX implementation goes off and does its work on a different thread? In effect, the Pi OMX IL implementation appears to be serialising multiple threads on behalf of the caller when Section 3.2.2.2 of the OpenMAX specification requires OMX_SendCommand() to be non-blocking.

This problem is 100% reproducible in a program that calls absolutely nothing else but OMX IL and pthread calls. There is no arbitrary interleaving of threads A and B either; the execution sequence A2 | A3 | A4 | B1 | B2 | B3 above occurs 100% of the time.

The Scope of the Problem

The problem is exclusive to the OMX_SendCommand() macro. Through extensive testing, it appears that this problem has two modes of occurrence:

1) Successful command completion

If a command completes successfully then the problem mainly arises with OMX components in the video domain (that is, where OMX_PARAM_PORTDEFINITIONTYPE.eDomain is OMX_PortDomainVideo). Most of the time only state transitions to OMX_StateIdle are affected, though some components also deadlock on the transition to OMX_StateExecuting, depending on whether the component is used as part of a tunnel. The problem also occurs for OMX_CommandPortDisable when there are no buffers allocated on the port (if buffers are allocated on the port, the completion notification arrives later, out of the OMX_FreeBuffer() call).
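
For illustration, here is a sketch of the two port-disable cases (the handle, port index and buffer header are placeholders; error handling and the wait for OMX_EventCmdComplete are elided):

Code: Select all

/* Sketch of the two OMX_CommandPortDisable cases described above. The
 * handle, port index and buffer header are placeholders, and error
 * handling plus the wait for OMX_EventCmdComplete are elided. */
#include <IL/OMX_Core.h>

static void DisablePort(OMX_HANDLETYPE homxComp, OMX_U32 nPortIndex,
                        OMX_BUFFERHEADERTYPE *pBufHdr /* NULL if none */)
{
    OMX_SendCommand(homxComp, OMX_CommandPortDisable, nPortIndex, NULL);

    if (pBufHdr == NULL) {
        /* No buffers on the port: the command completes immediately and,
         * on the affected components, the OMX_EventCmdComplete callback
         * fires while this thread is still inside OMX_SendCommand(). */
    } else {
        /* Buffers on the port: the command only completes once every
         * buffer has been returned, so the notification arrives later,
         * out of the OMX_FreeBuffer() call. */
        OMX_FreeBuffer(homxComp, nPortIndex, pBufHdr);
    }
    /* Either way, wait for OMX_EventCmdComplete before proceeding. */
}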

Affected components when commands complete successfully are as follows:

Code: Select all

Component                         | Affected?
----------------------------------+------------
OMX.broadcom.audio_capture        |     -
OMX.broadcom.audio_decode         |     Y
OMX.broadcom.audio_encode         |     -
OMX.broadcom.audio_lowpower       |     Y
OMX.broadcom.audio_mixer          |     -
OMX.broadcom.audio_processor      |     -
OMX.broadcom.audio_render         |     Y
OMX.broadcom.audio_splitter       |     -
----------------------------------+------------
OMX.broadcom.image_decode         |     -
OMX.broadcom.image_encode         |     -
OMX.broadcom.image_image_fx       |     -
OMX.broadcom.resize               |     -
OMX.broadcom.source               |     -
OMX.broadcom.transition           |     -
----------------------------------+------------
OMX.broadcom.camera               |     ?
OMX.broadcom.egl_render           |     Y
OMX.broadcom.rawcam               |     ?
OMX.broadcom.video_decode         |     Y
OMX.broadcom.video_encode         |     Y
OMX.broadcom.video_render         |     Y
OMX.broadcom.video_scheduler      |     Y
OMX.broadcom.video_splitter       |     Y
NOTE: Entries marked with a '?' could not be confirmed due to a Pi camera not being available.

2) Failed command completion

When any call to OMX_SendCommand() fails, the notification callback appears always to be invoked from a separate thread while OMX_SendCommand() is still blocking the caller, thus causing deadlock in the notification handler. All components are affected.

Summary

If anybody can verify this behaviour I would be grateful; in particular, the inconsistent behaviour of the same set of interfaces across different component domains (e.g., image vs. video). If anybody at "Pi Towers" can simply look at the code and offer up an explanation, either of Raspberry's interpretation of the OpenMAX specification or of the incorrect behaviour above, that would be very welcome.

The test harness has been run against both Stretch and Buster, and the problem has been confirmed on both platforms.

Thanks for sticking with me until the end.

Thanks in advance for any assistance anyone may be able to offer.
Last edited by MarkDarcy on Wed Mar 04, 2020 9:12 am, edited 9 times in total.

MarkDarcy
Posts: 25
Joined: Thu Sep 20, 2018 8:23 am

Re: Inconsistencies/bugs in Pi OMX IL implementation cause Deadlock

Tue Mar 03, 2020 5:51 pm

As mentioned above, there is a test program to verify the inconsistencies I am seeing. I have attached it to this post. It is a single source file; the usage instructions are in the comment at the head of the file. The following single command line will compile it:

Code: Select all

gcc -g -Og -DOMX -DOMX_SKIP64BIT -I/opt/vc/include -L/opt/vc/lib omx-test.c -o omx-test -lpthread -lbcm_host -lopenmaxil
NOTE: add -DFORCE_FAILURE to demonstrate deadlock on failed commands.

Note

I have just updated the Problem Scope in the original post with new information. As a result, the test program now has two modes of operation to demonstrate the two situations where the problem occurs. Apologies for the inconvenience, but please re-read the Problem Scope if you have already read it.

Thanks in advance,
Attachments
omx-test.zip
(3.81 KiB) Downloaded 38 times

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 9861
Joined: Wed Dec 04, 2013 11:27 am
Location: ZZ9 Plural Z Alpha, aka just outside Cambridge.

Re: Inconsistencies/bugs in Pi OMX IL implementation cause Deadlock (test harness attached)

Wed Mar 04, 2020 11:50 am

Sorry, I have no interest in investigating issues in IL. It's dead as an API, and I would dearly love to remove it.

I suspect that your issue is that there is a single RPC channel to/from the VPU, and callbacks will be made from that context. If you block that callback context then you will get stalls. That's why ilclient_event_handler adds things to the event list and then triggers a callback, from which you are intended to signal your main thread to handle it. Do not do significant work in the callback context.
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.

MarkDarcy
Posts: 25
Joined: Thu Sep 20, 2018 8:23 am

Re: Inconsistencies/bugs in Pi OMX IL implementation cause Deadlock (test harness attached)

Thu Mar 05, 2020 3:45 pm

Hi 6by9,

Thanks for your reply.

There are many things I too would dearly like to do. Some, I dare say, might even be considered "interesting". However, being condescending to one's customers about what one may or may not be interested in doing certainly is not one of them, particularly when said customers have sacrificed weeks of their time and effort investigating potential bugs in your product, on your behalf, to potentially save you time.

Regardless of whether you consider the API dead or not, the API does ship with the unit. Furthermore, I surely am not going to be the last person who, on comparing the healthy range of OMX sample programs and comprehensive OMX documentation to the scant range of MMAL samples and meagre MMAL documentation, concludes that a new project would be easier to develop with OMX rather than MMAL. That said, if there is no longer any wish to support the OMX API, a short reply pointing people to the statement explaining that it is no longer recommended for new developments would suffice. For example, those chaps across the pond who create the Jetson don't care much for OpenMAX either, but at least they can manage to convey this to their customers without being condescending. Disappointing, to say the least.

Anyway, now that's out of the way, on to the technical substance of your reply which is, as always, much appreciated.

1) Sorry, I have no interest in investigating issues in IL

I did not ask for a full-blown investigation; I asked in the first instance for a confirmation and, appreciating you may be pressed for time, even provided you with a test harness to do it. Were you able to run the test harness? What platforms were you able to run it on and how did it behave? Would you consider calls to the OMX API in the harness to be properly sequenced? Would you mind sharing your findings with me and the rest of the community? At minimum, it would be nice to confirm that my environment is not faulty, firmware out of date, etc.

2) Do not do significant work in the callback context

As you will have seen from the test harness source, no work is being done at all in callbacks, let alone "significant" work (is going for a mutex in a multithreaded environment considered "significant work"?). Furthermore, over half of the components do work as I would expect with respect to multithreading given the paradigm used in the test harness. The Pi's OMX implementation may well be within OpenMAX's tolerances and, instead, my interpretation of the standard may simply be awry, or too strict. However, what I sought was clarification on the inconsistency in behaviour across components given the exact same program. Can my interpretation of the OpenMAX specification, on which I based the harness, be considered the same as that of the creators of the Pi's OMX implementation, or not?

3) there is a single RPC channel to/from the VPU, and callbacks will be made from that context

This I first saw with my application, then verified using the test harness. My question was, given that it looks like the Pi OpenMAX implementation is blocking my calling thread while callbacks are being made from that different context, is this behaviour by design (i.e., within the bounds of a reasonable interpretation of the specification) or is it unintended?

Thanks in advance,

dividuum
Posts: 228
Joined: Sun Jun 16, 2013 1:18 pm
Location: Germany

Re: Inconsistencies/bugs in Pi OMX IL implementation cause Deadlock (test harness attached)

Sun Mar 08, 2020 10:02 pm

My approach with OMX has been to defer any work by pushing all required information received in the callback into a (thread-safe) queue and handling it outside of the OMX callback's context.
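
A minimal pthreads sketch of that pattern (names and the fixed queue depth are illustrative; the important property is that the queue mutex is only ever held for a copy, never across an OMX call, so the callback can always acquire it promptly):

Code: Select all

/* Sketch of the deferred-event pattern: the OMX callback only copies the
 * event into a fixed-depth ring buffer and signals; a worker thread pops
 * and does the real work. All names are illustrative; the mutex and
 * condition variable are initialised elsewhere, and queue overflow is
 * not handled. */
#include <pthread.h>
#include <IL/OMX_Core.h>

#define EVQ_DEPTH 64u   /* power of two, so the modulo wraps cleanly */

typedef struct {
    OMX_EVENTTYPE eEvent;
    OMX_U32       nData1, nData2;
} EVQ_ENTRY;

typedef struct {
    EVQ_ENTRY       aEntries[EVQ_DEPTH];
    unsigned        nHead, nTail;   /* nHead == nTail means empty */
    pthread_mutex_t mtxLock;        /* held only for a copy, never
                                       across an OMX call          */
    pthread_cond_t  cndNotEmpty;
} EVQ;

/* OMX callback context: copy and signal, nothing else. */
static OMX_ERRORTYPE EventHandler(OMX_HANDLETYPE hComponent, OMX_PTR pAppData,
                                  OMX_EVENTTYPE eEvent, OMX_U32 nData1,
                                  OMX_U32 nData2, OMX_PTR pEventData)
{
    EVQ *pQueue = (EVQ *) pAppData;
    (void) hComponent; (void) pEventData;

    pthread_mutex_lock(&pQueue->mtxLock);
    pQueue->aEntries[pQueue->nTail++ % EVQ_DEPTH] =
        (EVQ_ENTRY) { eEvent, nData1, nData2 };
    pthread_cond_signal(&pQueue->cndNotEmpty);
    pthread_mutex_unlock(&pQueue->mtxLock);
    return OMX_ErrorNone;
}

/* Worker thread: block until an event arrives, then handle it outside
 * of the callback context. */
static EVQ_ENTRY EvqPop(EVQ *pQueue)
{
    pthread_mutex_lock(&pQueue->mtxLock);
    while (pQueue->nHead == pQueue->nTail)
        pthread_cond_wait(&pQueue->cndNotEmpty, &pQueue->mtxLock);
    EVQ_ENTRY entry = pQueue->aEntries[pQueue->nHead++ % EVQ_DEPTH];
    pthread_mutex_unlock(&pQueue->mtxLock);
    return entry;
}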
info-beamer hosted - A user and programmer friendly digital signage platform for the Pi: https://info-beamer.com/hosted

6by9
Raspberry Pi Engineer & Forum Moderator
Posts: 9861
Joined: Wed Dec 04, 2013 11:27 am
Location: ZZ9 Plural Z Alpha, aka just outside Cambridge.

Re: Inconsistencies/bugs in Pi OMX IL implementation cause Deadlock (test harness attached)

Mon Mar 09, 2020 11:56 am

MarkDarcy: I'm sorry if you found my comments condescending - that was not intended.
The Pi IL hasn't been developed for at least 7 years, possibly 9. If you went to Microsoft with a test case, would you expect them to fix something in Windows Media Center (discontinued in May 2015)?
It is retained in the distribution in order to keep those existing programs working, but the expectation is that new projects shouldn't be using it. Maybe we could have signalled that better, but that's hindsight speaking.

Significant work is anything that can block on a resource held by any thread that has called into IL.
You have no knowledge whether the SendCommand completion will be generated before or after the callback notifying you that it has completed, therefore holding a mutex in the thread calling SendCommand that is required in your callback is going to cause issues.
As dividuum says, generally you want to signal a worker thread from the callback, and nothing else.
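
For example, a "signal only" handler can be as small as this sketch (POSIX semaphore assumed; sem_post() never blocks and takes no lock, so the callback context cannot stall on it):

Code: Select all

/* Sketch of a "signal only" event handler. s_semCmdDone is assumed to be
 * initialised at startup with sem_init(&s_semCmdDone, 0, 0); the worker
 * thread blocks in sem_wait() and does the actual state handling. */
#include <semaphore.h>
#include <IL/OMX_Core.h>

static sem_t s_semCmdDone;

static OMX_ERRORTYPE EventHandler(OMX_HANDLETYPE hComponent, OMX_PTR pAppData,
                                  OMX_EVENTTYPE eEvent, OMX_U32 nData1,
                                  OMX_U32 nData2, OMX_PTR pEventData)
{
    (void) hComponent; (void) pAppData;
    (void) nData1; (void) nData2; (void) pEventData;

    if (eEvent == OMX_EventCmdComplete)
        sem_post(&s_semCmdDone);   /* wake the worker; do nothing else here */
    return OMX_ErrorNone;
}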
Software Engineer at Raspberry Pi Trading. Views expressed are still personal views.
I'm not interested in doing contracts for bespoke functionality - please don't ask.

MarkDarcy
Posts: 25
Joined: Thu Sep 20, 2018 8:23 am

Re: Inconsistencies/bugs in Pi OMX IL implementation cause Deadlock (test harness attached)

Fri Mar 13, 2020 6:39 am

Hi 6by9 and dividuum,

Thank you both for your replies. I must first apologise for not replying sooner. In the meantime I've discovered a potentially much more serious problem which on the surface appeared to be related to this current query, so I just needed to investigate that before getting back to you on this. More on that other issue in a separate thread.

Thank you both for sharing your workarounds. I have from the start been using a separate "event forwarding" queue to a third independent thread as it was the only way to get a working solution at all. However, as I'm sure you also appreciate, passing command completion events via an extra queue can cause additional latency when the system is busy doing other CPU-bound tasks or higher-priority I/O-bound tasks. This has the effect of "significantly" stretching the whole tunnel setup time when compared to being able to directly signal completion from within the original event handler. It was good, though, to have independent verification from both of you regarding the workaround.

> 6by9

Thanks for your apology; it was very magnanimous of you. And, of course, your analogy regarding Microsoft is correct. In fact, I have considered switching to using MMAL on multiple occasions over the years. However, there has been virtually no improvement in the detail of the MMAL SDK documentation over the same period that the OMX layer has been neglected (so to speak). This makes it hard to justify the additional time that would be required to understand, through observation alone, the same level of operational detail with respect to MMAL that comes documented as standard for OMX through Khronos' OpenMAX specification. More importantly, as a consumer of an API, one can never be 100% certain about all facets of the API's behaviour through observation alone; a crucial point if one then has to give assurances regarding the reliability of software using that API. If more time could be invested into beefing up the technical detail in the MMAL documentation (not just the APIs, but the behavioural aspects as well) then I'm sure it would be of great mutual benefit to both users and Raspberry's engineers.

You mentioned that:
You have no knowledge whether the SendCommand completion will be generated before or after the callback notifying you that it has completed, therefore holding a mutex in the thread calling SendCommand that is required in your callback is going to cause issues.
Indeed, the first half of your statement is true, which is why when raising signals between threads the accepted wisdom is to use a combination of a mutex and a condition variable to ensure the signal can be raised and accepted atomically given any arbitrary interleaving of the threads involved. Unfortunately, this is where the second half of your statement does not, in my opinion, stand up to accepted wisdom. Whilst there may be a workaround, what is really at issue here is why the OMX library is blocking the thread calling SendCommand() while it launches/uses a second thread to issue the completion notification; a fundamental error when considering multithread programming paradigms. This serialisation of multiple threads by the OMX implementation is the direct cause of the resource contention that is in turn causing the deadlock (the resource being the mutex guarding the condition variable). Ironically, instead of serialising two threads in order to issue the notification, simply issuing the notification inline on the same thread that called SendCommand() would not only result in the deadlock being resolved (even given the exact same code), but would also result in improvements in command completion times and tunnel setup times overall.

I had established this much in my original post when I explained the signalling paradigm. I am asking for confirmation of Raspberry's interpretation of the specification because I am trying to establish how likely it is that this issue might be addressed in the future. As indicated before, there is inconsistency with regard to which components cause deadlock and which don't. As the reduction of latency during tunnel setup is of primary importance to me, rather than blanket application of the workaround to all components whether they require it or not, I am having to hard-code (or maintain via some dynamically-configurable external means) which components have to use the event-forwarding queue workaround and which don't.

From your previous response, I think I would be right in understanding that fixing any bugs with respect to existing OpenMAX functionality is not considered a high priority. Please understand, I have no qualms with having to use a workaround to patch tasking inconsistencies in API behaviour, providing such inconsistencies remain consistently inconsistent (!) over time. Conversely, it is potential unexpected changes to the inconsistent behaviour, including a well-intentioned fix performed in good faith, that was the original cause of my concern and the motivation for this query. Is there absolutely no plan to address issues such as the one I am experiencing, or is there some future possibility that such issues might be addressed? In this case, an absolute "no, we will not be addressing further issues" would be considered of far more value than either a "well, we don't know" or even a "we will but we don't know when".

Thanks in advance,
