tom66
Posts: 18
Joined: Tue Jan 10, 2012 12:14 am

GPU lockup with some 3D apps - "[drm] Resetting GPU" - not always recoverable

Sun May 17, 2020 12:24 pm

I've been having trouble getting 3D graphics to run reliably on our Raspberry Pi CM3.

We are using a CM3 in a custom board. The CM3 is powered with 5V, 3.3V and 1.8V per specifications: these are all stable and there is no excessive ripple on any rail. The Pi CM3 is not getting hot (vcgencmd reports ~60C at peak.)
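For anyone who wants to cross-check the thermal and power state on their own board, here is a minimal Python sketch. It assumes `vcgencmd` is on the PATH and relies on the usual `temp=...'C` and `throttled=0x...` output formats; the helper names (`parse_temp`, `parse_throttled`, `vcgencmd`) are just illustrative, not part of any library.

```python
import subprocess

# Bit meanings as documented for `vcgencmd get_throttled`.
THROTTLE_FLAGS = {
    0: "under-voltage detected",
    1: "ARM frequency capped",
    2: "currently throttled",
    3: "soft temperature limit active",
    16: "under-voltage has occurred",
    17: "ARM frequency capping has occurred",
    18: "throttling has occurred",
    19: "soft temperature limit has occurred",
}


def parse_temp(output):
    """Turn a string like "temp=60.1'C" into a float (degrees Celsius)."""
    value = output.strip().split("=", 1)[1]
    return float(value.rstrip("'C"))


def parse_throttled(output):
    """Turn a string like "throttled=0x50005" into a list of flag names."""
    value = int(output.strip().split("=", 1)[1], 16)
    return [name for bit, name in sorted(THROTTLE_FLAGS.items())
            if value & (1 << bit)]


def vcgencmd(*args):
    """Run vcgencmd with the given arguments and return its stdout."""
    result = subprocess.run(["vcgencmd", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On the Pi itself this would be used as `parse_temp(vcgencmd("measure_temp"))` and `parse_throttled(vcgencmd("get_throttled"))`. An empty list from the second call suggests the firmware has not seen under-voltage or throttling since boot.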

I am able to get the issue to occur in both Full and Fake KMS, and it occurs at different resolutions. Due to a PCB design error, the polarity of the HPD_DETECT pin is reversed, but altering the *.dts file used to create the *.dtb has allowed Full KMS to identify the monitor correctly. Again, I don't think this is the cause of the issue, because 2D graphics work fine and 3D works fine for some time, but I'm mentioning it for full disclosure.

Regardless, I'm also not convinced this is a CM3-specific issue, as the issue appears to be contained entirely within the GPU. After some time running 3D software (for instance "Neverball" is a good test game as it is not horribly demanding of the Pi's GPU) various graphical glitches and lockups can occur. This can take as little as 40-50 seconds in some applications, or with our test application can happen immediately. When the lockup occurs:

- the GUI becomes mostly unresponsive (the mouse still works); this is the whole X-server and desktop interface, not just the application under test
- graphical glitches are present in the form of black tiles or tiles from the 3D application or elsewhere on the desktop
- in some cases, an alternating red/blue "shading" effect is present (blue becomes red and vice versa)
- SSH and CPU appear to work OK; I can use the Pi as a remote terminal and performance seems ~similar to before the lockup
- little/no CPU usage, little/no SD card usage
- plenty of GPU RAM free (>190MB reported under CmaFree)
- plenty of CPU RAM free (>400MB)
- dmesg is full of "[drm] Resetting GPU", usually repeated every 2s.
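To put numbers on the symptoms above, something like the following rough Python sketch could be used to measure how often the reset message appears and how much CMA memory is free at the time. It assumes dmesg prints its default `[seconds.micros]` timestamp prefix and that `CmaFree` is read from `/proc/meminfo`; the function names are my own and purely illustrative.

```python
import re
import subprocess

# Matches lines such as "[  123.456789] [drm] Resetting GPU."
# (assumes the default dmesg timestamp format: seconds since boot).
RESET_RE = re.compile(r"^\[\s*(\d+\.\d+)\]\s.*\[drm\] Resetting GPU")


def reset_timestamps(dmesg_text):
    """Return the timestamps (seconds) of every GPU reset message."""
    times = []
    for line in dmesg_text.splitlines():
        match = RESET_RE.match(line)
        if match:
            times.append(float(match.group(1)))
    return times


def reset_intervals(times):
    """Return the gaps in seconds between consecutive resets."""
    return [round(later - earlier, 3)
            for earlier, later in zip(times, times[1:])]


def cma_free_kb(meminfo_text):
    """Return the CmaFree value in kB, or None if the field is absent."""
    for line in meminfo_text.splitlines():
        if line.startswith("CmaFree:"):
            return int(line.split()[1])
    return None


def read_dmesg():
    """Return the current kernel log (may need root on some setups)."""
    result = subprocess.run(["dmesg"],
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On the Pi, `reset_intervals(reset_timestamps(read_dmesg()))` gives the spacing between resets, and `cma_free_kb(open("/proc/meminfo").read())` gives the free CMA memory at that moment. Logging both over SSH while the 3D app runs should show whether the resets really are periodic and whether CMA is shrinking beforehand.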

I cannot get vcdbg log to work; it fails with an error about allocating a negative number of bytes. I am not sure whether this is a bug or whether it is unsupported in the current release or configuration. I've tried running it both with and without root.

If I kill the offending app, the desktop usually starts responding, but it takes up to 15 minutes. In at least one case, I left it overnight and it never came back.

I can take screenshots with scrot, but they are slow to capture compared to normal, as if the GPU is unable to keep up with requests.

Kernel version:

Linux raspberrypi 4.19.97-v7+ #1294 SMP Thu Jan 30 13:15:58 GMT 2020 armv7l GNU/Linux

Per the other post I made yesterday about getting AXI performance counters: that is now working, but there doesn't seem to be any enormous increase in read/write requests. I had previously hypothesised that there was some kind of lockup where the GPU needed memory but couldn't get the resources due to other pending requests, but there was no noticeable change in the counters.

With our custom GL app, we've noticed the problem occurs most often when we ask the GL driver to do too much, for instance rendering 8K points instead of 4K points. 8K causes a lockup, but 4K will manage ~11fps.

I have noticed the issues appear *more* common with Fake KMS, but they still occur with Full KMS.

Running Raspbian Buster Feb 2020.

Screenshots here: https://imgur.com/a/OaoA5L5

Any help or suggestions appreciated -- if any more debug/diagnostics are required I'm happy to provide them.

tom66
Posts: 18
Joined: Tue Jan 10, 2012 12:14 am

Re: GPU lockup with some 3D apps - "[drm] Resetting GPU" - not always recoverable

Thu May 21, 2020 4:05 pm

Does anyone have any further suggestions?
