caspart
Posts: 23
Joined: Wed Jan 29, 2020 7:07 pm

System freezes - howto diagnose

Wed May 06, 2020 2:02 pm

Hi,
I am using the xRTOS solution by LdB for my experiments (I'm a student). I have extended the system to run benchmarks and measure the number of cycles and some events. Unfortunately, I am now experiencing system freezes. Sometimes they seem to happen randomly, and in some cases they happen often. I have noticed that often (not sure if always) the program gets stuck in this while loop:

/*-[pl011_uart_putc]--------------------------------------------------------}
. Send a character out via uart.
.--------------------------------------------------------------------------*/
void pl011_uart_putc (char c)
{
	while (PL011UART->FR.TXFF != 0) {};								// Check tx fifo is not full
	PL011UART->DR.DATA = c;											// Transfer character
}

I have read the thread called "Bare metal UART w/FIQ stops working" and I'd like to diagnose my problem. But the code is different and I don't see how, or if, the UART is linked to the FIQ in my case (if that is related to my problem at all).

I managed to connect to the running image with a JLink probe and OpenOCD. When I use OpenOCD from the terminal, I can inspect registers, select a core and halt it, etc. Ideally, though, I'd like to debug with Eclipse. I made a debug configuration for this, and I can now also connect to the Pi from Eclipse, but unfortunately I get the message:
Break at address “0x872a4” with no debug information available, or outside of program code
I did compile with -g for debug symbols, and in Eclipse I specified the ELF binary as the executable.

Any ideas how to get this working? Thanks!

LdB
Posts: 1595
Joined: Wed Dec 07, 2016 2:29 pm

Re: System freezes - howto diagnose

Wed May 06, 2020 2:51 pm

As xRTOS will run on every Pi model, you need to provide more info.
You will need to give the Pi model and, if it is a multicore model, whether you are running single core or multicore.

Now remember you are on a pre-emptive task switcher, so the most significant question to answer is: does every task jam, or just the UART task?
If it is just the UART task, I can guide you through debugging it. If everything freezes, it's likely a race condition in the context switch, and I will happily look at what is going wrong; just throw up some code on git or a fileshare somewhere.
If you don't have another visible task then make one. A simple LED flash heartbeat will do; it makes deciding what to debug a lot clearer.

void HEART_BEAT (void *pParam)
{
	while (1)
	{
		// code to turn LED on
		xTaskDelay(1000);
		// code to turn LED off
		xTaskDelay(1000);
	}
}

I would also strongly suggest creating some code to redirect any core exception. If you look at the vector table, any exception a core raises other than FIQ, IRQ and SWI will go to hang, which is defined as

hang:
	b hang

So basically the entire core will just hang. It is easy enough to write some C code that sends a fixed message to the UART and then hangs, so you can see which core exception occurred and even the address at which it occurred. You just change the jump in the vector table to your C function name rather than the fixed internal hang.
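As a rough sketch of the message part of that idea: the helper below builds a fixed report string without any library calls. The name format_exception, the message layout, and the data_abort_report handler shown in the trailing comment are made up for illustration and are not part of xRTOS; how the faulting address is obtained on the target is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of xRTOS): write
 * "EXC <kind> at 0xXXXXXXXX\r\n" into buf.
 * Returns the number of characters written (excluding the terminator),
 * or 0 if buf is too small. It calls no library functions. */
size_t format_exception(char *buf, size_t size, const char *kind, uint32_t addr)
{
    static const char hex[] = "0123456789ABCDEF";
    static const char prefix[] = "EXC ";
    static const char middle[] = " at 0x";
    size_t n = 0;
    size_t kind_len = 0;

    while (kind[kind_len] != '\0') kind_len++;

    /* prefix (4) + kind + middle (6) + 8 hex digits + "\r\n" + terminator */
    if (size < 4 + kind_len + 6 + 8 + 2 + 1) return 0;

    for (const char *p = prefix; *p; p++) buf[n++] = *p;
    for (const char *p = kind; *p; p++) buf[n++] = *p;
    for (const char *p = middle; *p; p++) buf[n++] = *p;
    for (int shift = 28; shift >= 0; shift -= 4)
        buf[n++] = hex[(addr >> shift) & 0xFu];
    buf[n++] = '\r';
    buf[n++] = '\n';
    buf[n] = '\0';
    return n;
}

/* On the target, a handler reached from the vector table might look like
 * this (pl011_uart_putc is the function from the first post):
 *
 *   void data_abort_report (uint32_t fault_addr)
 *   {
 *       char msg[48];
 *       size_t n = format_exception(msg, sizeof msg, "DATA ABORT", fault_addr);
 *       for (size_t i = 0; i < n; i++) pl011_uart_putc(msg[i]);
 *       while (1) { }   // hang after reporting
 *   }
 */
```

Keeping the formatting separate from the UART output means the string logic can be checked on a PC before it is wired into the vector table.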

caspart
Posts: 23
Joined: Wed Jan 29, 2020 7:07 pm

Re: System freezes - howto diagnose

Wed May 06, 2020 8:37 pm

Many thanks for your quick response. Ah yes, I forgot to mention: I'm running a multicore application with up to 4 cores active. For some benchmarks I only use 1 core, and I still sometimes see the system hanging. My Raspberry Pi is a 3 Model B rev 1.2.

From the outputs I see, I suspect it's one core that's hanging. But I'll investigate this and let you know. And I'll try the redirection of the exception.

caspart
Posts: 23
Joined: Wed Jan 29, 2020 7:07 pm

Re: System freezes - howto diagnose

Sun May 10, 2020 2:25 pm

I have implemented the heartbeat LEDs, so cores that aren't involved in the benchmarking just flash an LED. This shows that only the cores running the benchmarks are hanging, because I still see the LEDs flashing!

Btw I didn't understand the set_Activity_LED function (needs GPIO extension header from what I see in the sources?) so I've used the gpio_setup + gpio_output functions. That also works.

Haven't yet implemented the exception function, I'll try that next.

LdB
Posts: 1595
Joined: Wed Dec 07, 2016 2:29 pm

Re: System freezes - howto diagnose

Mon May 11, 2020 6:54 pm

set_Activity_LED just turns the green onboard LED on/off.

caspart
Posts: 23
Joined: Wed Jan 29, 2020 7:07 pm

Re: System freezes - howto diagnose

Wed May 13, 2020 4:20 pm

I have added debug symbols to the executable image and I'm able to see where the cores are running instructions when the system 'freezes'. It's not actually a freeze because at least some of the cores keep running (as proven by the flashing LED).

It appears that I have two types of problems:
1) Sometimes cores stop at the first use of the UART. This is in the pl011_uart_putc function, which waits indefinitely for PL011UART->FR.TXFF to become 0.
See the attached screenshot of a GDB dashboard debug session, with stack trace. I believe the character being sent is the second one of the string.
(Attachment: pl011_uart_putc.png)
2) At other times (less frequently), after running hundreds of iterations of the same benchmarks, one core stops running. Shortly after that, another core also stops. In the debugger I see different instructions on different occasions: sometimes the core seems to keep running in the IDLE task, sometimes it remains in the swi_handler_stub.

Btw I have attempted to write extra handlers in the VectorTable but they didn't seem to be activated.

@ LdB: it would be great if you could look at my code at
https://github.com/cassebas/Raspberry-Pi-Multicore
I have created a special branch called experiment_debug to reproduce the (unfortunately random) error.
Please let me know if I should provide more information. And please note that for my experiments I have made slight modifications to the scheduler and have increased stack sizes in an attempt to attack the bug. And finally please note that I have modified the Makefile for compilation under Linux.
Thnx again!

caspart
Posts: 23
Joined: Wed Jan 29, 2020 7:07 pm

Re: System freezes - howto diagnose

Wed May 27, 2020 9:44 am

It would be great if someone is willing to help me diagnose the problem. I know it may be difficult to get familiar with my code, but maybe someone can provide some hints on how to use the gdb debugger effectively to track down the bug. I am able to halt the core and see where it was executing. Most of the time it is in the swi_handler, but I don't know why.
Thnx

okenido
Posts: 73
Joined: Thu Aug 02, 2018 11:47 am

Re: System freezes - howto diagnose

Thu Jun 11, 2020 11:19 am

I don't know specifically for your case, but in my bare metal apps I had random crashes with IRQ/FIQ when I didn't save/restore the floating point registers in the interrupt handler (depends on whether you use these...).

What happens if you add a timeout to your while loop? Does the program keep running?
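A minimal sketch of such a timeout, assuming the PL011 flag and data registers are reached through plain pointers rather than the PL011UART struct used in the first post. The function name uart_putc_timeout and the max_spins parameter are invented for this example. TXFF is bit 5 of the PL011 flag register.

```c
#include <stdbool.h>
#include <stdint.h>

#define PL011_FR_TXFF (1u << 5)   /* PL011 flag register: transmit FIFO full */

/* Try to send one character, giving up after max_spins extra polls of the
 * flag register. fr and dr point at the UART flag and data registers;
 * passing them in (an assumption made for this sketch) lets the logic be
 * exercised on a host with fake registers.
 * Returns true if the character was written, false on timeout. */
bool uart_putc_timeout(volatile uint32_t *fr, volatile uint32_t *dr,
                       char c, uint32_t max_spins)
{
    while ((*fr & PL011_FR_TXFF) != 0) {
        if (max_spins == 0) return false;   /* FIFO never drained */
        max_spins--;
    }
    *dr = (uint32_t)(unsigned char)c;       /* transfer character */
    return true;
}
```

If the call returns false, the caller could flash an LED or bump a counter, which would show whether the rest of the task keeps running once it is no longer stuck waiting on TXFF.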

LdB
Posts: 1595
Joined: Wed Dec 07, 2016 2:29 pm

Re: System freezes - howto diagnose

Thu Jun 11, 2020 12:06 pm

If it's stuck in the SWI, that is an immediate yield, so it sounds like the scheduler is jamming rather than any register corruption, which would throw an exception. I will have a look tomorrow, but you might want to keep an eye on each core's current task status. If the current task is ever not in the ready state, the scheduler will just keep spinning its wheels trying to select another task, which is what it sounds like is happening.
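To illustrate the kind of spin described here, the sketch below shows a round-robin task search that gives up after one pass instead of looping until a ready task appears. The task_t structure, the state names, and pick_next_ready are hypothetical and do not reflect xRTOS's actual scheduler code.

```c
#include <stddef.h>

/* Hypothetical task states and control block; xRTOS's real structures differ. */
typedef enum { TASK_READY, TASK_BLOCKED, TASK_DELAYED } task_state_t;
typedef struct { task_state_t state; } task_t;

/* Round-robin search starting after `current`. Visits each task at most
 * once (the current task last) and returns the index of the next ready
 * task, or -1 if none is ready, instead of looping until one appears. */
int pick_next_ready(const task_t *tasks, size_t count, size_t current)
{
    for (size_t step = 1; step <= count; step++) {
        size_t i = (current + step) % count;
        if (tasks[i].state == TASK_READY) return (int)i;
    }
    return -1;   /* nothing runnable: a good place for a breakpoint */
}
```

A -1 result is the situation worth catching in the debugger: halting there and dumping each core's task states should show whether a task was left in a non-ready state.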
