LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

bzt's debugger

Sat Aug 04, 2018 11:16 am

Hello, I have tried to integrate bzt's debugger with the disassembler, and so far I have been unable to even trigger an exception. I have set the vector table up the same way bzt did and set the system register to hold the address of the vector table, but the handler is never triggered. Here is my GitHub repo https://github.com/OllieLollie1/Raspi3-Kernel. From what I understand, executing brk #0 should trigger a synchronous exception whose cause is a breakpoint instruction. When the exception is generated, the handler should turn the ACT LED on. Turning the LED on works elsewhere in the code, for example right after loading x0, =vectors. If anyone can help, please do, as I have spent a good ~3 days on this one problem.

bzt
Posts: 181
Joined: Sat Oct 14, 2017 9:57 pm

Re: bzt's debugger

Sat Aug 04, 2018 5:11 pm

Hi,

Yes, you're right, "brk #0" should trigger a synchronous exception. And yes, you set up vbar correctly (at least I can't see any problem with it). Now about the vectors code: you should first store x30 on the stack and then call saveregs before anything else, because set_ACT_LED modifies registers, so as written your code saves the already-modified state. Other than that I can't see any problems; it should work. Are you sure "breakpoint;" in main.c is reached at all? What if you put that right after uart_init()?
Also, I'd suggest running qemu with "-d int,in_asm". Can you confirm that all the expected instructions are executed and appear in the output (I mean setting vbar, the EL1 switch, and the brk too)? You should redirect the qemu output to a file so that you can search for the instructions.

Cheers,
bzt

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: bzt's debugger

Sat Aug 04, 2018 10:35 pm

OK, so I have found the issue, but I have no idea how to correct it. The address getting loaded into vbar is incorrect: it gets the address of _vectors but then adds 0x198 to it, which lands in my C code. The breakpoint is reached, but when it is triggered the wrong address is jumped to. I have uploaded the information to my GitHub repo in two files, qemu-output.info and objdump.info. Something similar happens with the actual tutorial, but it doesn't produce an error. Also, after it executes 4 instructions it jumps to a spin lock as if main had returned. Sorry about the structure of qemu-output.info: the part where the exception is triggered is at the top of the file, and just below that I run it again and get the setup code. I just couldn't redirect the output of qemu to a file, and my terminal doesn't have a particularly large scrollback.

bzt
Posts: 181
Joined: Sat Oct 14, 2017 9:57 pm

Re: bzt's debugger

Mon Aug 06, 2018 9:42 am

LizardLad_1 wrote:
Sat Aug 04, 2018 10:35 pm
OK, so I have found the issue, but I have no idea how to correct it. The address getting loaded into vbar is incorrect: it gets the address of _vectors but then adds 0x198 to it, which lands in my C code. The breakpoint is reached, but when it is triggered the wrong address is jumped to. I have uploaded the information to my GitHub repo in two files, qemu-output.info and objdump.info. Something similar happens with the actual tutorial, but it doesn't produce an error. Also, after it executes 4 instructions it jumps to a spin lock as if main had returned. Sorry about the structure of qemu-output.info: the part where the exception is triggered is at the top of the file, and just below that I run it again and get the setup code. I just couldn't redirect the output of qemu to a file, and my terminal doesn't have a particularly large scrollback.
Since the vbar address must be properly aligned, there's a good chance that the lower bits are used by the CPU for flags. Read the doc, I'm sure it's in there somewhere. I wouldn't worry about it, especially since it happens in the original tutorial as well, which works.

Try to minimize your code (comment out as much as possible) until your break point is triggered properly. Then put code back in small portions, one part at a time. When you reach the point where your break point won't be triggered again, you'll know that the last code you've put in causes the trouble, so it's enough to debug that part only.

About redirecting qemu output, try to add "2>debug.txt" to the end of the qemu command line.

Cheers,
bzt

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: bzt's debugger

Mon Aug 06, 2018 11:02 am

I haven't been able to solve it yet; however, I know that the breakpoint is being executed. It just isn't jumping to the right position in the code.

LdB
Posts: 877
Joined: Wed Dec 07, 2016 2:29 pm

Re: bzt's debugger

Wed Aug 15, 2018 6:47 pm

The problem seems basic: your register-saving routine is completely wrong and I can't find any restore code.

Right at the start you store a single 8-byte register and push the stack down 16 bytes.

Code: Select all

	str     x30, [sp, #-16]!     // push x30
You are correct to save the link register, which gets corrupted by the BL instructions.
However, I can't see where you ever restore the register or move the stack back up the 16 bytes, so I am guessing the stack just drops down and down?

Your activity LED handler is just plain wrong: you immediately corrupt the x0 register and then call set_ACT_LED, which corrupts x30 and whatever other registers that code touches, and only after returning do you save x30 and the register data.

Your dbg_saveregs call has many problems and weird alignment issues.
I also suggest you review stack use in AArch64, especially the 16-byte alignment requirement:
https://community.arm.com/processors/b/ ... sh-and-pop?

I can't find where you ever restore any of the registers before you do an eret. Do you have any register restore code that I missed?

From what I can see, your interrupt routine will corrupt numerous registers and crash and burn.

So basically when handling the interrupt you need to do 5 steps

1.) Save any register that can get corrupted by your interrupt handler code
2.) Execute your interrupt handler
3.) Clear the interrupt source
4.) Restore the registers so they are back exactly as they entered
5.) eret to end the interrupt

If you look at my interrupt handler stub (go down to irq_handler_stub)
https://github.com/LdB-ECM/Raspberry-Pi ... tStart64.S
I do what is called a lazy save, just methodically saving every register before calling the interrupt code.
Given that the call is to C code, I really should look at which registers C can corrupt and save only those :-)

If you look at the ARM interrupt documentation, it talks about saving only the corruptible registers:
http://infocenter.arm.com/help/index.js ... 10s05.html

This is what I would expect your handler code to look like:

Code: Select all

	.balign  0x80
	str     x30, [sp, #-16]!     // push x30 and push stack down 16
	bl      dbg_saveregs        // save all the registers   (step1)
	mov     x0, #1
	bl      dbg_decodeexc      // Step2
	bl      dbg_main            // Step2 + Step3
	bl      dbg_restoreregs    // restore all the registers (step4)
	ldr	x30, [sp], #16       // pop back register x30 and drag stack back up 16
	eret                        // step5 

LdB
Posts: 877
Joined: Wed Dec 07, 2016 2:29 pm

Re: bzt's debugger

Thu Aug 16, 2018 2:39 am

Okay try this patch
https://github.com/LdB-ECM/Exchange/blo ... ad/start.s

I had issues with your code. First, the main.c sample doesn't really do anything, but worse, your code isn't very portable.

Just so you know, unsigned int is not guaranteed by the C standard to be 32 bits; on AArch64 its width depends on the data model your compiler uses (under LP64, for example, it is 32 bits).
http://shervinemami.info/arm64bit.html
char: 8-bit unsigned.
bool / _Bool: 8-bit unsigned. False is 0 and True is 1.
long long: 64-bit signed.
int, long, pointer: might be 32-bit or 64-bit, depending on ILP32/LP64/LLP64 shown below.
You have lots of hardware registers and mailbox structures that use unsigned int, and none of it worked with my compilers because they have unsigned int as 64-bit. I strongly advise you to start using uint32_t; it makes the code far more portable to other compilers.
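
To make the portability point concrete, here is a minimal sketch of a register block declared with fixed-width types. The struct `example_regs_t`, its field names, and the helper `example_reg_offset` are hypothetical and do not come from either repo; the only claim is that `uint32_t` pins the layout regardless of the compiler's data model, and that a compile-time check can catch drift.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block, for illustration only: four consecutive
 * 32-bit hardware registers. With uint32_t the layout is the same under
 * every data model; with unsigned int it depends on the compiler. */
typedef struct {
    volatile uint32_t read;
    volatile uint32_t status;
    volatile uint32_t config;
    volatile uint32_t write;
} example_regs_t;

/* Fail the build, not the boot, if the layout ever drifts. */
static_assert(sizeof(uint32_t) == 4, "uint32_t must be 4 bytes");
static_assert(sizeof(example_regs_t) == 16, "register block must be 16 bytes");
static_assert(offsetof(example_regs_t, write) == 12, "write register at +12");

/* Returns the byte offset of the n-th 32-bit register in the block. */
size_t example_reg_offset(size_t n)
{
    return n * sizeof(uint32_t);
}
```

If the same struct were declared with a 64-bit `unsigned int`, the second and third assertions would fail at compile time, which is a much cheaper way to find the problem than a silent mailbox failure.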

Anyhow, I set up a compiler with 32-bit unsigned ints and everything seems okay, but I have no way to test my changes as your code doesn't do anything I can see.

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: bzt's debugger

Thu Aug 16, 2018 2:58 am

Once the breakpoint is triggered, the code should enable the ACT LED. I will try the patches when I get home. Thank you, I was unaware that not all compilers use 32 bits for unsigned int. I will change all that later tonight.

LdB
Posts: 877
Joined: Wed Dec 07, 2016 2:29 pm

Re: bzt's debugger

Thu Aug 16, 2018 3:15 am

I removed the double push, which was a problem, and did some stack maths that is hopefully right. Update done, the new code is there.

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: bzt's debugger

Thu Aug 16, 2018 9:29 am

I have finished changing all types except chars to *int*_t types, so it should be more portable. When brk #0 is triggered it still jumps to the wrong address. It jumps past the vector table and into some C code that is completely irrelevant to exception handling. A return is then encountered and interpreted as a return from main, after which it gets stuck in the wfe loop. This may be an alignment issue, but I'm not sure. Can anyone compile the code and have it execute successfully? The code is designed to execute the breakpoint and then, in the vector table, enable the ACT LED.

LdB
Posts: 877
Joined: Wed Dec 07, 2016 2:29 pm

Re: bzt's debugger

Thu Aug 16, 2018 3:48 pm

The problem isn't an alignment issue; your table isn't right. You're in EL1_SP mode, look at your start code.

Let me show you the table done in macro form, where it is more obvious: the EL1_SP vectors are entries 5-8 in the table, not 1-4. You haven't populated 5-8, so they are blank in your code.

Code: Select all

.balign 0x800
.globl	VectorTable
VectorTable:
	/* from current EL with sp_el0 */
	vector	hang			/* Synchronous */
	vector    hang		       /* IRQ */
	vector	hang			/* FIQ */
	vector	hang			/* SErrorStub */
	/* from current EL with sp_elx, x != 0 */
	vector	hang			 /* Synchronous */
	vector	irq_handler_stub   /* IRQ */
	vector	hang			 /* FIQ */
	vector	hang			 /* SErrorStub */
	/* from lower EL, target EL minus 1 is AArch64 */
	vector	hang			/* Synchronous */
	vector  hang			        /* IRQ */
	vector	hang			/* FIQ */
	vector	hang			/* SErrorStub */
	/* from lower EL, target EL minus 1 is AArch32 */
	vector	hang			/* Synchronous */
	vector    hang			/* IRQ */
	vector	hang			/* FIQ */
	vector	hang			/* SErrorStub */
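
The slot numbering follows from the table geometry: 16 entries of 0x80 bytes each, in four groups of four, with the whole table aligned to 0x800. A minimal C sketch of that arithmetic is below; the enum and function names are made up for illustration and are not taken from either code base.

```c
#include <assert.h>
#include <stdint.h>

/* The four groups of entries in an AArch64 vector table, in table order. */
enum vec_source {
    VEC_CUR_EL_SP0 = 0,   /* current EL, using SP_EL0 */
    VEC_CUR_EL_SPX = 1,   /* current EL, using SP_ELx (x != 0) */
    VEC_LOWER_A64  = 2,   /* lower EL, AArch64 */
    VEC_LOWER_A32  = 3    /* lower EL, AArch32 */
};

/* The four exception kinds within each group, in table order. */
enum vec_kind {
    VEC_SYNC = 0, VEC_IRQ = 1, VEC_FIQ = 2, VEC_SERROR = 3
};

/* Each entry is 0x80 bytes, so each group of four spans 0x200 bytes. */
uint32_t vector_offset(enum vec_source src, enum vec_kind kind)
{
    assert(src <= VEC_LOWER_A32 && kind <= VEC_SERROR);
    return (uint32_t)src * 0x200u + (uint32_t)kind * 0x80u;
}

/* The table base needs 2 KiB alignment: the low 11 bits must be zero. */
int vbar_is_aligned(uint64_t vbar)
{
    return (vbar & 0x7FFu) == 0;
}
```

With this, a brk #0 executed at EL1 while using SP_EL1 is taken at `vector_offset(VEC_CUR_EL_SPX, VEC_SYNC)`, i.e. table base + 0x200, which is the fifth entry and not the first.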

Try something like

Code: Select all

        /* from current EL with sp_el0 */
	.balign 0x800
_vectors:
	.balign  0x80
	b _start

	.balign  0x80
	b _start

	.balign  0x80
	b _start

	.balign  0x80
	b _start

        /* from current EL with sp_elx, x != 0 */
	// synchronous
	.balign  0x80
	stp	x29, x30, [sp, #-16]!	 // Save x30 link register and x29 just so we dont waste space
	bl		register_save		 // Save corruptible registers .. it assumes x29,x30 saved
	
	.... blah blah blah
	
Update: Okay, I added that and your code now generates two lines at the bottom of the screen:
Synchronous: BreakPoint instruction
> nchronous: Breakpoint instruction
The interrupt is being processed, but it looks like you have bugs.
Update 1: Found one. You save 38 registers (32+6), not 37, so try

Code: Select all

unsigned long dbg_regs[38];

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: bzt's debugger

Fri Aug 17, 2018 8:37 am

The bug is an error in my implementation of a terminal. I forgot to clear the screen when the screen rolled up.

EDIT:
I found I need to call this mailbox command but I'm actually still unsure about using the mailbox.
Here is the mailbox command I need to call: https://github.com/raspberrypi/document ... ank-screen

bzt
Posts: 181
Joined: Sat Oct 14, 2017 9:57 pm

Re: bzt's debugger

Fri Aug 17, 2018 11:17 am

Hi,

@LdB: you're right, my debugger was deliberately not designed to be continuable (that's something I left for the reader to do). On this note, you can't always return from an exception handler (think about a prefetch abort, for example). The only purpose of dbg_regs[] is therefore to store the registers so that the debugger can print them out. This includes x30 too, and in my original tutorial it is dbg_saveregs that pops x30 from the stack (line 147 of start.S). The juggling with x0 and x1 at the beginning of the function guarantees that no registers are clobbered during the save (lines 113-118). Maybe not the best or most elegant solution, but it works perfectly. This way, by the time dbg_saveregs returns, all registers are saved into dbg_regs[] and the stack is clean.

@LizardLad_1: listen to LdB, he is right: the vector code needs to be modified if you want to return properly from the debugger (if that's even possible). You should check whether the exception was caused by the "brk" instruction by examining *dbg_regs[31].
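
A minimal sketch of that check in C, assuming (as the post implies) that dbg_regs[31] holds the saved exception return address, so that the word it points to is the faulting instruction. The function names are hypothetical. The bit patterns are the A64 BRK encoding and the ESR exception class for a BRK executed in AArch64 state; the ESR variant is an alternative that avoids reading instruction memory.

```c
#include <assert.h>
#include <stdint.h>

/* A64 encoding of BRK #imm16: fixed bits 0xD4200000, imm16 in bits [20:5]. */
#define BRK_MASK  0xFFE0001Fu
#define BRK_VALUE 0xD4200000u

/* Returns 1 if the 32-bit instruction word is a BRK, 0 otherwise. */
int insn_is_brk(uint32_t insn)
{
    return (insn & BRK_MASK) == BRK_VALUE;
}

/* Extracts the 16-bit immediate from a BRK instruction word. */
uint16_t brk_immediate(uint32_t insn)
{
    assert(insn_is_brk(insn));
    return (uint16_t)((insn >> 5) & 0xFFFFu);
}

/* ESR_ELx.EC lives in bits [31:26]; 0x3C means BRK executed in AArch64. */
int esr_is_brk(uint64_t esr)
{
    return ((esr >> 26) & 0x3Fu) == 0x3Cu;
}
```

In a handler this might be called as `insn_is_brk(*(uint32_t *)dbg_regs[31])`; whether index 31 is really the saved return address in your copy of the code is something to verify against your own dbg_saveregs.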

Cheers,
bzt

bzt
Posts: 181
Joined: Sat Oct 14, 2017 9:57 pm

Re: bzt's debugger

Fri Aug 17, 2018 12:02 pm

LizardLad_1 wrote:
Fri Aug 17, 2018 8:37 am
The bug is an error in my implementation of a terminal. I forgot to clear the screen when the screen rolled up.

EDIT:
I found I need to call this mailbox command but I'm actually still unsure about using the mailbox.
Here is the mailbox command I need to call: https://github.com/raspberrypi/document ... ank-screen
Back in the old CRT days, blanking the screen meant powering off the electron gun to preserve the fluorescent layer on the display. With no electrons fired, the screen became black and blank, but the framebuffer was not affected in any way. These days there are no electron guns in displays, so I assume this blanking effect is implemented by filling the framebuffer with zeros when proper display-side power management is not available (for example when you connect an HDMI2VGA adapter cable instead of a real HDMI display). As a side effect, this mailbox call is perfect for clearing the screen.

About using the mailbox in general: think of it as a special way of calling a built-in function provided by the firmware, the same as with BIOS or UEFI. In the first case the ABI was software interrupts, for the latter it's uefi_call_wrapper. In the case of VideoCore, the call is implemented by passing a properly aligned and filled-in array through an MMIO register. Regardless of the ABI, all of them give you a standard API library to interact with the board (covering more or less the same functionality; for example, all three have a function for clearing the screen).
If you want to do something hardware related, you have to use the MMIO interface. Because the mailbox interface is built on top of that, the VC looks like just another memory-mapped peripheral. Does that make sense to you?
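
One detail of "passed in an MMIO register" that is easy to miss: the word written to the mailbox carries both the buffer address and the channel number, with the channel in the low 4 bits, which is why the buffer has to be 16-byte aligned. A minimal sketch of that packing, with hypothetical helper names and assuming the property channel is number 8 (check your own MBOX_CH_PROP definition):

```c
#include <assert.h>
#include <stdint.h>

#define MBOX_CH_PROP_EXAMPLE 8u   /* assumed property channel number */

/* Combine a 16-byte-aligned buffer address with a 4-bit channel number. */
uint32_t mbox_encode(uint32_t buf_addr, uint32_t channel)
{
    assert((buf_addr & 0xFu) == 0);   /* low 4 bits must be free */
    assert(channel <= 0xFu);
    return buf_addr | channel;
}

/* Split a word read back from the mailbox into its two parts. */
uint32_t mbox_decode_addr(uint32_t word)    { return word & ~0xFu; }
uint32_t mbox_decode_channel(uint32_t word) { return word & 0xFu; }
```

The actual register reads and writes, and the status polling around them, are left out here because they only make sense on the hardware.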

Cheers,
bzt

LdB
Posts: 877
Joined: Wed Dec 07, 2016 2:29 pm

Re: bzt's debugger

Fri Aug 17, 2018 2:41 pm

A walk down memory lane... I get what you were doing now.

An interesting question I wondered about: the Cortex-A53 has a whole debugging interface built in, but it is up to the vendor's implementation whether they expose it. Do we know if it's implemented on the Pi? I am guessing it probably isn't, because they took the GIC out and replaced it with the Pi interrupt controller, and I could imagine that would foul everything up.

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: bzt's debugger

Sat Aug 18, 2018 1:18 am

I tried to implement the mailbox call, but I have been unsuccessful. If possible, please don't give me the answer, just tell me what is incorrect.

Code: Select all

void lfb_clear()
{
        mailbox[0] = 8*4;
        mailbox[7] = MBOX_TAG_LAST;
        mailbox[1] = MBOX_REQUEST;
        mailbox[2] = 0x00040002;
        mailbox[3] = 8;
        mailbox[4] = 0x0;
        mailbox[5] = 0x1;
        mailbox[6] = 0x0;
        mailbox_call(MBOX_CH_PROP);
}

LdB
Posts: 877
Joined: Wed Dec 07, 2016 2:29 pm

Re: bzt's debugger

Sat Aug 18, 2018 1:55 am

Unfortunately most of it is wrong, so I pretty much have to give you the answer :-)

Look at the details of tag 0x00040002 again and run through how it works

Code: Select all

Blank screen
    Tag: 0x00040002
    Request:
        Length: 4
        Value:
            u32: state
        State:
        Bit 0: 0=off, 1=on
        Bits 1-31: reserved for future use (set to 0)
    Response:
        Length: 4
        Value:
            u32: state
        State:
        Bit 0: 0=off, 1=on
        Bits 1-31: reserved for future use
First, for now we skip the first 2 entries (0 and 1); they are always the header.

Then we start with the message body, which comes from the tag details:
mailbox[2] is the tag id; you got that one
the request length is 4, so mailbox[3] needs to be 4
the response length is 4, so mailbox[4] needs to be 4
You need to provide on/off in mailbox[5] as 0 or 1 (that's the 4 bytes you told it you would provide as the request length)
It will also respond in mailbox[5] (that's the 4-byte buffer you told it that it could use as the response length)

Now you deal with the end sequence: no more tags means mailbox[6] = 0

Now you go back and deal with the header, being the first 2 entries:
the end tag is mailbox[6], so there are 7 entries (0,1,2....6), so the size of the structure you need to send is mailbox[0] = 7 * 4;
mailbox[1] is always set to zero; the VC will set the value to 0x80000000 on success

Make sure the mailbox array is 16-byte aligned and you are good to go.
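
Put together, the layout walked through above can be sketched as a small buffer-filling function. The names `build_blank_screen`, `is_16_aligned` and the `TAG_*` macros are hypothetical; the word values follow the description in this post, and the actual mailbox_call is deliberately left out so the sketch only shows the message contents.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_BLANK_SCREEN 0x00040002u
#define TAG_END          0u
#define MBOX_REQ         0u

/* Fill the buffer for a Blank screen request; returns the size in bytes. */
uint32_t build_blank_screen(volatile uint32_t *m, uint32_t state)
{
    assert(m != 0);
    m[0] = 7 * 4;             /* total size: 7 words, indices 0..6 */
    m[1] = MBOX_REQ;          /* request; firmware writes 0x80000000 on success */
    m[2] = TAG_BLANK_SCREEN;  /* tag id */
    m[3] = 4;                 /* request length in bytes */
    m[4] = 4;                 /* response length, as described above */
    m[5] = state & 1u;        /* bit 0: 0=off, 1=on; other bits reserved */
    m[6] = TAG_END;           /* no more tags */
    return m[0];
}

/* The buffer handed to the mailbox must sit on a 16-byte boundary. */
int is_16_aligned(const volatile void *p)
{
    return ((uintptr_t)p & 0xFu) == 0;
}
```

On the target the buffer would be a static array declared with 16-byte alignment (for example with `_Alignas(16)`), filled by this function, and then passed to your existing mailbox_call(MBOX_CH_PROP).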

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: bzt's debugger

Sat Aug 18, 2018 4:22 am

Thanks for the info! Sadly it still isn't working, so either it does something different from what I thought or I am still wrong.

Just a side note: does AArch64 have any atomic operations, like test-and-set, for implementing semaphores?

UPDATE:
I can confirm that the mailbox call succeeds, so it just mustn't do what I thought it did.
If anyone can think of a faster way to clear the screen, please post how. bzt, I have never used a CRT TV, so I didn't know this applied to that.

LdB
Posts: 877
Joined: Wed Dec 07, 2016 2:29 pm

Re: bzt's debugger

Sat Aug 18, 2018 5:20 am

The absolute fastest way to clear the screen is to set up a blank VC4 pipeline render scene and get the VC to do all the work. It does about 300 frames a second with a blank scene :-)

ARM has numerous papers on synchronization primitives, just hit the search.

You must have the MMU online, otherwise the exclusive access instructions (LDXR/STXR on AArch64, LDREX/STREX on AArch32) will simply lock up.
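
For the test-and-set question, one option that avoids hand-written assembly is C11 atomics. Below is a minimal sketch of a spinlock built on `atomic_flag`, assuming your toolchain ships `<stdatomic.h>` for a bare-metal target; the `example_*` names are hypothetical and this is not the semaphore code from the linked repo. On AArch64 a compiler will typically lower these calls to exclusive load/store loops or to dedicated atomic instructions, so the MMU caveat above still applies.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct { atomic_flag locked; } example_lock_t;

#define EXAMPLE_LOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once: returns 1 if we took the lock, 0 if it was already held. */
int example_trylock(example_lock_t *l)
{
    assert(l != NULL);
    /* test-and-set returns the previous value: false means it was free */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Spin until the lock is ours. */
void example_lock(example_lock_t *l)
{
    while (!example_trylock(l)) {
        /* busy-wait; on real hardware a wfe or yield hint could go here */
    }
}

/* Release the lock so another core can take it. */
void example_unlock(example_lock_t *l)
{
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

Wrapping each console print in example_lock / example_unlock on a shared lock is the simplest use: one core prints while the others spin.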

I have a basic semaphore inc/dec implementation test. It holds 3 cores for 10 seconds etc. to prove the semaphore lock is working.
The precompiled image file (kernel8.img) is there if you just want to see it.
https://github.com/LdB-ECM/Raspberry-Pi ... tualmemory

The semaphore inc/dec is in SmartStart64.S
https://github.com/LdB-ECM/Raspberry-Pi ... tStart64.S
Last edited by LdB on Sat Aug 18, 2018 5:32 am, edited 1 time in total.

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: bzt's debugger

Sat Aug 18, 2018 5:32 am

Is there any way to do it without the MMU meddling with memory?

LdB
Posts: 877
Joined: Wed Dec 07, 2016 2:29 pm

Re: bzt's debugger

Sat Aug 18, 2018 5:47 am

In multicore you would have to go the core-exchange-message route, like an AMP/BMP.
You get the basic requirement: one core has to be able to hold the other cores off a shared resource.

What is the issue with the MMU? It's pretty simple to set up a basic mapping.

If you need a basic sample with MMU and tasks on AArch64, I have one I can put up.
It was a mess-around I made for the 64-bit Xinu port, which I will hopefully finish this weekend.
I won't claim it's the best, but it will get you going.

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: bzt's debugger

Sat Aug 18, 2018 7:35 am

I currently have everything working without the MMU, and I just assumed it would mess everything up if I were to turn it on. If it isn't too much trouble, would you please put your example up?

LdB
Posts: 877
Joined: Wed Dec 07, 2016 2:29 pm

Re: bzt's debugger

Sat Aug 18, 2018 8:55 am

The MMU doesn't change anything at all if you map it 1:1; your current code would run unchanged.
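
To show what "1:1" means at the table level, here is a minimal sketch that fills one level 2 translation table with 2 MiB block entries whose output address equals the input address. It assumes a 4 KiB granule; the macro and function names are hypothetical, and the sketch covers only the descriptor values, not the MAIR/TCR/TTBR/SCTLR setup that actually turns the MMU on. Peripheral ranges would also need a device memory attribute index, which is not modelled here.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCK_SIZE_2M   (2ull * 1024 * 1024)
#define DESC_VALID      (1ull << 0)            /* bits [1:0] = 0b01: block */
#define DESC_AF         (1ull << 10)           /* access flag */
#define DESC_SH_INNER   (3ull << 8)            /* inner shareable */
#define DESC_ATTRIDX(n) ((uint64_t)(n) << 2)   /* index into MAIR_EL1 */

/* Level 2 block descriptor mapping one 2 MiB block at physical address phys. */
uint64_t make_block_desc(uint64_t phys, unsigned attr_index)
{
    assert((phys % BLOCK_SIZE_2M) == 0);
    assert(attr_index < 8);
    return phys | DESC_AF | DESC_SH_INNER | DESC_ATTRIDX(attr_index) | DESC_VALID;
}

/* Fill a level 2 table so that virtual address == physical address. */
void fill_identity_map(uint64_t *table, unsigned entries, unsigned attr_index)
{
    for (unsigned i = 0; i < entries; i++)
        table[i] = make_block_desc((uint64_t)i * BLOCK_SIZE_2M, attr_index);
}
```

Entry i covers the addresses from i * 2 MiB up to the next block, so a full 512-entry table identity-maps the first 1 GiB.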

Let me alter your start.S to put a simple 1:1 map on and turn on the MMU for you as a start point.
If you can update your git source so I can just check it .. would be useful.
I will also throw up a simple 4 core task switcher sample for you.

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: bzt's debugger

Sat Aug 18, 2018 10:21 am

Thank you sooo much I will update my source when I get home. I will get that done in a few hours.

UPDATE:
Updated source code on GitHub

LizardLad_1
Posts: 126
Joined: Sat Jan 13, 2018 12:29 am

Re: bzt's debugger

Mon Aug 20, 2018 8:03 am

Just as a side note, LdB, how do you decide how large your stack will be in your linker file?

EDIT:
I have changed start.s again to support all 4 cores, to set up a stack for every exception level, and to branch the different cores to their respective pieces of code. I am already noticing bugs without semaphores, because all cores are trying to print to the console at the same time.
