User avatar
sharpcoder
Posts: 39
Joined: Tue Mar 12, 2013 9:07 pm
Location: Seattle, WA
Contact: Website AOL

Task Scheduling with Interrupts

Tue Apr 07, 2015 12:32 am

Hello,

I'm writing a hobby os from scratch (using c++) and I'm struggling with task scheduling. I already have interrupts working properly and wired up to the arm timer. For scheduling tasks in my os, the plan is to use the following thread class:

Code: Select all

class Thread {
	public:
		uint32_t lr;
		uint32_t runPtr;
		bool isRun;
}
lr will contain the branch location for the thread
runPtr will contain the address of the function to run (uint32_t)( &myFunc )
isRun will contain whether the thread has been initialized or not

Conceptually, each time the timer interrupt is triggered I want to get the active thread. Update the lr pointer so it represents the last known execution location (which was interrupted). Enqueue the thread. Dequeue the next one. Snag the lr pointer that was stored for the newly acquired thread, and then branch to that location. This makes sense to me in theory, but where I'm struggling is with the assembly to achieve this. I have the following
vector.s

Code: Select all

arm_interrupt_handler:
;@ Store the return link.
sub r14, r14, #4

;@ Setup the arguments for our method.
mov r0, sp
mov r1, r14
	
;@ Invoke our C++ irq handler.
bl interrupt_vector
	
;@ The interrupt_vector method should have pushed the return address
;@ to the stack, so we'll pop it into PC.
pop {pc}
vector.cpp

Code: Select all

extern "C" void interrupt_vector(uint32_t sp, uint32_t lr) {
	// Clear the arm timer interrupt.
	RPI_GetArmTimer()->IRQClear = 0;

	// Switch tasks, passing in the stack pointer which has the information necessary to do stuff.
	// Pop a thread off the queue.
		
	// If the thread in question has never been run,
	// we execute the function pointer that the thread
	// points to instead of the last known lr pos because there is no last known lr pos.
	Thread* thread = threads.getAt(t_index);		
		
	if ( thread->isRun ) {
		thread->lr = lr;
		if ( t_index + 1 > threads.getLength() )
			t_index = 0;
		else
			t_index++;
	} else {
		// First run.
		thread->lr = thread->addr;
	}
	
	// Push the thread execution location to the stack
	asm volatile("PUSH {%0}" : : "p"(thread->lr));
	// Branch back
	asm volatile("bx lr");
}
A few things happen with this code. First, it seems like my interrupts break after the first run through. Secondly, it simply doesn't work. I've poured over the ARM assembly document, trying to wrap my head around around what I'm doing. But a shove in the right direction would be much appreciated.

Also, as a side note, I'm curious whether PUSH and POP work across stacks (as shown in my example)?
Thank you for your time.

jwatte
Posts: 203
Joined: Sat Aug 13, 2011 7:28 pm

Re: Task Scheduling with Interrupts

Tue Apr 07, 2015 3:10 am

I don't know the ARM interrupt handler semantics, but in most CPUs, you have to store the register state of the thread, in addition to the PC and stack pointer information, in the thread control block. This is also important when a non-interrupt causes a context switch, such as when blocking on a semaphore or blocking I/O.

PUSH and POP are just convenient short-hands for "store/load indirect to address and increment/decrement stack pointer."
Thus, if you change the value of the stack pointer register, the next PUSH/POP will act on whatever the new value of the stack pointer is. Speaking of which: You don't update the stack pointer in your context switch handler to whatever the stack pointer should be for the next thread.

Also, each thread needs its own stack. You need to allocate the stack for each thread, when you create the thread, and make that part of the thread control block, too. Right now, all the threads share the same stack (area of memory) which is unlikely to work right :-)

Finally: How are the globals you're using referenced? t_index is a global, as is threads. You need to make sure the compiler correctly references these globals while in your kernel, yet properly references the user-program globals when executing in thread context. Also, you are not wrapping a spinlock around your context switch routine. If you disable interrupts when you create new threads, that's OK for a single-core system, but on a Raspberry Pi 2 or other multi-core systems, just turning off interrupts on the currently running thread isn't good for mutual exclusion, because there are more cores that can also mutate global sate.

User avatar
sharpcoder
Posts: 39
Joined: Tue Mar 12, 2013 9:07 pm
Location: Seattle, WA
Contact: Website AOL

Re: Task Scheduling with Interrupts

Tue Apr 07, 2015 6:26 pm

jwatte,

Thanks for the great response! You've helped me out immensely. After much tinkering, I have what appears to be a working implementation of multi-threading, however, a few problems arise... Which I suspect will be resolved once I implement the suggestions you've made :)

First, is there anything special about allocating a new stack? Is there an ASM command to do this or can I just use my implementation of malloc? And when I save the registers in the thread control block, do I need to save all of them? (R1-R12) or just the first four?

My globals are defined as static, stack allocated objects (no "new" keyword). Except my list structure does use the heap a little. But the objects themselves are static and appear to be working. I'm worried that they'll be incorrectly referenced somehow though. Will just marking them as static suffice?

User avatar
eriktheitalian
Posts: 358
Joined: Thu Feb 19, 2015 1:03 pm

Re: Task Scheduling with Interrupts

Tue Apr 07, 2015 6:56 pm

U can examine bfs cpu scheduler. ( I'm not know is this good idea )
I cant using enough English language. My writings can be wrong grammer.$
"in micro$oft we not trust"

User avatar
sharpcoder
Posts: 39
Joined: Tue Mar 12, 2013 9:07 pm
Location: Seattle, WA
Contact: Website AOL

Re: Task Scheduling with Interrupts

Wed Apr 08, 2015 8:06 pm

I think I've figured out most of my problems and implemented most of your suggestions, jwatte :) Here is what my methods look like now.

vector.s

Code: Select all

;@ And here is the actual interrupt handler code.
arm_interrupt_handler:
	;@ Store the registers
	stm sp, {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12}
	;@ Move the return address and stack pointer into registers
	mov r0, lr
	mov r1, sp
	;@ Branch to our c++ handler
	bl interrupt_vector
	b hang
vectors.cpp

Code: Select all

extern "C" void interrupt_vector() {
	// Clear the arm timer interrupt.
	uint32_t ptr_sp;
	uint32_t ptr_lr;

	asm volatile("mov %0,r0\n\t"
				"mov %1,r1\n\t" 
				 : "=r"(ptr_lr), "=r"(ptr_sp) : : "r0", "r1", "memory");

	
	if ( threads.getLength() == 0 ) {
		return;
	}
	
	// Flip the LED each iteration, to show we're actively capturing
	// interrupts.
	RaspberryLib::SetGPIO(16, led_on);
	led_on = !led_on;

	// If the thread in question has never been run,
	// we skip this part essentially bootstrapping the
	// thread (since we just set the sp/lr variables for it).
	thread* t = threads.getAt(ind);
	t->sptr = ptr_sp;
	t->lr = ptr_lr;
	
	// Increment index.
	if ( ++ind >= threads.getLength() )
		ind = 0;
		
	// If this thread hasn't run yet, then execute the thread
	// we're on.
	if ( !t->isRun ) {
		
		// Update the ish.
		t->lr = t->addr;
		t->sptr = (uint32_t)alloc_stack(256);
		t->isRun = true;
		RPI_GetArmTimer()->IRQClear = 1;
		
		// We need to allocate a new stack.
		// Then we need to copy the registers over to it.
		asm volatile("mov r0,%0\n\t" // new stack pointer
					 "mov r1,%1\n\t" // old stack pointer
					 "mov r14,%2\n\t"
					 "mov sp,r0\n\t" // move the stack pointer
					 "ldm r1,{r0-r12}\n\t" // load the registers from the old location
					 "subs pc,r14,#4\n\t"
					 : : "r"(t->sptr), "r"(ptr_sp), "r"(t->addr) : "memory", "r0", "r1", "sp", "r14", "pc" );		
		
		return;
	} else {
		// Otherwise, get the next thread.
		if ( threads.getAt(ind)->isRun ) {
			t = threads.getAt(ind);
		}
	}
	
	RPI_GetArmTimer()->IRQClear = 1;
	asm volatile("mov r14,%0\n\t"
				 "mov r0,%1\n\t"
				 "mov sp,r0\n\t"
				 "ldm sp,{r0-r12}\n\t"
				 "subs pc,r14,#4\n\t": : "r"(t->lr), "r"(t->sptr) : "memory","sp", "r14", "pc", "r0" );
				 
	return;
}
And this actually seems to work. Some threads eventually die off, but I suspect that's because nothing is thread-safe at this point. I'm going to look into implementing the spin-lock and then integrating that in relevant places.

One question I did have, however, was what happens when a thread finishes the work? Right now I have no way of properly scheduling another thread when one completes. I was thinking of ways I can go about doing this and the best I've come up with is having some kind of "thread control method" which wraps the actual work call and, upon successful completion, knows how to clean up the thread control block. I'm not sure if that's the right approach though.

Thanks again for all the suggestions :)

mrvn
Posts: 58
Joined: Wed Jan 09, 2013 6:50 pm

Re: Task Scheduling with Interrupts

Sun Apr 19, 2015 11:35 am

While that might seem to work it is also completly broken (the C part). See below.
sharpcoder wrote:I think I've figured out most of my problems and implemented most of your suggestions, jwatte :) Here is what my methods look like now.

vector.s

Code: Select all

;@ And here is the actual interrupt handler code.
arm_interrupt_handler:
	;@ Store the registers
	stm sp, {r0,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11,r12}
	;@ Move the return address and stack pointer into registers
	mov r0, lr
	mov r1, sp
	;@ Branch to our c++ handler
	bl interrupt_vector
	b hang
You are missing a few things: First you forgot so save the user space registers. Probably because you haven't done any userspace yet or that is just one more thing that doesn't work for you. Secondly you also don't save the SPSR so on task switch you are leaking/breaking cpu mode and flags. If the switch happens between a cmp and branch then the branch might go the wrong way because you didn't preserve the flags. And last but not least you are not aligning the stack properly (only relevant when your kernel is reentrant). The interrupt may happen at a time the stack is not 8 byte aligned and then ldrd/strd will load the wrong data. Most likely place you notice that is in printf() printing the timers 64bit counter. But the compiler can use ldrd/strd any place and then you get really hard to trace bugs.
sharpcoder wrote: vectors.cpp

Code: Select all

extern "C" void interrupt_vector() {
	// Clear the arm timer interrupt.
	uint32_t ptr_sp;
	uint32_t ptr_lr;

	asm volatile("mov %0,r0\n\t"
				"mov %1,r1\n\t" 
				 : "=r"(ptr_lr), "=r"(ptr_sp) : : "r0", "r1", "memory");
R0 and R1 are the first 2 argument registers and are not preserved accross function calls. The compiler is free to do whatever it likes with them at any time in your function. For example right before your asm statement. That would corrupt the contents. Instead why not simply specify ptr_sp and ptr_lr as arguments to the function? The compiler then knows, from the C calling convention, that they are in R0 and R1.

In the future you might also want access to more than those two registers. For example syscalls will want access to the arguments. Or on faults you might want to print out all the register contents and a stack backtrace. What I do in my code is store all the registers (and SPSR) on the stack. I then pass the sp in R0 (the address of where all the registers are saved) and the interrupt number in R1 (in case you want to use the same C code for multiple faults). In C I define a struct Regs { ...} listing the registers in the order they appear on the stack and the C interrupt handler is declared as "void handler(struct Regs *regs, int num);" That way it has easy access to all the registers.
sharpcoder wrote:

Code: Select all

	// If the thread in question has never been run,
	// we skip this part essentially bootstrapping the
	// thread (since we just set the sp/lr variables for it).
	thread* t = threads.getAt(ind);
	t->sptr = ptr_sp;
	t->lr = ptr_lr;
	
	// Increment index.
	if ( ++ind >= threads.getLength() )
		ind = 0;
		
	// If this thread hasn't run yet, then execute the thread
	// we're on.
	if ( !t->isRun ) {
		
		// Update the ish.
		t->lr = t->addr;
		t->sptr = (uint32_t)alloc_stack(256);
		t->isRun = true;
		RPI_GetArmTimer()->IRQClear = 1;
		
		// We need to allocate a new stack.
		// Then we need to copy the registers over to it.
		asm volatile("mov r0,%0\n\t" // new stack pointer
					 "mov r1,%1\n\t" // old stack pointer
					 "mov r14,%2\n\t"
					 "mov sp,r0\n\t" // move the stack pointer
					 "ldm r1,{r0-r12}\n\t" // load the registers from the old location
					 "subs pc,r14,#4\n\t"
					 : : "r"(t->sptr), "r"(ptr_sp), "r"(t->addr) : "memory", "r0", "r1", "sp", "r14", "pc" );		
There is a little trick to make this simpler. When you create a new thread put the address of a thread_init() assembly function into ptr_lr. The function then sets up the new stack and registers and calls back to C. Allocating the stack there is an extra burden you don't need though. I would allocate it beforehand. That also means you can store extra arguments on the stack and have the thread_init() function get them from there (e.g. the initial register contents).
sharpcoder wrote:

Code: Select all

	RPI_GetArmTimer()->IRQClear = 1;
	asm volatile("mov r14,%0\n\t"
				 "mov r0,%1\n\t"
				 "mov sp,r0\n\t"
				 "ldm sp,{r0-r12}\n\t"
				 "subs pc,r14,#4\n\t": : "r"(t->lr), "r"(t->sptr) : "memory","sp", "r14", "pc", "r0" );
				 
	return;
}
And this actually seems to work. Some threads eventually die off, but I suspect that's because nothing is thread-safe at this point. I'm going to look into implementing the spin-lock and then integrating that in relevant places.
Seems being the operative word. You are playing loose with the compiler again and this will certainly not work with user mode processes. If you went right back to user mode like this then the kernel stack pointer would remain where it is and each interrupt/syscall would add to the stack till it overflows.
You need to switch stacks and then return normaly from the scheduling function. That also means you can not do the stack/processor switch with inline asm.

Simplest approach is to have a assembly function "extern void switch_task(void **old_stack, void *new_stack);" This function must save the caller saved registers on the stack, save the SP in *old_stack, set the SP to new_stack and restore the caller saved register before returning. In your scheduler you then call that small assembly function to switch the stacks.

While this will look like you switch stacks and return to the scheduler what actually happens is that you store the current tasks context, switch tasks and return to where the new task left of. Only later when you switch back to the original task will it return to that instant of the scheduler. The new task will return to its own instance of the scheduler unless it is a new task (then it returns to the thread_init function). But it important to realize those are different instances of the scheduler being called.
sharpcoder wrote: One question I did have, however, was what happens when a thread finishes the work? Right now I have no way of properly scheduling another thread when one completes. I was thinking of ways I can go about doing this and the best I've come up with is having some kind of "thread control method" which wraps the actual work call and, upon successful completion, knows how to clean up the thread control block. I'm not sure if that's the right approach though.

Thanks again for all the suggestions :)
When you start off each task with the thread_init function this is simple. The thread_init function will call some thread_start() function after the thread is properly initialized. When that returns it can de-initialize the thread. It can free the TLS memory of the thread and even free the stack. In case of the last thread of a process it will close any filedescriptors and other process resoruces. What you normaly can not do is free the actual thread/process structure itself because the scheduler will still access it to switch to another task.

The usual solution to this is simple and assumes each task has a state like RUNNING, SLEEPING, BLOCKED. You add another state DEAD and flag the task as such. Then in the scheduler, after switching tasks, you check if the last task was DEAD. If so you can savely remove it and free it. You can also check this before switching instead. That would free resources a little bit later but is sometimes easier.

Hope that helped, Mrvn

PS: See https://github.com/mrvn/RaspberryPi-bar ... al/entry.S for a full example of storing state for an exception.

Return to “Bare metal, Assembly language”