franksai
Posts: 6
Joined: Thu Jun 08, 2017 11:07 am

Modify pthread library for pi2 bare-metal with LDREX

Fri Jul 28, 2017 5:26 am

----Describe----
I'm trying to run the Splash2 benchmark on a Pi 2 in bare metal,
so I modified the pthread library for the Pi 2.
In the implementation of MUTEX_LOCK(...), the function fails when I use the LDREX instruction.
If I replace LDREX with LDR, the function works on a single core,
but on multiple cores it sometimes fails.
----Question----
How do I make LDREX work?
Is there anything I need to do for the local/global monitor?

I tried enabling the MMU and setting the pthread memory region to shareable in the boot code,
but I'm not sure the MMU is working correctly.
The following test uses a special memory mapping (virtual 0x0c000000 mapped to physical 0x0a000000).
The result is:

Code:

>before 0c000000 : 45515555
>after 0c000000 : 00000666
>_before 0a000000 : aaaaabaa
>_after 0a000000 : 00000666
---Code---
bootcode.s

Code:

_start_MMU:
//-------------------------------------------------------------------
// Cortex-A7 MMU Configuration
// Set translation table base
//-------------------------------------------------------------------
    // Cortex-A7 supports two translation tables
    // Configure translation table base (TTB) control register cp15,c2
    // to a value of all zeros, indicates we are using TTB register 0.
    MOV     r0,#0x0
    MCR     p15, 0, r0, c2, c0, 2
    // write the address of our page table base to TTB register 0
    LDR     r0,=TTB_BASE
    MOV     r1, #0x08                   // RGN=b01  (outer cacheable write-back cached, write allocate)
                                        // S=0      (translation table walk to non-shared memory)
    ORR     r1,r1,#0x40                 // IRGN=b01 (inner cacheability for the translation table walk is Write-back Write-allocate)
    ORR     r0,r0,r1                    
    MCR     p15, 0, r0, c2, c0, 0     
//-------------------------------------------------------------------
// PAGE TABLE generation 
//wei: ref to MPCore manual 5-3
//-------------------------------------------------------------------
    LDR     r0,=TTB_BASE
    LDR     r1,=0xfff                   // loop counter
    LDR     r2,=0b00000000000000000000110111100010 //wei: tex-000 BC-00 -> shareable & strongly_order   
//	LDR     r2,=0b00000000000000000001110111100010 
init_ttb_1://10_0000 -->fff0_0000
    ORR     r3, r2, r1, LSL#20          // R3 now contains full level1 descriptor to write
    ORR     r3, r3, #0b0000000010000    // Set XN bit
    STR     r3, [r0, r1, LSL#2]         // Str table entry at TTB base + loopcount*4
    SUBS    r1, r1, #1                  // Decrement loop counter
    BPL     init_ttb_1
   
 //0x00000000
    LDR     r1,=0x00000000              // Base physical address of code segment
    LSR     r1, #20                     // Shift right to align to 1MB boundaries
    ORR     r3, r2, r1, LSL#20          // Setup the initial level1 descriptor again
    ORR     r3, r3, #0b0000000001100    // Set C and B bits
    ORR     r3, r3, #0b1000000000000    // Set TEX bit 12 //wei: TEX=001 CB=11 -> Normal, write-back write-allocate (S bit not set)
    STR     r3, [r0, r1, LSL#2]         // str table entry

//0x0700_0000 ~ 0x080F_FFFF (sections 0x070..0x080) is for pthread use
    LDR     r4,=0x080                   // First section index (counts down)
    LDR     r1,=0x010                   // Loop counter: BPL loops for 0x10..0, i.e. 17 sections
init_pthread:
    ORR     r3, r2, r4, LSL#20          // R3 now contains full level1 descriptor to write
    ORR     r3, r3, #0b0000000000000    // Set CB:00
    ORR     r3, r3, #0b1000000000000    //set  TEX bit 12  
    ORR     r3, r3, #0b10000000000000000// set S  bit 16  //wei: tex-001 BC-00 -> S bit & normal
    STR     r3, [r0, r4, LSL#2]
    SUBS    r4, r4, #1                  // Move down to the previous 1 MB section
    SUBS    r1, r1, #1                  // Decrement loop counter
    BPL     init_pthread

    //0x3f000000 DEVICE
    LDR     r1,=0x3f000000              // Base physical address of the peripherals
    LSR     r1, #20                     // Shift right to align to 1MB boundaries
    ORR     r3, r2, r1, LSL#20          // Setup the initial level1 descriptor again
    ORR     r3, r3, #0b0000000000000    // Set CB:00
    ORR     r3, r3, #0b10000000000000    // Set TEX bit 13
    STR     r3, [r0, r1, LSL#2]         // str table entry

//--------MMU TEST-------------------------------------------------------
    LDR     r1,=0x0a000000              // Physical address for the test mapping
    LSR     r1, #20                     // Shift right to align to 1MB boundaries
    ORR     r3, r2, r1, LSL#20          // Setup the initial level1 descriptor again
    ORR     r3, r3, #0b0000000010000    // Set XN bit
//    ORR     r3, r3, #0b1000000000000    // tex
    LDR     r1,=0x0c000000              // Virtual address that will alias it
    LSR     r1, #20                     // Shift right to align to 1MB boundaries
    STR     r3, [r0, r1, LSL#2]         // str table entry

//-------------------------------------------------------------------
// Setup domain control register - Enable all domains to client mode
//-------------------------------------------------------------------
    MRC     p15, 0, r0, c3, c0, 0       // Read Domain Access Control Register
    LDR     r0, =0x55555555             // Initialize every domain entry to b01 (client)
    MCR     p15, 0, r0, c3, c0, 0       // Write Domain Access Control Register          
//-------------------------------------------------------------------
// Enable MMU and branch to __main
// Leaving the caches disabled until after scatter loading.
//-------------------------------------------------------------------
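    MOV     r0, #0
    MCR     p15, 0, r0, c8, c7, 0       // TLBIALL: invalidate the unified TLB
    DSB                                 // Make the page-table writes and TLB invalidation complete
    ISB
    MRC     p15, 0, r0, c1, c0, 0       // Read CP15 System Control register
    BIC     r0, r0, #(0x1 << 12)        // Clear I bit 12 to disable I Cache
    BIC     r0, r0, #(0x1 <<  2)        // Clear C bit  2 to disable D Cache
    BIC     r0, r0, #0x2                // Clear A bit  1 to disable strict alignment fault checking
    ORR     r0, r0, #0x1                // Set M bit 0 to enable MMU before scatter loading
    MCR     p15, 0, r0, c1, c0, 0       // Write CP15 System Control register
    ISB                                 // Make sure following instructions see the MMU enabled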
C Code for testing MMU

Code:

	volatile unsigned int *test_base;

	test_base = (volatile unsigned int *)0x0a000000;
	printf(">before %08x : %08x\r\n", (unsigned int)test_base, *test_base);
	*ttb_base = 0x678;
	printf(">after %08x : %08x\r\n", (unsigned int)test_base, *test_base);

	test_base = (volatile unsigned int *)0x0c000000;
	printf(">_before %08x : %08x\r\n", (unsigned int)test_base, *test_base);
	*test_base = 0x666;
	printf(">_after %08x : %08x\r\n", (unsigned int)test_base, *test_base);
pthread.c mutex

Code:

int pthread_mutex_lock(pthread_mutex_t *mutex){
	/* *mutex holds the address of this mutex's lock word in the lock pool */
	__asm__ __volatile__
	(
		"	mov r2, %[value]"  "\r\n"
		/* spin while the lock word reads 1 (plain LDR, used in place of LDREX) */
		"0: \r\n"
		"	ldr r1, [r2]"      "\r\n"
		"	cmp r1, #0x001"    "\r\n"
		"	beq 0b\r\n"
		:                               /* no outputs */
		: [value] "r" (*mutex)          /* lock address */
		: "r1", "r2", "cc", "memory"    /* registers and flags this code changes */
	);
	return 0;
}

Paeryn
Posts: 1573
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Modify pthread library for pi2 bare-metal with LDREX

Fri Jul 28, 2017 1:22 pm

franksai wrote:
Fri Jul 28, 2017 5:26 am
pthread.c mutex

You didn't post your LDREX version, so we can't see what went wrong there, but the pthread_mutex_lock() you did post doesn't actually take a lock. The code waits for the mutex to be free but never claims it once it is, so nothing stops another core from coming in and taking it from under you.
LDREX doesn't acquire exclusive access on its own; it has to be paired with STREX. STREX either succeeds in writing the new value to memory and returns 0 in its result register, or, if that address has been written to since the LDREX (e.g. by another core beating you to it), the write does not take place and it returns 1, so you have to go back and wait for the mutex again.

Code:

    MOV r1, #1           // Lock value: the code assumes 0 means free, anything else locked
                         // Assume r2 already holds the mutex address
0:
    LDREX r0, [r2]       // Read the current mutex value
    CMP r0, #0           // Is it free?
    STREXEQ r0, r1, [r2] // If it was, try setting it (r0 = 0 on success, 1 on failure)
    CMPEQ r0, #0         // And did we succeed in setting it?
    BNE 0b               // Either it wasn't free or we didn't succeed in taking it, so loop back and try again
    DMB                  // Lock taken: barrier before touching the protected resource
She who travels light — forgot something.

dwelch67
Posts: 819
Joined: Sat May 26, 2012 5:32 pm

Re: Modify pthread library for pi2 bare-metal with LDREX

Fri Jul 28, 2017 7:52 pm

Not a fan of this new forum software; why wasn't my reply part of the thread?

STREX/LDREX are a pair. They DO NOT replace SWP in terms of how you use them, but at the same time they make up for the lack of an atomic operation. For various reasons you don't want an atomic operation on the back end, so they broke the operation up into two instructions (hopefully supported by the chip vendor), and what you are looking for is confirmation that no other core accessed the location between your two operations, which implies the pair behaved atomically. If you do the second step without the first, it will never, ever return success (if implemented properly), and sometimes even with the first step it will still not return success, hence the loop. If you look in Linux, it is (or was) an infinite loop, causing problems on systems that didn't support the pair: yet another case of ARM features being misunderstood and misapplied in Linux.

Anyway, you need the pair and you need a loop around them. Just read the ARM documentation for the instructions; they are very easy to use.

LdB
Posts: 520
Joined: Wed Dec 07, 2016 2:29 pm

Re: Modify pthread library for pi2 bare-metal with LDREX

Sun Jul 30, 2017 4:58 am

I had to build my own bare-metal version of a mutex to get VCOS up for OpenGL, and I just used the Linux discussion as a reference for lock/unlock and left the rest alone:
http://linuxkernelarticles.blogspot.com ... cture.html

Some of the discussion went way over my head, but the hardware-specific code worked.

franksai
Posts: 6
Joined: Thu Jun 08, 2017 11:07 am

Re: Modify pthread library for pi2 bare-metal with LDREX

Thu Aug 24, 2017 9:03 am

Thanks for all the examples.
Now I have a true mutex_lock, but I still can't use ldrex.

Whenever I use "ldrex", the program crashes.
My settings are as follows:
memory attributes set to shareable and normal;
MMU enabled (I am sure the MMUs on all 4 cores are working, and the memory remapping is correct).

start.s

Code:

// -------------------------------------------------------------------
// --------------------Multicore init---------------------------------
// -------------------------------------------------------------------
    bl _disable_IDcache_MMU_BP
    bl _init_page_table       // load TTB_BASE for each core
    bl _enable_SMP
    bl _pmu_init

    mrc p15, 0, r3, c0, c0, 5 // r3 = MPIDR; bits [1:0] are the CPU id
    and r3, r3, #0x3
    cmp r3, #0
    bne slave
core0:
    bl _gen_page_table        // only core 0 needs to generate the page table
    bl _enable_MMU
    //bl _enable_cashe_BP
    bl _cstartup
slave:
    // Note: nothing here makes the slave cores wait until core 0 has
    // finished _gen_page_table before they enable their MMUs.
    bl _enable_MMU
    //bl _enable_cashe_BP
    bl _asm_mmu_test
    bl idle_thread

_gen_page_table:

Code:

_gen_page_table:
//-------------------------------------------------------------------
// PAGE TABLE generation 
//wei: ref to MPCore manual 5-3
//-------------------------------------------------------------------
    LDR     r0,=TTB_BASE
    LDR     r1,=0xfff                   // loop counter
    LDR     r2,=0b00000000000000000000110111100010 //wei: tex-000 BC-00 -> shareable & strongly_order
init_ttb_1://10_0000 -->fff0_0000
    ORR     r3, r2, r1, LSL#20          // R3 now contains full level1 descriptor to write
    ORR     r3, r3, #0b0000000010000    // Set XN bit
    STR     r3, [r0, r1, LSL#2]         // Str table entry at TTB base + loopcount*4
    SUBS    r1, r1, #1                  // Decrement loop counter
    BPL     init_ttb_1
//0x00000000
    LDR     r1,=0x00000000              // Base physical address of code segment
    LSR     r1, #20                     // Shift right to align to 1MB boundaries
    ORR     r3, r2, r1, LSL#20          // Setup the initial level1 descriptor again
    ORR     r3, r3, #0b0000000001100    // Set C and B bits
    ORR     r3, r3, #0b1000000000000    // Set TEX bit 12 //wei: TEX=001 CB=11 -> Normal, write-back write-allocate (S bit not set)
    STR     r3, [r0, r1, LSL#2]         // str table entry
//0x0700_0000 ~ 0x080F_FFFF (sections 0x070..0x080)
    LDR     r4,=0x080                   // First section index (counts down)
    LDR     r1,=0x010                   // Loop counter: BPL loops for 0x10..0, i.e. 17 sections
init_pthread://reference manual: B5.3.1
    ORR     r3, r2, r4, LSL#20          // R3 now contains full level1 descriptor to write
    ORR     r3, r3, #0b0000000000000    // Set CB:00
    ORR     r3, r3, #0b1000000000000    //set  TEX bit 12  
    ORR     r3, r3, #0b10000000000000000// set S  bit 16  //wei: tex-001 BC-00 -> S bit & normal
    STR     r3, [r0, r4, LSL#2]
    SUBS    r4, r4, #1                  // Move down to the previous 1 MB section
    SUBS    r1, r1, #1                  // Decrement loop counter
    BPL     init_pthread
    //0x3f000000 DEVICE
    LDR     r1,=0x3f000000              // Base physical address of the peripherals
    LSR     r1, #20                     // Shift right to align to 1MB boundaries
    ORR     r3, r2, r1, LSL#20          // Setup the initial level1 descriptor again
    ORR     r3, r3, #0b0000000000000    // Set CB:00
    ORR     r3, r3, #0b10000000000000    // Set TEX bit 13
    STR     r3, [r0, r1, LSL#2]         // str table entry
//---------------------------------------------------------------
    BX      lr


mutex_lock
(needs a delay for the test program to run successfully)

Code:

		"0: \r\n"
		"	ldr r1, [r2]"   "\r\n"

		/* Delay loop */
		"	mov r3, #0x100" "\r\n"
		"1: \r\n"
		"	sub r3, r3, #1" "\r\n"
		"	cmp r3, #0"     "\r\n"
		"	bne 1b\r\n"

		"	cmp r1, #0x001" "\r\n"
		"	beq 0b\r\n"
		"	str r1, [r2]"   "\r\n"

Paeryn
Posts: 1573
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Modify pthread library for pi2 bare-metal with LDREX

Thu Aug 24, 2017 1:04 pm

franksai wrote:
Thu Aug 24, 2017 9:03 am
mutex_lock
(need delay to make test program run successfully)

That is not an effective mutex lock (well, not for a multi-tasking / multi-core system). It will easily allow multiple concurrent accesses, and it never actually locks: in fact it waits for the mutex to be unlocked and then unlocks it.

Say you have two cores that both want to take the mutex, one just slightly before the other, and the mutex is currently unlocked:

Code:

   Core 0                        Core 1
Read mutex (unlocked)         ... doing something else
Delay loop                    Read mutex (unlocked)
Check value read (unlocked)   Delay loop
Lock mutex                    Check value read (unlocked)  !!! 
Uses resource                 Lock mutex
...                           Uses resource
Can you see that by having a delay between reading the state of the lock and checking it, you leave it wide open for another core to lock it while you are busy doing nothing? You then have two programs running, both thinking they own the lock, since it was unlocked at the time both read the memory. A delay loop between reading and checking makes it even more likely that two processes will think they own it.

The last line you gave just stores the value you initially read back into the lock. Since it only gets there after reading that the lock is unlocked, you are setting it to unlocked even if somebody else has locked it in the meantime (which gives yet another core even more opportunity to come along and acquire it).

When you say your program crashes when you use LDREX what exactly do you mean?

According to the docs for LDREX/STREX, you should have a data memory barrier instruction (DMB) after obtaining a lock and before releasing it, to make sure memory accesses to the lock and to whatever resource you are protecting are observed in the right order.

franksai
Posts: 6
Joined: Thu Jun 08, 2017 11:07 am

Re: Modify pthread library for pi2 bare-metal with LDREX

Fri Aug 25, 2017 5:24 am

I know my mutex_lock currently isn't suitable for multi-core.
I'd like to use the mutex lock described in the spec,
but I don't know how to get the "ldrex" instruction to execute.
Whenever execution reaches "ldrex", the program just stops.
I have tried enabling the MMU, but it's still not working.

Code:

locked  EQU     1             ; Value meaning "locked"; 0 means unlocked

lock_mutex PROC
    LDR     r1, =locked
1   LDREX   r2, [r0]
    CMP     r2, r1        ; Test if mutex is locked or unlocked
    BEQ     %f2           ; If locked - wait for it to be released, from 2
    STREXNE r2, r1, [r0]  ; Not locked, attempt to lock it
    CMPNE   r2, #1        ; Check if Store-Exclusive failed
    BEQ     %b1           ; Failed - retry from 1
    ; Lock acquired
    DMB                   ; Required before accessing protected resource
    BX      lr

2   ; Locked: wait for it to be released (a WFE could go here), then retry
    B       %b1           ; Retry from 1
    ENDP

dwelch67
Posts: 819
Joined: Sat May 26, 2012 5:32 pm

Re: Modify pthread library for pi2 bare-metal with LDREX

Mon Aug 28, 2017 9:59 pm

If your ldrex isn't working by itself, try this:

Code:

.globl TEST
TEST:
  ldrex r0,[r0]
  bx lr
Use it to read the resource. If that is hanging, then you have a bigger problem; until you can use an ldrex in place of an ldr successfully, don't bother with the rest. When you are ready to bother with the rest, look at how Linux uses ldrex/strex (not how they incorrectly use it on the wrong targets, but how they use it properly on the right targets).

franksai
Posts: 6
Joined: Thu Jun 08, 2017 11:07 am

Re: Modify pthread library for pi2 bare-metal with LDREX

Wed Aug 30, 2017 1:33 am

Thank you for the reply!
I have tried that, but the program just crashes.
I inserted the test into my boot code.
