Posts: 1456
Joined: Sun Sep 11, 2011 2:32 pm

Re: data memory barriers (technical)

Sat Nov 19, 2011 9:00 pm

Hi. Question for the techie crew. Implementing some stuff, need a dmb. Unfortunately, ARMv6 doesn't implement the 'dmb' instruction...

So, it seems that the way of doing it on ARMv6 is via coprocessor 15, as follows (gcc + inline assembler syntax):
uint32_t zero = 0;
/* mcr reads the register, so it has to be an input operand, not an output */
__asm__ __volatile__("mcr p15,0,%0,c7,c10,5" : : "r"(zero) : "memory");
Which all looks fine. But I've also found it done like this (for uniprocessor systems only):
__asm__ __volatile__("" : : : "memory");
which kinda makes sense as well. So I have to ask: do we consider ourselves a uniprocessor or multiprocessor system, given that the GPU can get its sticky little fingers on our memory?
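
For reference, the two variants above could be wrapped roughly like this. This is only a sketch: `compiler_barrier`, `data_memory_barrier`, `mailbox_word` and `write_with_barrier` are names made up for illustration, and it assumes a gcc or clang recent enough to define `__ARM_ARCH`. On anything that isn't ARMv6 in ARM state it falls back to the compiler's generic full barrier so it still builds on a desktop.

```c
#include <stdint.h>

/* Compiler-only barrier: emits no instruction, but stops gcc from
 * reordering or caching memory accesses across this point. */
#define compiler_barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical helper, not part of any library. */
static inline void data_memory_barrier(void)
{
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH == 6) && !defined(__thumb__)
    /* ARMv6: the CP15 c7, c10, 5 write is the Data Memory Barrier operation.
     * The register is only read by mcr, hence an input operand. */
    uint32_t zero = 0;
    __asm__ __volatile__("mcr p15, 0, %0, c7, c10, 5" : : "r"(zero) : "memory");
#else
    /* Assumption: any other target, let the compiler emit its own full barrier. */
    __sync_synchronize();
#endif
}

/* Hypothetical shared word, standing in for memory something else can see. */
static uint32_t mailbox_word;

/* Store a value and make sure the store is ordered before anything later. */
static uint32_t write_with_barrier(uint32_t value)
{
    mailbox_word = value;
    data_memory_barrier();
    return mailbox_word;
}
```

Calling `write_with_barrier(42)` stores the value, issues the barrier and returns what was stored. `compiler_barrier()` is the cheaper option when only the compiler's reordering matters.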


Raspberry Pi Engineer & Forum Moderator
Posts: 28358
Joined: Sat Jul 30, 2011 7:41 pm

Re: data memory barriers (technical)

Sun Nov 20, 2011 6:05 pm

Regard it as single core. The GPU assigns memory to the Arm at startup, and doesn't touch it after that point (unless specifically told to). Interestingly the Arm uses the standard memory management, pages etc, but the GPU looks on it as a linear address space, which makes moving data back and forth a bit awkward, as I'm finding out!
Principal Software Engineer at Raspberry Pi (Trading) Ltd.
Contrary to popular belief, humorous signatures are allowed.
I've been saying "Mucho" to my Spanish friend a lot more lately. It means a lot to him.

Posts: 1456
Joined: Sun Sep 11, 2011 2:32 pm

Re: data memory barriers (technical)

Mon Nov 21, 2011 8:37 am

Thanks James.

Posts: 31
Joined: Thu Nov 10, 2011 7:11 pm

Re: data memory barriers (technical)

Wed Nov 23, 2011 10:18 pm

The former is a memory barrier at the CPU level, the latter is a compiler memory barrier. It tells gcc not to cache anything in registers or temporaries across the boundary. We use it extensively in Linux and it's also implied by the various locking operators. The basic problem is that the C language "volatile" keyword is useless for most activities.

In the Linux world we thus have

abstracted implementations for "true" CPU barriers (which degrade to barrier() on cache coherent devices)

rmb() read barrier
mb() read/write barrier
wmb() write barrier

and compiler barrier

barrier() - compiler barrier

Plus SMP-specific ones for special cases (smp_wmb() etc)
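
Outside the kernel those macros aren't available, but a rough user-space analogue can be sketched with C11 fences. To be clear about the assumptions: the `user_*` names, `payload`, `ready`, `publish` and `consume` are all invented for this example, and acquire/release fences are only an approximation of what rmb()/wmb() mean in the kernel.

```c
#include <stdatomic.h>

/* Compiler barrier, same idea as the kernel's barrier(). */
#define user_barrier() __asm__ __volatile__("" : : : "memory")

/* Approximate analogues of mb()/rmb()/wmb() built on C11 fences. */
#define user_mb()  atomic_thread_fence(memory_order_seq_cst)
#define user_rmb() atomic_thread_fence(memory_order_acquire)
#define user_wmb() atomic_thread_fence(memory_order_release)

static int payload;        /* plain data being handed over */
static atomic_int ready;   /* flag saying the data is valid */

/* Writer: fill in the data, then the write barrier, then raise the flag. */
static void publish(int value)
{
    payload = value;
    user_wmb();
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: check the flag, then the read barrier, then read the data.
 * Returns 1 and fills *out if data was available, 0 otherwise. */
static int consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    user_rmb();
    *out = payload;
    return 1;
}
```

The pairing is the point: a write barrier on one side only helps if the other side has the matching read barrier between checking the flag and reading the data.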

It's usually more fun than this because for I/O devices you get posting on many of the busses (so a write leaves the CPU, the CPU continues doing stuff, *then* it hits the device). This leads to vast amounts of fun with DMA because

cpu->device STOP DMA
cpu: free memory

device.. oh look a STOP DMA request, guess I should stop
[data drains into freed and reused memory]

The world is a good deal more complicated than Z80 and 6502 these days. Busses are often asynchronous with different timing rules and even interrupt delivery can be asynchronous (so you can disable an IRQ and then get one because it was 'in flight'). There are usually rules along the lines of 'write the STOP, read the status' to be sure.
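
As a sketch of the 'write the STOP, read the status' rule, here is the shape of it against a fake device. Everything here is hypothetical: `fake_dma_regs`, the bit values and `read_status` stand in for real hardware registers, and the "device" is just a struct in ordinary memory so the example runs anywhere.

```c
#include <stdint.h>
#include <stdlib.h>

#define DMA_CTRL_STOP   0x1u   /* made-up control bit */
#define DMA_STATUS_BUSY 0x1u   /* made-up status bit  */

/* Stand-in for a block of memory-mapped device registers. */
struct fake_dma_regs {
    volatile uint32_t control;
    volatile uint32_t status;
};

static struct fake_dma_regs fake_dev = { 0, DMA_STATUS_BUSY };

/* Simulated device behaviour: once STOP has arrived, BUSY clears.
 * On real hardware this would be a plain register read; on buses that
 * post writes, it is the read that forces the earlier write through. */
static uint32_t read_status(struct fake_dma_regs *regs)
{
    if (regs->control & DMA_CTRL_STOP)
        regs->status &= ~DMA_STATUS_BUSY;
    return regs->status;
}

/* Stop the DMA, confirm it really stopped, and only then free the buffer.
 * Returns 0 on success, -1 if the device never went idle. */
static int stop_dma_and_free(struct fake_dma_regs *regs, void **buf)
{
    int spins = 1000;

    regs->control = DMA_CTRL_STOP;          /* write the STOP */
    while (read_status(regs) & DMA_STATUS_BUSY) {   /* read the status */
        if (--spins == 0)
            return -1;                      /* still busy: do NOT free */
    }
    free(*buf);                             /* safe: device is idle */
    *buf = NULL;
    return 0;
}
```

The memory is only released after the status read says the device has gone idle, which is what avoids data draining into freed and reused memory.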

Re: the memory mapping, Jamesh - this is why x86 systems have a GART or GTT!
