m.daae
Posts: 22
Joined: Thu Apr 02, 2015 4:59 pm

Cache Page Size

Fri Jun 26, 2015 6:38 pm

What is the actual size of a single cached page of memory in the RasPi 2B? In the PC world it usually works out to 4K, but I'm really new to the Linux/RasPi world.

I'd like to minimise the number of dirty memory cache hits by spreading out the memory map so that, for the most part, each MPU core does not dirty up another core's "primary" memory space (intuitively this would be its L1 cache contents). I would like to use the finest granularity that "works", but I suppose I could just use one fourth of the L2 cache size (512K IIRC) and not worry about it.

It would be nice to know definitively though.

-m.daae

jojopi
Posts: 3064
Joined: Tue Oct 11, 2011 8:38 pm

Re: Cache Page Size

Fri Jun 26, 2015 7:34 pm

You seem to be confusing the page size in the memory management unit with the cache line size in the various levels of cache. There is absolutely no relation between the sizes, except that you can assume that each will be a power of two with natural alignment, so that a page always consists of a power-of-two number of cache lines.

The page size is used for memory access protection, swapping, and memory mapping of files. You can get it fairly portably at run time using sysconf(3): http://man7.org/linux/man-pages/man3/sysconf.3.html
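For example (a minimal sketch, assuming a POSIX system such as Raspbian):

#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* Ask the C library for the MMU page size; expect 4096 on the Pi. */
    long page = sysconf(_SC_PAGESIZE);
    if (page < 0) {
        perror("sysconf");
        return 1;
    }
    printf("MMU page size: %ld bytes\n", page);
    return 0;
}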

The cache system is used to improve performance by reducing accesses to main memory. On some Linux platforms, you can get the line sizes using /sys, but this does not appear to be implemented on the Pi: http://stackoverflow.com/questions/7946 ... -line-size
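If you want to probe it anyway, something along these lines should work where the kernel does expose it (a sketch only; index0 is usually the L1 data cache, and the file may simply be missing on the Pi):

#include <stdio.h>

int main(void)
{
    /* Try to read the L1 data cache line size from sysfs. */
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size", "r");
    int line = 0;

    if (f != NULL && fscanf(f, "%d", &line) == 1)
        printf("cache line size: %d bytes\n", line);
    else
        printf("coherency_line_size not available on this kernel\n");

    if (f != NULL)
        fclose(f);
    return 0;
}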

On x86, the smallest page size is 4K, while the biggest cache line size is normally 64 bytes. I think that both ARMv6 and ARMv7 are 4K and 32 bytes.

Because of cache associativity, spreading data widely apart on page boundaries, or at fractions of the total cache size, may force it into addresses that must share the same subset of the cache slots.

I am not sure whether your concern about cores dirtying each other's memory is valid or not. Perhaps you could try some benchmarks with various layouts, as in the rough sketch below. If you can observe a difference, you will also find the best layout to minimize it at the same time. Preferably keep the sizes configurable so you can test again in the completed application.
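As a starting point, something like this (only a rough sketch, names made up) lets you vary how far apart two cores' data sit and time the difference:

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define ITERS 100000000UL

static volatile unsigned char buf[4096];   /* large enough for any stride tried */
static size_t stride;

static void *worker(void *arg)
{
    /* Each thread hammers its own byte; the two bytes are 'stride' apart. */
    volatile unsigned char *p = &buf[(size_t)arg * stride];
    for (unsigned long i = 0; i < ITERS; i++)
        (*p)++;
    return NULL;
}

int main(int argc, char **argv)
{
    pthread_t t0, t1;

    /* Separation between the two counters, e.g. 4 (same line) or 64 (different lines). */
    stride = (argc > 1) ? (size_t)atol(argv[1]) : 4;
    if (stride < 1 || stride > 2048)
        stride = 4;

    pthread_create(&t0, NULL, worker, (void *)0);
    pthread_create(&t1, NULL, worker, (void *)1);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);

    printf("done, stride = %zu\n", stride);
    return 0;
}

Build with gcc -O2 -pthread and compare, say, time ./a.out 4 against time ./a.out 64.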

Paeryn
Posts: 2516
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Cache Page Size

Fri Jun 26, 2015 9:50 pm

jojopi wrote:I think that both ARMv6 and ARMv7 are 4K and 32 bytes.
Both Pis have an I-Cache line size of 32 bytes. The D-Cache line size differs: on the Pi it is also 32 bytes, but on the Pi 2 it is 64 bytes.
She who travels light — forgot something.

m.daae
Posts: 22
Joined: Thu Apr 02, 2015 4:59 pm

Re: Cache Page Size

Fri Jun 26, 2015 11:19 pm

Just to be clear ...

As I understand it, there is a memory hierarchy that looks something like this on the RasPi 2B:

1GB System Dynamic RAM
 |
 512KB L2 Cache
   |
   +-- Core 0
   |     +-- 32KB L1 Data Cache (64B D-Cache line)
   |     +-- 32KB L1 Instruction Cache (32B I-Cache line)
   |
   +-- Core 1
   |     +-- ...
   etc ...

I further understand that as long as memory is only read, things happen largely at the nominal system speed... But when data is written back out, something must eventually happen to update the larger-capacity levels "above" so that the change becomes visible to the other processors.

I want to create a set of 3 cascaded producer/consumer threads that can run more or less independently on 3 cores, such that things only have to slow down when a thread/core produces a message for consumption by the next thread/core. So ...

It appears that I should have simply asked "What is the MMU page size?", and the answer appears to be "4KB".
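For the message buffers I'm thinking of something along these lines (just a sketch, names made up), giving each stage its own page-aligned, page-sized block so two cores never end up writing into the same line:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define STAGES 3

int main(void)
{
    /* One page-aligned, page-sized message buffer per pipeline stage. */
    long page = sysconf(_SC_PAGESIZE);          /* 4096 on the Pi 2 */
    void *msgbuf[STAGES];

    for (int i = 0; i < STAGES; i++) {
        if (posix_memalign(&msgbuf[i], (size_t)page, (size_t)page) != 0) {
            fprintf(stderr, "posix_memalign failed for stage %d\n", i);
            return 1;
        }
        printf("stage %d buffer at %p\n", i, msgbuf[i]);
    }

    /* ... hand msgbuf[i] to the thread for stage i here ... */

    for (int i = 0; i < STAGES; i++)
        free(msgbuf[i]);
    return 0;
}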

cool !

-m.daae

Paeryn
Posts: 2516
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Cache Page Size

Sat Jun 27, 2015 12:53 am

m.daae wrote:Just to be clear ...

As I understand it, there is a memory hierarchy that looks something like this on the RasPi 2B:

1GB System Dynamic RAM
 |
 512KB L2 Cache
   |
   +-- Core 0
   |     +-- 32KB L1 Data Cache (64B D-Cache line)
   |     +-- 32KB L1 Instruction Cache (32B I-Cache line)
   |
   +-- Core 1
   |     +-- ...
   etc ...
Something like that I think.
m.daae wrote:I further understand that as long as memory is only read, things happen largely at the nominal system speed... But when data is written back out, something must eventually happen to update the larger-capacity levels "above" so that the change becomes visible to the other processors.

I want to create a set of 3 cascaded producer/consumer threads that can run more or less independently on 3 cores, such that things only have to slow down when a thread/core produces a message for consumption by the next thread/core. So ...

It appears that I should have simply asked "What is the MMU page size?", and the answer appears to be "4KB".

cool !

-m.daae
As I understand it, the A7's Snoop Control Unit can detect when multiple cores' cache lines are caching the same addresses and automatically synchronise them when needed without having to write back to memory. This will obviously be slower than not hitting each other's caches, as the cores will prioritise the SCU accesses, but it means that data doesn't have to be transferred up to L2 or external memory just because two or more cores are accessing the same memory location.
She who travels light — forgot something.

m.daae
Posts: 22
Joined: Thu Apr 02, 2015 4:59 pm

Re: Cache Page Size

Mon Jun 29, 2015 5:44 pm

Paeryn wrote:As I understand it, the A7's Snoop Control Unit can detect when multiple cores' cache lines are caching the same addresses and automatically synchronise them when needed without having to write back to memory. This will obviously be slower than not hitting each other's caches, as the cores will prioritise the SCU accesses, but it means that data doesn't have to be transferred up to L2 or external memory just because two or more cores are accessing the same memory location.
Again, thank you! I do appreciate your insights, Paeryn.

-daae
