Bottom Line Up Front:
With respect to CSUD, neither the address passed to the DMA nor the local variables on the stack (some of which are used by the DMA) are affected by L2 caching. It seems only L1 caching affects the DMA and the local variables on the stack.
When data caching is enabled, CSUD stops working because it can no longer reliably “talk” to USB devices. The caching issues affect both the data buffer passed to the DMA engine and the local variables on the stack that are passed to the DMA engine.
Findings overview (after extensive testing, explained below):
The data buffer passed to the DMA engine needs to be in a section of memory in which L1 cache is turned off. Otherwise, the data the DMA writes to the buffer only updates main memory, and the ARM never sees it because it keeps reading the old data held in the L1 cache. The ARM therefore believes nothing has changed on the keyboard and behaves as though no one typed anything.
Solution: One solution is to clean/invalidate the data cache as described by ‘rst’ above. Another solution is to place the data buffer passed to the DMA engine in a section of memory where L1 caching is turned off using page table attributes.
L2 caching does not seem to affect the DMA. As long as L1 caching is 'off', any alias (0x00000000, 0x40000000, 0x80000000, or 0xC0000000, per BCM2835 pg 5) and any L2 cache attributes set using the TEX bits in the page table (per ARM 1176 Tech Man, table 6-3, pg 6-16) will allow the DMA to update the data buffer, and the ARM processor will receive the updates.
Stack: (for locally declared variables in CSUD that are passed to the DMA)
Background: CSUD is written in the 'C' language. In 'C', locally declared variables are placed on the stack. In CSUD there are many locally defined variables that are used to initially gather information from the USB devices. If the stack is placed in a section of memory that is L1 "write-back cached, no allocate on write" (ARM 1176 Tech Man, pg 6-15 & 6-16 refers), the local variables do not get updated by the USB device, so CSUD runs in an infinite loop as it keeps trying to get info about the USB devices detected.
L1 caching can be enabled for the memory location that contains the stack, but it must be “write-through cached, no allocate on write” (ARM 1176 Tech Man, pg 6-15 & 6-16 refers).
L2 caching: neither the TEX attributes for L2 caching nor the aliases have any effect on the stack for CSUD operation.
Solution 1: Put the stack in a section of memory that is designated as anything except L1 "write-back cached, no allocate on write".
Solution 2: Find all the local variables in CSUD that are passed to the DMA and put them in a section of memory that is not L1 "write-back cached, no allocate on write". After spending days trying to decipher the local variables, I gave up, so this solution is still UNSOLVED.
As I now understand it: for simple 1 Mbyte sections, there are 4096 page table entries. Each page table entry corresponds to 1 Mbyte of memory.
The reason there are 4096 entries is that this covers all of the 32-bit address space. To explain: in 32-bit addressing, the address range goes from 0 to 4,294,967,295 (2^32 - 1), which is 4,294,967,296 (2^32) bytes in total. Each Mbyte contains 1,048,576 bytes. Therefore, there are 4096 1-Mbyte sections (1,048,576 * 4096 = 4,294,967,296).
When the ARM processor wants to access a memory address, it sends that address to the MMU, which looks at the page table entry that corresponds to the Mbyte section containing that address. For example, if the ARM passes the address 0x8000 to the MMU, this is located within the first Mbyte of memory, so the MMU will look at the first page table entry. If the ARM passes the address 0x108000 to the MMU, this is in the second Mbyte of memory, so the MMU will look at the second page table entry, and so on.
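The lookup described above comes down to a shift by 20 bits. Here is a minimal sketch of that arithmetic; the names `section_index` and `section_base` are my own for illustration and are not part of CSUD. Note the index is zero-based, so the "first" page table entry is index 0.

```c
#include <stdint.h>

#define SECTION_SHIFT 20u   /* 1 Mbyte = 2^20 bytes */
#define SECTION_COUNT 4096u /* 2^32 / 2^20 entries in the first-level table */

/* Zero-based index of the page table entry that covers addr. */
static inline uint32_t section_index(uint32_t addr)
{
    return addr >> SECTION_SHIFT;
}

/* Address of the first byte of the 1-Mbyte section that contains addr. */
static inline uint32_t section_base(uint32_t addr)
{
    return addr & ~((1u << SECTION_SHIFT) - 1u);
}
```

With this, section_index(0x8000) gives 0 (the first entry) and section_index(0x108000) gives 1 (the second entry).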
Typically when we set up the page table entries to allow data caching, we set all memory below 512 Mbytes to cacheable and all entries above 512 Mbytes are set to “device, shared” which means caching is turned off. This is key to understanding the ‘4’ alias confusion.
The reason the addition of 0x40000000 to the data buffer’s address seemed to be the solution was not the ‘4’ alias itself, but rather that when the address was sent from the ARM to the MMU, the MMU looked at the page table entry for section 0x400. In other words, let's say the address of the data buffer passed to the DMA engine was initially 0x1234. This is in the first Mbyte of memory, so the MMU looks at the first “page” in the page table and sees that caching is enabled in this section, thus CSUD will not work. The initial thinking, based on the BCM2835 manual, was that there was an issue with L2 caching and the DMA, so a ‘4’ alias was added to the data buffer address, resulting in 0x40001234. Doing this allowed CSUD to work. My false reasoning was that L1 cache was still enabled because 0x40001234 accesses the exact same memory location as 0x1234, therefore the issue must be in L2 cache, which was solved by the ‘4’ alias. However, this reasoning is wrong. When the ARM passed 0x40001234 to the MMU, the MMU no longer looked at the first page table entry; it looked at page table entry 1024 (0x40001234 >> 20 = 0x400 = 1024). The memory attributes for page table entry 1024 were inner and outer caching off, device, shared. Thus the ‘4’ alias not only affected L2 per the BCM2835 manual, pg 5, but also turned off caching due to the way I set up the page table.
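To make the mistake concrete, here is a rough sketch of a flat-mapped table set up the way I described: entries below 512 cacheable, everything else shared device. The macro and function names are hypothetical, and I am assuming "cacheable" means write-back (TEX = 0b000, C = 1, B = 1) and full read/write access permissions. It is not the exact code I ran.

```c
#include <stdint.h>

#define SECTION_TYPE  0x2u         /* bits [1:0] = 0b10: 1-Mbyte section */
#define SECTION_B     (1u << 2)    /* B bit */
#define SECTION_C     (1u << 3)    /* C bit */
#define SECTION_AP_RW (0x3u << 10) /* AP = 0b11: read/write (assumed) */

#define ATTR_CACHED (SECTION_C | SECTION_B) /* assumed write-back, TEX = 0 */
#define ATTR_DEVICE (SECTION_B)             /* TEX = 0, C = 0, B = 1: shared device */

/* Flat mapping: entry i maps virtual Mbyte i to physical Mbyte i. */
static void build_flat_table(uint32_t table[4096])
{
    for (uint32_t i = 0; i < 4096u; i++) {
        uint32_t attr = (i < 512u) ? ATTR_CACHED : ATTR_DEVICE;
        table[i] = (i << 20) | SECTION_AP_RW | attr | SECTION_TYPE;
    }
}

/* Nonzero if the entry the MMU consults for addr has the C bit set. */
static int is_cached(const uint32_t table[4096], uint32_t addr)
{
    return (table[addr >> 20] & SECTION_C) != 0;
}
```

With such a table, is_cached(table, 0x1234) is true while is_cached(table, 0x40001234) is false, even though both were meant to refer to the same buffer. The difference comes purely from which entry is consulted.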
First an assumption: L1 is “inner” and L2 is “outer”.
The MMU was set up with flat mapping, in other words ARM addresses mapped to the same physical address, ie ARM address 0x1234 mapped to physical address 0x1234.
The confusing part of all this is that there are multiple ways to affect L1 and L2 caching, and they are not always readily apparent. As described above, I thought adding the ‘4’ alias was only affecting L2 caching, but in fact L1 cache was also affected (albeit in a roundabout manner).
To test what actually affects what, I placed the data buffer for the DMA at memory location 0x300000, which is in the 4th Mbyte of memory. This corresponds to the 4th page table entry, which covers memory addresses 0x300000 through 0x3FFFFF. To separate the controls for L1 and L2 caching, I set the top TEX bit, TEX[2] (bit 14), to ‘1’. This causes TEX bits [13:12] and the C,B bits [3:2] to behave according to table 6-3 in the ARM 1176 Tech Man, pg 6-16, allowing independent control of the L1 and L2 caches. (With that bit set to ‘0’, the C and B bits control both L1 and L2 cache.)
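For anyone wanting to repeat the test, the descriptor for such a section can be assembled roughly as below. This is only a sketch: `make_split_section` is a made-up name, AP = 0b11 and domain 0 are my assumptions, and the two-bit policy values are passed straight through, so look up their meaning in table 6-3 rather than trusting any labels of mine.

```c
#include <stdint.h>

/* Build a 1-Mbyte section descriptor with TEX[2] = 1, so that the inner (L1)
 * and outer (L2) policies are encoded in separate fields:
 *   inner_cb  -> C,B bits [3:2]
 *   outer_tex -> TEX bits [13:12]
 * Both are raw 2-bit encodings taken from table 6-3 of the TRM. */
static uint32_t make_split_section(uint32_t base, uint32_t inner_cb,
                                   uint32_t outer_tex)
{
    uint32_t d = base & 0xFFF00000u; /* section base address, bits [31:20] */
    d |= 1u << 14;                   /* TEX[2] = 1: split inner/outer control */
    d |= (outer_tex & 0x3u) << 12;   /* outer policy in TEX[1:0] */
    d |= 0x3u << 10;                 /* AP = 0b11, read/write (assumed) */
    d |= (inner_cb & 0x3u) << 2;     /* inner policy in C,B */
    d |= 0x2u;                       /* bits [1:0] = 0b10: section */
    return d;
}
```

For the data buffer section, the entry written into the table would be something like table[0x300000 >> 20] = make_split_section(0x300000, inner, outer), cycling inner and outer through 0 to 3.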
Then, with L1 set to strongly ordered memory, bits [3:2] = 0b00, I tested every combination of L2 caching using TEX bits [13:12]. Next I changed L1 to “device, shared”, bits [3:2] = 0b01, and again tested every combination of L2 caching. CSUD worked correctly in every one of these scenarios. (Note: the alias in these cases was ‘0’.)
Next I added the ‘4’ alias by remapping in the page table, then ‘8’ and ‘C’. None of these aliases had any effect, and CSUD still worked.
I repeated all of the above tests with L1 cache set to “write-through cached”, bits [3:2] = 0b10, and again with L1 cache set to “write-back cached”, bits [3:2] = 0b11. CSUD failed in all cases. No combination of aliases (0, 4, 8 or C) or L2 caching, including off, allowed CSUD to function when either write-back or write-through L1 cache was turned on.
To further verify my thoughts on how the MMU works, I added the ‘4’ alias to the address of the data buffer and changed the L1 and L2 attributes for page table entry 1024. With L1 caching off, L2 attributes had no effect and CSUD worked fine. But when any type of L1 caching was turned on, CSUD stopped working even though it had the ‘4’ alias.
I did all of the above tests with the stack as well. I placed the stack in the middle of a different 1-Mbyte section, the one that extends from 0x200000 to 0x2FFFFF. The address 0x280000 was used for the stack, which is halfway between 0x200000 and 0x2FFFFF. This ensured that moving downward through the stack would not cause it to reach the page table entry for the next lower Mbyte section. I got the same results as with the data buffer, with one exception: CSUD still worked with L1 cache set to “write-through cached”, but not with “write-back cached”. So it seems the stack (local variables for CSUD) can be in an L1 cached section of memory as long as write-through is set as the cache policy. No combination of L2 caching or aliases on the stack address had any effect on the operation of CSUD; only L1 affected it.
The address passed to the DMA only has to have L1 caching turned off and is not affected by L2 caching.
If you have any comments, either confirming or disagreeing with my findings, please let me know. This is a complicated subject and I probably overlooked some nuance. I also hope this helps anyone trying to use CSUD with data caching turned on, or any application of L2 cache control.
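As a quick sanity check on the stack placement, the room a descending stack has before it crosses into the next lower 1-Mbyte section can be worked out like this. `stack_headroom` is a hypothetical helper of mine, and it assumes a full-descending stack whose initial pointer is non-zero.

```c
#include <stdint.h>

/* Bytes a descending stack starting at stack_top can use before it leaves the
 * 1-Mbyte section holding its first pushed word. stack_top must be non-zero. */
static uint32_t stack_headroom(uint32_t stack_top)
{
    /* The first push lands just below stack_top, so classify stack_top - 1. */
    uint32_t section_start = (stack_top - 1u) & 0xFFF00000u;
    return stack_top - section_start;
}
```

For the stack at 0x280000 this gives 0x80000, i.e. 512 Kbytes before the stack would start using the page table entry for 0x100000 through 0x1FFFFF.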