bzt
Posts: 177
Joined: Sat Oct 14, 2017 9:57 pm

Clang question (unwanted memset and memcpy calls in Assembly)

Sat Aug 04, 2018 10:07 am

Hi,

I've started to play with LLVM clang, and I've suddenly run into a problem that seems strange.

When I use -O0, clang generates calls to memset and memcpy, despite the "-ffreestanding" and "-fno-builtin" flags.
When -O1 is used, there are no such calls, but the generated Assembly is just wrong (it contains a "brk" instruction). Out of curiosity, I checked it with gcc too. Interestingly, -O0 works (gcc 7.2, 7.3, 8.1), and -O1 works with 7.2 and 7.3 but not with gcc 8.1, which generates the same bad Assembly code (and not only for AArch64: the x86_64 output is bad too, where it emits a "ud2" instruction). Using -O2 breaks with gcc 7.2 and 7.3 too.

Does anybody have a workaround? I think this is clearly a C compiler bug, because:
1. -ffreestanding clearly tells the compiler there's no libc, so it should not rely on the memset and memcpy library functions
2. -fno-builtin clearly tells the compiler not to use builtins, like the llvm.memset or llvm.memcpy intrinsics
3. -O0 clearly tells the compiler to compile as-is, without any optimisations
4. The optimizer generates bad Assembly with -O1 and above; that's a bug, no doubt. Under no circumstances should simple ANSI C source be compiled into a "brk" (unconditional machine exception) instruction

After googling for half a day, I was unable to find a solution, but maybe someone here with more experience of clang and bare metal knows a workaround? Maybe there is a magic flag like "--do-exactly-as-i-say-i-know-what-i-am-doing" that I missed? Or do I need to add "volatile" to every single variable? I read on one forum that it might help to avoid memset and memcpy.

Or are there any other free compiler alternatives that support both AArch64 and x86_64 targets? I've checked a fork of tcc; still not good enough, unfortunately :-(

(Just a side note: I've found plenty of complaints about the opposite, namely the optimizer removing a deliberate memset call, which poses a security threat because, for example, a password is then not cleared from memory.)

Cheers,
bzt

Paeryn
Posts: 2169
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Sun Aug 05, 2018 1:21 am

-ffreestanding doesn't necessarily mean you won't be required to provide certain functions. I'm not sure about clang's handling, but gcc expects that memcpy and memset will be provided by you. I wouldn't be surprised if clang has the same reliance.
GCC requires the freestanding environment provide memcpy, memmove, memset and memcmp.
-fno-builtin tells the compiler not to assume knowledge of key library functions. Otherwise the compiler can assume that for example memcpy() does exactly what it is supposed to and could be replaced with an inlined equivalent version even though it never sees the code for the function.
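Following the GCC requirement quoted above, here is a minimal sketch of those four functions for a freestanding build. It is an illustration, not code from this thread, and it assumes a byte-at-a-time implementation is acceptable (speed hardly matters for code that runs once at boot). The volatile pointers are a deliberate choice: with optimisation enabled, GCC or clang may recognise a plain copy or fill loop as a memcpy/memset idiom and replace it with a call to memcpy/memset, i.e. the function could end up calling itself.

```c
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintptr_t */

/* Byte-at-a-time copy. volatile accesses cannot be merged into a
   memcpy call by loop-idiom optimisations. */
void *memcpy(void *dest, const void *src, size_t n)
{
    volatile unsigned char *d = dest;
    const volatile unsigned char *s = src;
    while (n--)
        *d++ = *s++;
    return dest;
}

/* Like memcpy, but safe for overlapping buffers: copy forwards when the
   destination starts below the source, otherwise copy backwards. */
void *memmove(void *dest, const void *src, size_t n)
{
    volatile unsigned char *d = dest;
    const volatile unsigned char *s = src;
    if ((uintptr_t)d < (uintptr_t)s) {
        while (n--)
            *d++ = *s++;
    } else {
        d += n;
        s += n;
        while (n--)
            *--d = *--s;
    }
    return dest;
}

/* Fill n bytes with the low 8 bits of c. */
void *memset(void *dest, int c, size_t n)
{
    volatile unsigned char *d = dest;
    while (n--)
        *d++ = (unsigned char)c;
    return dest;
}

/* Compare n bytes as unsigned char; the sign of the result orders the
   first differing byte. */
int memcmp(const void *a, const void *b, size_t n)
{
    const volatile unsigned char *x = a;
    const volatile unsigned char *y = b;
    while (n--) {
        unsigned char cx = *x++, cy = *y++;
        if (cx != cy)
            return cx - cy;
    }
    return 0;
}
```

The names and signatures must match the standard ones exactly, because the compiler emits calls to these functions by name.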
She who travels light — forgot something.

LdB
Posts: 872
Joined: Wed Dec 07, 2016 2:29 pm

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Sun Aug 05, 2018 4:30 am

The standard retargeting method is to add -specs=nosys.specs to the gcc linker command line. That allows you to link in a separate library with implementations for all required system functions. Generally, you don't want to stop it calling the standard library functions; you simply want to provide your own versions of those standard calls at link time (in your case CLANG/LLVM should do it). The calls are being linked because the code you have compiled requires them (the issue others had was them missing).

I would expect CLANG/LLVM to provide its own versions of those system calls.

Now, on the other issue: in theory the difference between -O0, -O1 and -O2 is simply that the optimizer is allowed to inline small functions under some of the optimizations, and it has done so. The code is supposed to still work :-)

You should be able to stop it inlining small functions at the higher optimization levels with the -fno-inline-small-functions switch if it is really bugging you.
However, with -specs=nosys.specs it won't have functions to inline (it only knows about them at the link phase), and the problem should go away anyhow (it is inlining the GCC library calls, hence the bug).

Anyhow, a good search term is "gcc retargeting"; there are a number of good web tutorials on it, including ones from Red Hat and Dr. Dobb's.
You can also easily override library functions in GCC if required, following a similar theme, by using LD_PRELOAD to load the new versions you wish to use.

bzt
Posts: 177
Joined: Sat Oct 14, 2017 9:57 pm

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Mon Aug 06, 2018 9:57 am

Thank you for your replies!
Paeryn wrote:
Otherwise the compiler can assume that for example memcpy() does exactly what it is supposed to and could be replaced with an inlined equivalent version even though it never sees the code for the function.
@Paeryn: exactly, that's why I'd expect no memset or memcpy optimisations with "-ffreestanding -fno-builtin -O0", especially not a call to them. Btw, I'm sure my code is optimized enough, and I'm afraid introducing a memset or memcpy call means changed semantics. I don't care about speed here, since my code runs only once on boot; correctness is more important than speed.
LdB wrote:
Generally, you don't want to stop it calling the standard library functions; you simply want to provide your own versions of those standard calls
@LdB: yeah, but the thing is, I'm trying to compile my libc for bare metal... :-) Just to make it clear, my code does not have any implicit library calls; they're put there by the compiler. With gcc, if I use "-ffreestanding -O0" the code is compiled properly (no other arguments are needed), and the generated code runs perfectly.

FYI, I've filed a bug report with clang. Let's see what happens. I've also linked other pages, because I'm not the only one who wants to avoid mem* calls in specific cases.

Cheers,
bzt

bzt
Posts: 177
Joined: Sat Oct 14, 2017 9:57 pm

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Mon Aug 06, 2018 10:07 am


ps.: to be precise, the miscompiled code is in the initialization of the memory manager. There I have an array of offset+size+type (16 bytes) items describing the memory map (provided by the boot loader). The meminit code iterates over this list, checks each free area, excludes certain memory areas from it (like the code itself and the initrd), and adds the remaining offset+size pair to the free memory list if the size is greater than zero. I'm pretty sure this algorithm does not need any memcpy or memset calls, and having them is certainly worse than having a single iteration loop. (That's why I'm also worried about changed semantics.)
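A minimal sketch of the meminit algorithm described above, under stated assumptions: the 16-byte entry layout (start address plus a size field whose low 4 bits hold the type), the MMAP_FREE value, the free-list type and all names are hypothetical, not taken from bzt's code. Each reserved range (kernel code, initrd) can split a free area into at most two pieces, so the sketch subtracts the reserved ranges one at a time and stores the surviving pieces field by field rather than by struct assignment. Field-wise stores make an implicit memcpy less likely, but they do not guarantee the compiler will never emit one.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte memory map entry: start address plus a size whose
   low 4 bits hold the area type (assumed layout, not bzt's actual code). */
typedef struct { uint64_t ptr; uint64_t size; } mmap_ent_t;

#define MMAP_TYPE(e)  ((e)->size & 0xF)
#define MMAP_SIZE(e)  ((e)->size & ~(uint64_t)0xF)
#define MMAP_FREE     1            /* hypothetical type code for usable RAM */

typedef struct { uint64_t start, end; } range_t;    /* half-open [start, end) */

/* Hypothetical fixed-capacity free list. */
typedef struct { range_t *areas; size_t count, cap; } freelist_t;

/* Add [start, end) minus the reserved ranges res[idx..nres-1] to the free
   list. A reserved range splits the area into at most two pieces, each of
   which is then checked against the remaining reserved ranges. */
static void add_free(freelist_t *fl, uint64_t start, uint64_t end,
                     const range_t *res, size_t nres, size_t idx)
{
    if (start >= end)
        return;                                 /* empty piece: size is zero */
    if (idx == nres) {
        if (fl->count < fl->cap) {              /* silently drop if the list is full */
            fl->areas[fl->count].start = start; /* field by field, no struct copy */
            fl->areas[fl->count].end = end;
            fl->count++;
        }
        return;
    }
    uint64_t rs = res[idx].start, re = res[idx].end;
    if (re <= start || rs >= end) {             /* this reserved range does not overlap */
        add_free(fl, start, end, res, nres, idx + 1);
        return;
    }
    add_free(fl, start, rs, res, nres, idx + 1);  /* part below the reserved range */
    add_free(fl, re, end, res, nres, idx + 1);    /* part above it */
}

/* Walk the memory map once and collect the usable free areas. */
void meminit(const mmap_ent_t *map, size_t nent,
             const range_t *res, size_t nres, freelist_t *fl)
{
    for (size_t i = 0; i < nent; i++) {
        const mmap_ent_t *e = &map[i];
        if (MMAP_TYPE(e) != MMAP_FREE)
            continue;
        add_free(fl, e->ptr, e->ptr + MMAP_SIZE(e), res, nres, 0);
    }
}
```

For example, a single free area [0x0, 0x1000000) with the code at [0x80000, 0x100000) and the initrd at [0x200000, 0x300000) yields three free pieces: [0x0, 0x80000), [0x100000, 0x200000) and [0x300000, 0x1000000).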

Cheers,
bzt

Paeryn
Posts: 2169
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Mon Aug 06, 2018 10:50 am

bzt wrote:
Mon Aug 06, 2018 9:57 am
Thank you for your replies!
Otherwise the compiler can assume that for example memcpy() does exactly what it is supposed to and could be replaced with an inlined equivalent version even though it never sees the code for the function.
@Paeryn: exactly, that's why I'd expect no memset or memcpy optimisations with "-ffreestanding -fno-builtin -O0", especially not a call to them. Btw, I'm sure my code is optimized enough, and I'm afraid introducing a memset or memcpy call means changed semantics. I don't care about speed here, since my code runs only once on boot; correctness is more important than speed.
gcc (and I assume clang) makes use of the mem*() functions for internal things like copying a struct (especially if the struct is large). GCC's docs point that out (that's where the quote I gave came from): in freestanding mode you need to provide those mem*() functions in case they are called by the generated code.
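To illustrate the point about struct copies, here are a few ordinary-looking constructs (the struct and function names are hypothetical) for which GCC and clang may emit a call to memcpy or memset instead of inline moves. Whether a call actually appears depends on the compiler, target, struct size and optimisation level; compiling such a file with -S and searching the output for memcpy/memset (for example `bl memcpy` on AArch64 or `call memcpy` on x86_64) is a quick way to check what your compiler really does.

```c
#include <stdint.h>

/* Hypothetical 256-byte structure; the size is chosen only to make a
   library call more likely than a sequence of inline moves. */
struct bigcfg { uint64_t words[32]; };

/* Plain struct assignment: may be emitted as a call to memcpy. */
void cfg_copy(struct bigcfg *dst, const struct bigcfg *src)
{
    *dst = *src;
}

/* Zero-initialised local aggregate: may be emitted as a call to memset. */
uint64_t cfg_sum_zeroed_then_set(uint64_t last)
{
    struct bigcfg tmp = {0};
    tmp.words[31] = last;
    uint64_t sum = 0;
    for (int i = 0; i < 32; i++)
        sum += tmp.words[i];
    return sum;
}

/* Passing and returning a large struct by value: the copies may use memcpy. */
struct bigcfg cfg_identity(struct bigcfg c)
{
    return c;
}
```

None of these functions mention memcpy or memset in the source, which is why "my code has no library calls" and "the generated code calls memcpy" can both be true at once.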

dwelch67
Posts: 944
Joined: Sat May 26, 2012 5:32 pm

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Mon Aug 06, 2018 6:23 pm

been there, dealt with this. I think it is a longer, obscure command line option; will see if I can find it, but yes, it is annoying that they do that... I stopped using clang in the 3.x days because every release the command line options changed and it became unmanageable to provide examples or maintain makefiles for myself.

dwelch67
Posts: 944
Joined: Sat May 26, 2012 5:32 pm

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Mon Aug 06, 2018 6:29 pm

try
-disable-simplify-libcalls
still looking


dwelch67
Posts: 944
Joined: Sat May 26, 2012 5:32 pm

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Mon Aug 06, 2018 7:20 pm

note: my llvm/clang preference was to build to bitcode (-emit-llvm) unoptimized, then combine all the code into one file (llvm-link), then run the optimizer (opt) on that one file, then use llc to make the target-specific asm or object file, since it can do that now. no doubt clang-to-object follows a similar path, but for one file/object, so when you find the right command line option there should be a way to just use it or have it passed to llc.

LdB
Posts: 872
Joined: Wed Dec 07, 2016 2:29 pm

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Tue Aug 07, 2018 3:20 am

Paeryn wrote:
Mon Aug 06, 2018 10:50 am
gcc (and I assume clang) makes use of the mem*() functions for internal things like copying a struct (especially if the struct is large). GCC's docs point that out (that's where the quote I gave came from): in freestanding mode you need to provide those mem*() functions in case they are called by the generated code.
You are spot on. In fact, any use of printf, sprintf etc. will do it, because it mallocs an internal buffer it uses to build up the output for the variadic arguments and then memcpys it out.

David's switch, link and answer are pretty much what I expected, but I agree with him: you would wish they made it simpler.

Anyhow, that should hopefully get bzt on the path to a solution.

dwelch67
Posts: 944
Joined: Sat May 26, 2012 5:32 pm

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Wed Aug 08, 2018 2:16 am

Can you please post some example code that causes the problem(s) with these tools? Note that gcc 8.2.0 is out; I'm curious to see this gcc 8.x.x issue (as well as the clang/llvm one).

David

bzt
Posts: 177
Joined: Sat Oct 14, 2017 9:57 pm

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Wed Aug 08, 2018 1:10 pm

Hi All,

Thank you all for the suggestions! After consulting with the clang developers and spending some annoying hours on debugging, I have partially figured it out. Here's what I've learned:

My memory entry struct is not big: it's only 16 bytes (handled and referenced in the code as two uint64_ts, memory area start and memory area size). It is true that certain constructs in C cannot be compiled without memcpy, but imho this does not imply that everything must be solved with memcpy. But clang thinks so, which is the reason for the memcpy calls with -O0. It is arguable whether always using memcpy is right; gcc, at least, does not use memcpy here, which proves it can be done. The problem with clang is that there's no flag to avoid the loop optimisations, or at least to inline that unwanted memcpy. Inlining is implied by -O1, which also enables other optimisations that break the code elsewhere. This ticket is still open, waiting for me to create a PoC that doesn't use any bare-metal code. (FYI, I have found out that gcc has a flag to disable loop optimisations when -O1 is used (it's not needed with -O0). Not perfect, but better than nothing, and that way there's no need to force inlining of memcpy.)

Now about the "ud2" and "brk" instructions with -O1: it turned out that the compiler thinks it is smarter than the programmer, and of course it is not... The issue is quite complex, but in a nutshell: I've mapped a supervisor-only page at 0, so that any user code dereferencing a null pointer causes a data abort / page fault exception, which the supervisor can catch at run time. This works fine, just as expected. But since it's a supervisor-only page anyway, I decided to store information about the currently mapped address space in that page, accessible only to the supervisor. With gcc -O0 this works great. But with -O1 (both gcc and clang), the compilers incorrectly think that a null pointer is involved, and therefore they generate a __builtin_trap() call, which is compiled to "ud2" or "brk". To make this clearer: for example, I store the pid of the currently running process at offset 64. When a task switch is made, I map this info at the first page along with the rest of the address space. Now, I would be crazy to reference the pid directly by its virtual address from C, so I use a struct (obviously). Although the memory address of the pid field is 64, the compiler thinks it's a null pointer dereference, something like "((procstruct*)0)->pid", so instead of a "mov" or "ldr" instruction it generates "ud2"/"brk"...
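A sketch of one possible workaround, not something suggested in this thread: hide the pointer's value from the optimizer with an empty GNU inline-asm statement, so that it can no longer prove the base pointer is NULL and has no reason to insert a trap. The struct layout, `procinfo_at` and `current_pid` are hypothetical. This does not make the access valid ISO C; it only removes the optimizer's proof. For GCC, the documented command-line route for programs that really do access address zero is `-fno-delete-null-pointer-checks`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout of the supervisor-only info page mapped at VA 0. */
struct procstruct {
    uint8_t  other[64];   /* placeholder for whatever precedes pid */
    uint64_t pid;         /* pid of the running process, at offset 64 */
};
_Static_assert(offsetof(struct procstruct, pid) == 64, "pid must be at offset 64");

/* Turn an address into a pointer whose value the optimizer cannot see:
   the empty asm claims it may modify p, so the compiler can no longer
   prove that p is NULL. */
static inline struct procstruct *procinfo_at(uintptr_t addr)
{
    struct procstruct *p = (struct procstruct *)addr;
    __asm__ volatile("" : "+r"(p));
    return p;
}

/* Kernel-side accessor for the info page at address 0 (hypothetical);
   only meaningful where that page is actually mapped. */
static inline uint64_t current_pid(void)
{
    return procinfo_at(0)->pid;
}
```

In a hosted test, only `procinfo_at` on an ordinary object can be exercised safely; `current_pid` is meant for the kernel, where the page at 0 exists.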

To sum it up: the problem lies in the fact that not all aspects of the language are defined by the C standard. Whatever the standard leaves uncovered is called "undefined behaviour", and it is up to the compiler how to handle it. This shouldn't be a problem if you could turn the optimizer off, but since you can't, crazy things can happen, including changed semantics. For example, one would assume that the following code checks for a null reference:

```c
#include <stddef.h>   /* NULL */

void faulty(int *p)
{
  int dead=*p;        /* p is dereferenced before it is checked */
  if (p==NULL)
    return;
  *p=123;
}
```
But it's not. Although the "int dead=*p" line is removed (dead being an unused variable), the compiler may incorrectly assume that, since p has already been dereferenced once, p is not NULL, so it silently optimizes away the "if (p==NULL)" check, ultimately compiling the following, with changed semantics:

```c
void faulty(int *p)
{
  *p=123;
}
```
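For comparison, a version that keeps the intended semantics simply tests p before any dereference, which leaves no undefined behaviour for the optimizer to reason from. The name `checked` is hypothetical; this is a minimal sketch, not bzt's code.

```c
#include <stddef.h>

/* Same intent as faulty(): do nothing for NULL, otherwise store 123.
   The NULL test comes before any dereference, so it has to stay. */
void checked(int *p)
{
    if (p == NULL)
        return;        /* nothing has been dereferenced yet */
    int value = *p;    /* safe: p is known to be non-null here */
    (void)value;       /* unused, just like "dead" in faulty() */
    *p = 123;
}
```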
As far as I can see, I have the following solutions:
1. -O1: waste the first page, and map the address space struct at the second (not gonna happen)
2. -O0: duplicate mem* library functions
3. -O0: avoid using clang

Cheers,
bzt

Paeryn
Posts: 2169
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Wed Aug 08, 2018 5:10 pm

bzt wrote:
Wed Aug 08, 2018 1:10 pm
Now about the "ud2" and "brk" instructions with -O1: it turned out that the compiler thinks it is smarter than the programmer, and of course it is not... The issue is quite complex, but in a nutshell: I've mapped a supervisor-only page at 0, so that any user code dereferencing a null pointer causes a data abort / page fault exception, which the supervisor can catch at run time. This works fine, just as expected. But since it's a supervisor-only page anyway, I decided to store information about the currently mapped address space in that page, accessible only to the supervisor. With gcc -O0 this works great. But with -O1 (both gcc and clang), the compilers incorrectly think that a null pointer is involved, and therefore they generate a __builtin_trap() call, which is compiled to "ud2" or "brk". To make this clearer: for example, I store the pid of the currently running process at offset 64. When a task switch is made, I map this info at the first page along with the rest of the address space. Now, I would be crazy to reference the pid directly by its virtual address from C, so I use a struct (obviously). Although the memory address of the pid field is 64, the compiler thinks it's a null pointer dereference, something like "((procstruct*)0)->pid", so instead of a "mov" or "ldr" instruction it generates "ud2"/"brk"...
With no optimisation the compiler hasn't realised you were attempting to start from 0, and if you never explicitly try dereferencing the first element then it works by chance. With the optimiser on, if the compiler can guarantee that the base pointer is NULL, then it is well within its rights to automatically trap any access through that pointer, even if you never access NULL itself, as no object can exist at NULL. 64 bytes after nowhere is still nowhere.

bzt
Posts: 177
Joined: Sat Oct 14, 2017 9:57 pm

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Thu Aug 09, 2018 12:33 pm

Paeryn wrote:
Wed Aug 08, 2018 5:10 pm
no object can exist at NULL
That's simply not true. Address 0 in memory is just like any other. What's more, many architectures store hardware-specific objects at that address. For example, before VBAR was introduced, ARM used it for exception vectors. The same goes for x86 in real mode and for Freescale processors.
If it's not used by the hardware, I can store a procstruct at address 0; it's perfectly valid and all processors can correctly interpret that. You can encode such instructions in Assembly without any issues. The fact that the C language attaches additional meaning to it is irrelevant; it's just a language-specific thing. FYI, I've also learned that with the address_space attribute you can turn off __builtin_trap generation in clang, and gcc supports that too with a command line argument.
64 bytes after nowhere is still nowhere.
Think about it. Given that 64 was just an arbitrary address, that would mean that ANY memory address is nowhere, and therefore no memory could be accessed.

Cheers,
bzt

Paeryn
Posts: 2169
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Thu Aug 09, 2018 4:17 pm

bzt wrote:
Thu Aug 09, 2018 12:33 pm
Paeryn wrote:
Wed Aug 08, 2018 5:10 pm
no object can exist at NULL
That's simply not true. Address 0 in memory is just like any other.
Of course address 0 exists, but setting a pointer to 0 is telling the compiler that the pointer is pointing to an invalid address, if that pointer is of type pointer to struct then it makes sense that the compiler can assume that the whole struct is invalid.

LdB
Posts: 872
Joined: Wed Dec 07, 2016 2:29 pm

Re: Clang question (unwanted memset and memcpy calls in Assembly)

Thu Aug 09, 2018 5:15 pm

https://en.wikipedia.org/wiki/Null_pointer
The C standard does not say that the null pointer is the same as the pointer to memory address 0, though that may be the case in practice.
You are assuming something that C does not guarantee (although it is often implemented as such), and so Paeryn is correct.
It means a value reserved for indicating that the pointer does not refer to a valid object and nothing more.

It's possible to have a C compiler represent NULL as, say, address 0xDEADBEEF. All it has to do is generate pointer comparison code that reports a match when the pointer is compared with zero, which is dead simple: XOR the pointer with the constant 0xDEADBEEF and test the result for zero. If you know your XOR logic, you will see that it works. Obviously that is a lot of messing around with pointer maths, and most compilers don't do it, but nor can you assume it never happens.
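The XOR trick can be sketched in C by simulating the code such a compiler would have to generate. Everything here is hypothetical: `HYPOTHETICAL_NULL_REPR` stands for the bit pattern of the null pointer on an imaginary implementation, and pointers are modelled as 32-bit integers. Mainstream targets such as x86_64 and AArch64 use all-zero bits for NULL.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: the bit pattern an imaginary implementation uses for NULL. */
#define HYPOTHETICAL_NULL_REPR UINT32_C(0xDEADBEEF)

/* What the source expression "(T *)0" would have to produce there. */
static uint32_t null_constant_repr(void)
{
    return HYPOTHETICAL_NULL_REPR;
}

/* What "p == 0" (or "!p") would have to compile to: XOR with the null
   pattern, then test the result for zero. */
static bool repr_is_null(uint32_t ptr_repr)
{
    return (ptr_repr ^ HYPOTHETICAL_NULL_REPR) == 0;
}
```

One consequence: on such an implementation, memset(&p, 0, sizeof p) would not produce a null pointer, while writing p = 0 in the source still would, because the compiler translates the null pointer constant, not the bits.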
