swampdog
Posts: 718
Joined: Fri Dec 04, 2015 11:22 am

Re: Schrödinger's Code - Undefined behavior in theory and practice

Tue Jun 08, 2021 6:19 pm

jahboater wrote:
Tue Jun 08, 2021 12:46 pm
swampdog wrote:
Tue Jun 08, 2021 10:54 am
What's the point in checking every tiny memory allocation?
Its good practice, even though (on Linux) the OOM (Out Of Memory) killer will likely terminate your process first.

This thread is about much more than that.

Did you read Heater's link?
https://queue.acm.org/detail.cfm?id=3468263
It is well written and a good read, though I didn't see anything unexpected.
Yes. I mentioned so earlier.

In retrospect I didn't make myself clear enough. If all someone has ever done is write GUI apps (particularly on windoze) then the entire paper Heater linked to, would be meaningless because those programmers are already missing much bigger things than if something gets optimised out.

I am tired but for the life of me can't get a warning out of gcc for Fig2 (sumdiv).

User avatar
jahboater
Posts: 7158
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: Schrödinger's Code - Undefined behavior in theory and practice

Tue Jun 08, 2021 7:27 pm

swampdog wrote:
Tue Jun 08, 2021 6:19 pm
I am tired but for the life of me can't get a warning out of gcc for Fig2 (sumdiv).
Ditto for GCC 11.1. Looks like a compiler bug to me.
It knows perfectly well that "s" is uninitialized and elides references to it.
-Wuninitialized should complain.
Hmm ...

User avatar
Paeryn
Posts: 3305
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 1:12 am

jahboater wrote:
Tue Jun 08, 2021 7:27 pm
swampdog wrote:
Tue Jun 08, 2021 6:19 pm
I am tired but for the life of me can't get a warning out of gcc for Fig2 (sumdiv).
Ditto for GCC 11.1. Looks like a compiler bug to me.
It knows perfectly well that "s" is uninitialized and elides references to it.
-Wuninitialized should complain.
Hmm ...
It's a long standing issue that isn't likely to be fixed any time before the end of the universe. From what I remember it's down to how the optimisation passes interact, it can't report it early because there would be way too many false positives and by the time it's gone through various passes the dead code elimination pass removes the part that knew it was uninitialised.

You can get it to warn you when incrementing if you try printing s straight after incrementing it, though if you instead try printing s it just before returning it then you do get the warning but only on the attempt to print it and not on the increment.

I think it worked ok back in gcc-2 (and probably gcc-3) but that was when the optimisers weren't so complex.
She who travels light — forgot something.
Please note that my name doesn't start with the @ character so can people please stop writing it as if it does!

Heater
Posts: 18344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 8:26 am

Wow, a lot of responses all of a sudden. Let me try to catch up...
jamesh wrote:
Tue Jun 08, 2021 9:25 am
I know a number of MS programmer that I would not employ, yet have managed to avoid the MS cull for at least 15 years.
So I would not hold them up as paragons of programming virtue.
I have no doubt that is true. As is of many places.

I think we can make the reasonable assumption that the the talent pool at MS is representative of that of the rest of the programming community. Except they have the advantage that MS has buckets of money to use on trying to ensure software robustness. Which they admit has not been doing very well, 70% CVE's traced back to memory use errors. Most other outfits don't have the resources of MS.
jamesh wrote:
Tue Jun 08, 2021 9:26 am
Any compiler from the last 10 years at least should have -Wall or similar. Your experience prior to that is irrelevant. Technology has improved.
Yes, compiler warnings now a days are indeed much better.

However to this day "-Wall" does not enable many warnings one would like. You need a bunch of other W's for that. Even then many memory misuse errors will not give any warning. Some warning are not available with optimised release builds.

I think my experience of working with safety-critical and secure software all those years ago is still relevant thank you very much.

You are right, technology has improved. We now have safer languages comming along. Perhaps we should use them...
jahboater wrote:
Tue Jun 08, 2021 9:28 am
Programs written in such languages may still fail, but they are likely to fail gracefully.
Indeed. One will always be able to write buggy code. However using languages that don't have UB would be a great start in reducing failures.
jahboater wrote:
Tue Jun 08, 2021 9:28 am
Regardless of the programming language, I think its worth getting into the mindset of proving to yourself beyond all doubt that UB or other failure conditions cannot occur.
I agree. The programmers duty is to create code that works and is robust. One should take pride in that. Unfortunately what you suggest is impossible. At least in as far as tackling all the UB we have. Once a program gets to big it's too much to keep in mind. Worse still when one is part of huge team, when one does not know who will be tweaking the code next etc, etc,
jahboater wrote:
Tue Jun 08, 2021 9:28 am
If the code is too complex, then add explicit checks (thus enabling the proof). Sanitize unpredictable input.
Yes.

How about we build those checks into the very syntax and semantics of the language and have the compiler do it for us? That would avoid all the error prone busy work of adding those checks. Most programmers are into automating everything, why not that tedious bookkeeping? It would also give us compilation errors rather than program failure at run time as a bonus.
jahboater wrote:
Tue Jun 08, 2021 9:44 am
2) I think this is compiler technology dependent. A by-product of the heavy optimization that C compilers do is the advanced data-flow analysis which can provide more accurate diagnostics.
Compilers check all sorts of things. I see that GCC now does a strict analysis of the worst case buffer size needed for a sprintf, even with a complex format.
In general, as long as the warnings are switched on, I think C does a good job of static checking. The simple syntax may help.
Yes it is compiler technology dependent. And modern compilers are very much better than they were.

C does a terrible job of static checking, mostly because of it's simple syntax and semantics (Which I love by the way). To do an order of magnitude or two better a language with a more analysable syntax and semantics is required. At least that is what I conclude given that people have been working on tis problem for decades and not done very well so far.
jahboater wrote:
Tue Jun 08, 2021 9:44 am
3) I don't believe this is cost free:

n = array[ atoi(argv[1]) ];

this expression simply must have two or more explicit checks, involving actual code and run-time overhead.
(not that that's a bad thing).
Yes, code like that certainly needs checks in place. Being lazy I'd rather the compiler put those checks in place. Or at least strongly twisted my arm to ensure that I did not forget to.

In general though I think you are hinting at the run-time overhead of array bounds checking. This is simply not an issue with modern compiler technology. A lot of that bounds checking can be inferred at compile time.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 7501
Joined: Tue Mar 18, 2014 11:47 am

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 2:18 pm

Heater wrote:
Wed Jun 09, 2021 8:26 am
However to this day "-Wall" does not enable many warnings one would like. You need a bunch of other W's for that. Even then many memory misuse errors will not give any warning. Some warning are not available with optimised release builds.
What I find amazing from the article is that since undefined behaviour can't occur in a properly written program, it is okay for the optimiser to simply remove any code in which it detects undefined behaviour.

Since bugs can be everywhere, how does one know whether the optimiser has a bug in the undefined behaviour detector or that the program actually contains undefined behaviour? It's better that the bugs be within reach. Unfortunately bugs in the compiler are out of reach for the average programmer to squash.

The point is not that bugs everywhere is something new, but that one of the main defenses C has against them is simplicity. By design the language features of C map directly onto the hardware in a way that a programmer can predict what kind of instructions will be generated from their code. In my opinion, this simplicity is lost when the compiler becomes so complex and clever that it starts analyzing whether the code makes sense and deleting or changing it if it doesn't.

To avoid bugs the compiler shouldn't analyze the whitespace, the formatting strings in printf nor should it have special provisions to decide if code contains undefined behaviour. All this complexity and extra features make it more likely the compiler itself will have bugs.

Until this thread I thought the analysis performed by the optimiser could serve a dual purpose and also warn the programmer of possible bugs in the code. However, the dead code elimination discussed in

viewtopic.php?p=1875537#p1875537

has convinced me otherwise. It's much better if the program that performs the static analysis used to check for bugs is separate from the compiler. Moreover, when surprising things are done for the sake of optimisation, that itself is a bug.

swampdog
Posts: 718
Joined: Fri Dec 04, 2015 11:22 am

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 3:30 pm

Fwiw, clang spits it (sumdiv) out with -Wall so all is not lost..

Code: Select all

foo@pi18:/wrk $ PATH=/usr/local/QT/5.15.0/bin:$PATH clang -o c -Wall -std=c11 -O3 c.c
c.c:11:4: warning: variable 's' is uninitialized when used here [-Wuninitialized]
                        s += i;
                        ^
c.c:6:7: note: initialize the variable 's' to silence this warning
 int s;
      ^
       = 0
1 warning generated.
..that's clang 11.0.0 btw.

lurk101
Posts: 665
Joined: Mon Jan 27, 2020 2:35 pm
Location: Cumming, GA (US)

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 4:47 pm

ejolson wrote:
Wed Jun 09, 2021 2:18 pm
Moreover, when surprising things are done for the sake of optimisation, that itself is a bug.
On this I'd have to disagree. I've seen surprisingly effective things done for the sake of optimization. Full link time -O3 optimization, bring it on!

I go with the premise that the compiler is trustworthy. In practical terms that has worked well for GCC, MSVC, and Keil compilers. The only genuine compiler bug I've ever encountered was for the Protel language back in 1983. It was an in house compiler (yes, there were such things back then!) Nortel used to program its telephony switches. Other than that every issue I've ever detected was either hardware or self-inflicted.

Sure, compilers are imperfect, but they are hardly to blame for bad code.
How to make your arguments stronger? Longer is not the answer.

User avatar
jahboater
Posts: 7158
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 7:14 pm

ejolson wrote:
Wed Jun 09, 2021 2:18 pm
To avoid bugs the compiler shouldn't analyze the whitespace, the formatting strings in printf nor should it have special provisions to decide if code contains undefined behaviour.
I find the printf/sprintf checks helpful. It checks the argument types and number correctly match the format string (a common cause of problems).
For sprintf, it checks the worst case length of the output will fit in the buffer, which I think is really clever. It deals with all the different floating-point formats, .* etc.

The thing about undefined behavior is that literally anything can happen, the code might work, it might fail, it might do nothing, it might crash, daemons might fly out of your nose, etc. So what should the compiler do? I personally would like it to produce a warning message and fail the compilation. But presumably since they are not syntax errors as such, the compiler usually elides the incorrect code because that is compatible with the "anything can happen" expectation.
Last edited by jahboater on Wed Jun 09, 2021 7:18 pm, edited 1 time in total.

User avatar
HermannSW
Posts: 4240
Joined: Fri Jul 22, 2016 9:09 pm
Location: Eberbach, Germany
Contact: Website Twitter YouTube

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 7:15 pm

lurk101 wrote:
Wed Jun 09, 2021 4:47 pm
..., but they are hardly to blame for bad code.
Just seen this related tweet ;-)
https://twitter.com/CompSciFact/status/ ... 5886420992
'Code is like poetry; most of it shouldn't have been written.' --
@MetaThis
https://stamm-wilbrandt.de/2wheel_balancing_robot
https://stamm-wilbrandt.de/en#raspcatbot
https://github.com/Hermann-SW/Raspberry_v1_camera_global_external_shutter
https://github.com/Hermann-SW/raspiraw
https://stamm-wilbrandt.de/en/Raspberry_camera.html

User avatar
Paeryn
Posts: 3305
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 7:35 pm

I've been looking into GCC's optimisation passes w.r.t. how it optimises sumdiv() and fails to warn about s being used uninitialised.
The way it expands the returns is to create an internal temporary that is either set to s or 47 and returns that temporary (it simplifies things so there is only one exit point). The constant propagation pass sees that s always has an undefined value at the point it is used when setting that temporary and since that is UB the compiler is allowed to use any value it wants, so the temporary variable being used for the return value has either the value 47 or whatever value the compiler wants, it chooses to use 47 as the value of s at this point which means the temporary always gets set to 47.

The dead code elimination then sees that it doesn't need to bother computing s at all and optimises it out of existence along with the if (0 == a % i) test (since it only computes a value that is never used). It still keeps the for loop at -O1 even though it now reads

Code: Select all

for (int i = 1; i <= a; i++) {
}
At -O2 it goes all the way and turns the whole function into just

Code: Select all

int sumdiv(int a)
{
  return 47;
}
It's a shame that GCC doesn't flag the warning but the way the optimisers work makes it nigh-on impossible to fix. It's not the optimisers' job to look for questionable code but GCC can only check for this by doing flow analysis which only gets done by the optimisers...

I wonder if this check could be added to the new static analyser? Not as convenient as it can take a long time on anything other than a trivial program. Clang will warn you, it also doesn't optimise s away even at -O3, it just uses whatever number was already in the register (or in memory) that was allocated to s.

A separate linter would do too, cppcheck seems ok, it even has a gui if you want for the reports.
She who travels light — forgot something.
Please note that my name doesn't start with the @ character so can people please stop writing it as if it does!

Heater
Posts: 18344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 7:37 pm

ejolson wrote:
Wed Jun 09, 2021 2:18 pm
What I find amazing from the article is that since undefined behaviour can't occur in a properly written program, it is okay for the optimiser to simply remove any code in which it detects undefined behaviour.
This is indeed surprising. At least to those of us who have not thought about it much, like me. And most C/C++ programmers apparently.

We should not be surprised though. If we know our stuff we should know that the standard allows compilers to do that.

It's all about the "as if" rule. A commpiler can do what it likes as long as the results are as if it were compiling a correct program correctly. When the program is not correct, anyting goes. The optimizer guys make as good use of this rule as they can.
ejolson wrote:
Wed Jun 09, 2021 2:18 pm
Since bugs can be everywhere, how does one know whether the optimiser has a bug in the undefined behaviour detector or that the program actually contains undefined behaviour? It's better that the bugs be within reach. Unfortunately bugs in the compiler are out of reach for the average programmer to squash.
Yes , bugs are everywhere.

In this discussion I think it's better to concider only the language specification and assume compilers follow the rules perfectly. Otherwise we are in a world of chaos.
ejolson wrote:
Wed Jun 09, 2021 2:18 pm
The point is not that bugs everywhere is something new, but that one of the main defenses C has against them is simplicity.
Here I have to disagree. We all love C for it's simplicity. That may well make life easier for compiler writers but it is that very simplicity of the language that allows us to create so many overflow, out of bounds and memory use bugs.
ejolson wrote:
Wed Jun 09, 2021 2:18 pm
By design the language features of C map directly onto the hardware in a way that a programmer can predict what kind of instructions will be generated from their code.
I'm pretty sure that is how things were when C was first created. As far as I can tell it was the most basic level of abstraction over available hardware to raise it above assembler and allow portability of Unix to different architectures.

I don't think this has been true for a long time though. C is specified as if it were running on some primitive instruction set that does not exist. It has it's own "abstract machine". As a result optimized instructions generated from C code may not look like anything the programmer had in mind when he wrote it.
ejolson wrote:
Wed Jun 09, 2021 2:18 pm
In my opinion, this simplicity is lost when the compiler becomes so complex and clever that it starts analyzing whether the code makes sense and deleting or changing it if it doesn't.
I have that queezy feeling as well. But it has been a reality for a long time now.
ejolson wrote:
Wed Jun 09, 2021 2:18 pm
To avoid bugs the compiler shouldn't analyze the whitespace, the formatting strings in printf nor should it have special provisions to decide if code contains undefined behaviour. All this complexity and extra features make it more likely the compiler itself will have bugs.
Let's ignore hypthetical compiler bugs. Concider than language itself.

As far as I know C compilers do not care about white space indenting and such. The language standard does not specify that they should.

If compilers do happen to warn about indentation or other white space errors that is a bonus extra, a non-standard extension.
ejolson wrote:
Wed Jun 09, 2021 2:18 pm
Until this thread I thought the analysis performed by the optimiser could serve a dual purpose and also warn the programmer of possible bugs in the code.
Interestingly, some warnings that one migh turn on with obscure "-W" flags are actully not reported when one enables optimization like "-O3".

Go figure...
ejolson wrote:
Wed Jun 09, 2021 2:18 pm
Moreover, when surprising things are done for the sake of optimisation, that itself is a bug.
Not if those surprising thigs are allowd by the standard. I'm sure many of them are given that the standard is full of explanations of Undefined or Implementation Defined behavior.

If it's allowed in the standard compilers are allowed to do it. It's only surprising to the programmer who does not know the standard inside out. That is to say, most of us.
Memory in C++ is a leaky abstraction .

User avatar
jahboater
Posts: 7158
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 8:02 pm

Heater,
Such a well written post is hard to comment on.
Heater wrote:
Wed Jun 09, 2021 7:37 pm
As a result optimized instructions generated from C code may not look like anything the programmer had in mind when he wrote it.
This possible large difference between the abstract machine and the implementation is what gives C its speed.

Some people complain when printf("Hello, world\n"); is replaced by the compiler with puts("Hello, world");
But why? The result is identical, which is all that matters.

User avatar
jahboater
Posts: 7158
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 8:17 pm

Heater wrote:
Wed Jun 09, 2021 7:37 pm
Here I have to disagree. We all love C for it's simplicity. That may well make life easier for compiler writers but it is that very simplicity of the language that allows us to create so many overflow, out of bounds and memory use bugs.
Interesting point.

The language cannot mandate everything.
Long ago we had more than one possible integer format.
The language C was designed to be usable on any hardware, so the C could not just state "signed integer overflow results in xxx happening". Instead it is left as "undefined behavior".

What else could they do for a reasonably sized, fast, portable language?

Now that twos-complement is ubiquitous, it is easy for D and Rust and other modern languages to say that signed integer overflow "wraps" - they have removed one type of UB. These new languages don't have 50 years of historical burden!
I bet Fortran has similar issues.

C is also designed to run on tiny or very low powered computers.
Run time bounds checking or tracked memory allocation are too costly for that.

Heater
Posts: 18344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 8:30 pm

jahboater wrote:
Wed Jun 09, 2021 8:02 pm
Such a well written post is hard to comment on.
Wow. Thank you for that.
jahboater wrote:
Wed Jun 09, 2021 8:02 pm

This possible large difference between the abstract machine and the implementation is what gives C its speed.
Some people complain when printf("Hello, world\n"); is replaced by the compiler with puts("Hello, world");
But why? The result is identical, which is all that matters.
Exactly, it's all about that "as if" rule. The compiler is free to do what it likes, as long as it produces the same result as if it did it the way the programmer thought, and as long as there is no UB in the execution.

It's a curious thing but I have noticed that many old time C programmers talk of this "close to how the hardware works" and predicatilty of of the generated code. Even though none of that has been true for a couple of decades or more.

It's even worse than that, modern processors are busy rearranging our code to run instructions in parallel or out of order or whatever.

Nothing is what it seems.... unless you stick to the standard and avoid UB.
Memory in C++ is a leaky abstraction .

Heater
Posts: 18344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Schrödinger's Code - Undefined behavior in theory and practice

Wed Jun 09, 2021 9:02 pm

jahboater wrote:
Wed Jun 09, 2021 8:17 pm
Heater wrote:
Wed Jun 09, 2021 7:37 pm
Here I have to disagree. We all love C for it's simplicity. That may well make life easier for compiler writers but it is that very simplicity of the language that allows us to create so many overflow, out of bounds and memory use bugs.
Interesting point.

The language cannot mandate everything.
Long ago we had more than one possible integer format.
The language C was designed to be usable on any hardware, so the C could not just state "signed integer overflow results in xxx happening". Instead it is left as "undefined behavior".

What else could they do for a reasonably sized, fast, portable language?
I totally agree with all that. Mostly. C is a masterpiece. It was created to make Unix portable. Indeed it sprouted features, for example "stuct"s when it was realized they would help acheive that goal. It had to wok within the constraints of the time. Almost no memory, slow processors and so on. It had to abstract away the bare minimum of hardware common to many different architectures so that it could be ported. Oh, and a couple of guys had to be able to write an actual compiler for it in a reasonable time. Brilliant!
jahboater wrote:
Wed Jun 09, 2021 8:17 pm
Now that twos-complement is ubiquitous, it is easy for D and Rust and other modern languages to say that signed integer overflow "wraps" - they have removed one type of UB. These new languages don't have 50 years of historical burden!
They also have learned from 40 years of programming language and compiler construction reasearch.

It's not really about signed interger overflow and whatever hardware support such things get from the processor. It's about being able to specify a type, the operations on it, and the results those operations should produce, in the language.
Heater wrote:
Wed Jun 09, 2021 7:37 pm
C is also designed to run on tiny or very low powered computers.
Run time bounds checking or tracked memory allocation are too costly for that.
If we take Rust as an example of what can be done with modern language/compiler technology we find some interesting things:

1) There is no extra code added to track memory allocation and useage at run time. It's all checked at compile time.

2) Pretty much all arrays bounds checking can be avoided.

3) I do wonder why it ever happened that processor hardware does not crap out with an exception/trap/interrupt whatever when overflow occurs? Like they do for divide by zero.

Anyway when it comes to running in confined spaces check out the work of Cliff Biffle. He wrote a bit banged VGA graphics demo for some STM32 MCU in C++. Then he did it again in Rust.
http://cliffle.com/blog/m4vga-in-rust/

The, perhaps surprising, result being that:
The Rust implementation is simpler, shorter (in lines of code), faster, and smaller (in bytes of Flash) than my heavily-optimized C++ version — and because it’s almost entirely safe code, several types of bugs that I fought regularly, such as race conditions and dangling pointers, are now caught by the compiler.
Memory in C++ is a leaky abstraction .

ejolson
Posts: 7501
Joined: Tue Mar 18, 2014 11:47 am

Re: Schrödinger's Code - Undefined behavior in theory and practice

Thu Jun 10, 2021 12:37 am

Heater wrote:
Wed Jun 09, 2021 9:02 pm
3) I do wonder why it ever happened that processor hardware does not crap out with an exception/trap/interrupt whatever when overflow occurs? Like they do for divide by zero.
I think the reason integer overflow with wrap around does not trigger an exception is because that corresponds to the well defined algebraic operations on the ring of integers modulo 2^n with, for example, n=32. Note that the wrap around in such rings is useful in cryptography, random number generators and the creation hash functions among other things.

On the other hand, allowing divide by zero trends to lead not to any useful type of algebraic structure. That's why dividing by zero is often flagged as an exception.
Last edited by ejolson on Thu Jun 10, 2021 3:03 am, edited 3 times in total.

User avatar
Gavinmc42
Posts: 5817
Joined: Wed Aug 28, 2013 3:31 am

Re: Schrödinger's Code - Undefined behavior in theory and practice

Thu Jun 10, 2021 1:47 am

I avoid the C++ box, it is so big it is very hard to find that cat.

After avoiding C++ for 40 or so years I see no reason to use it except on Arduinos ;)
Rust and D seem a better choice if I had to pick

CircuitPython works on Pi's ;)
Those libraries make test app writing a few dozen to a hundred or so lines.
Anyone who writes library code is a better coder than me anyway.
Plus library code get used by more people and is probably debugged better.

What is the error per lines of code rate per language?

If it is so hard to predict "undefined" code why not have a quantum language that has "undefined" as part of the language.
Yes, No, Maybe?
I'm dancing on Rainbows.
Raspberries are not Apples or Oranges

Heater
Posts: 18344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Schrödinger's Code - Undefined behavior in theory and practice

Thu Jun 10, 2021 4:46 am

ejolson wrote:
Thu Jun 10, 2021 12:37 am
I think the reason integer overflow with wrap around does not trigger an exception is because that corresponds to the well defined algebraic operations on the ring of integers modulo 2^n with, for example, n=32. Note that the wrap around in such rings is useful in cryptography, random number generators and the creation hash functions among other things.
That is indeed very useful property of most machines integer arithmetic.

Ideally a programming language would have different types, a type for something that works like that ring of integers and a type for something that causes a trap/interrupt when asked to produce a normal arithmetic operation result that it cannot represent. These types would need hardware support to maintain performace, that is an instruction for addition module 2^n and a different instruction for regular addition that detects overflow.

It occurs to me though that another reason for not implementing this in hardware is that it leads down a rabbit hole. What if languages allowed definition of types of integer that are restricted to some limited range of values? And integer from 42 to 69 say? Then we could ask for the hardware to check upper and lower bounds. What about a type that should only hold even numbers? And so on. All of which starts to get a bit too much to cast into hardware. The only language I know that allows for definition of such types, and checks them, is Ada.
Memory in C++ is a leaky abstraction .

Heater
Posts: 18344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Schrödinger's Code - Undefined behavior in theory and practice

Thu Jun 10, 2021 5:04 am

Gavinmc42 wrote:
Thu Jun 10, 2021 1:47 am
I avoid the C++ box, it is so big it is very hard to find that cat.
Ha, I get it, very good.
Gavinmc42 wrote:
Thu Jun 10, 2021 1:47 am
After avoiding C++ for 40 or so years I see no reason to use it except on Arduinos ;)
It's worth using C++ just to see the look of anguish on C++ purists faces when they see you using it as just "C with classes". Using raw pointers everywhere. Using printf instead of "cout" (Which is terribly slow). And so on.

Oh, and better yet don't use "class". Make everything a "struct". Who needs private members anyway?
Gavinmc42 wrote:
Thu Jun 10, 2021 1:47 am
If it is so hard to predict "undefined" code why not have a quantum language that has "undefined" as part of the language.
No need. "Undefine behavior" occurs hundreds of times in the C and C++ standards documents already.
Memory in C++ is a leaky abstraction .

User avatar
jahboater
Posts: 7158
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: Schrödinger's Code - Undefined behavior in theory and practice

Thu Jun 10, 2021 6:07 am

Heater wrote:
Thu Jun 10, 2021 5:04 am
Gavinmc42 wrote:
Thu Jun 10, 2021 1:47 am
If it is so hard to predict "undefined" code why not have a quantum language that has "undefined" as part of the language.
No need. "Undefined behavior" occurs hundreds of times in the C and C++ standards documents already.
There is a definitive list in Annex J (section J.2, page 406) of the C18 ISO standard document.

Its quite a long list, too long to remember, but you will never meet 99.9% of them.

User avatar
jahboater
Posts: 7158
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: Schrödinger's Code - Undefined behavior in theory and practice

Thu Jun 10, 2021 6:18 am

ejolson wrote:
Thu Jun 10, 2021 12:37 am
I think the reason integer overflow with wrap around does not trigger an exception is because that corresponds to the well defined algebraic operations on the ring of integers modulo 2^n with, for example, n=32. Note that the wrap around in such rings is useful in cryptography, random number generators and the creation hash functions among other things.
Operations that do not have a valid integer result will produce an exception ( INT_MIN / -1, divide by zero and so on).
The hardware usually uses the floating point exception flags.

To ensure an exception is raised for the other add/subtract/multiply overflows, use the -ftrapv compiler flag.
Obviously this has a cost in performance.

Alternatively -fwrapv tells the compiler that overflow behavior is "defined" (twos-complement wrap) and to optimize accordingly.
I think this last option is the most useful because C++ and eventually C will adopt this behavior in the ISO standards soon.
(because there is no known hardware that is not two's-complement).

Heater
Posts: 18344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Schrödinger's Code - Undefined behavior in theory and practice

Thu Jun 10, 2021 6:58 pm

jahboater wrote:
Thu Jun 10, 2021 6:18 am
Operations that do not have a valid integer result will produce an exception ( INT_MIN / -1, divide by zero and so on).
I see it differently. If my code has:

Code: Select all

    x = y + z;
But that sum is too big for the size of integer then I consider that as not being able to produce a valid integer result. It may well wrap in the familiar way or be undefined or something else. But it is some number or other that is not the valid integer result of what I wrote.

So, if overflow can do whatever it likes, essentially producing a random number, without any exception then when not divide by zero?
jahboater wrote:
Thu Jun 10, 2021 6:18 am
The hardware usually uses the floating point exception flags.
Perhaps on a PC or whatever. Most of the software I have ever written runs on machines that don't have floating point support in hardware.
jahboater wrote:
Thu Jun 10, 2021 6:18 am
To ensure an exception is raised for the other add/subtract/multiply overflows, use the -ftrapv compiler flag.
Obviously this has a cost in performance.
Very good. But that is not part of the C language definition. Only a convenience of the GCC compiler and probably Clang and others.

It's also kind of useless if sometimes I want "normal" wrapping behaviour in my code but some times not. Types and their operations should be part of the language syntax/semantics, not changed by external, non-standard, non-portable, compiler options.
Memory in C++ is a leaky abstraction .

User avatar
jahboater
Posts: 7158
Joined: Wed Feb 04, 2015 6:38 pm
Location: Wonderful West Dorset

Re: Schrödinger's Code - Undefined behavior in theory and practice

Thu Jun 10, 2021 7:16 pm

Heater wrote:
Thu Jun 10, 2021 6:58 pm

Code: Select all

    x = y + z;
But that sum is too big for the size of integer then I consider that as not being able to produce a valid integer result. It may well wrap in the familiar way or be undefined or something else. But it is some number or other that is not the valid integer result of what I wrote.
These are two different things.

Code: Select all

    x = y / 0;
There is no integer representation for infinity.

Code: Select all

    x = y + z;
As far as the hardware is concerned there is a perfectly valid representation for this result on all known computers.
The C++20 standard and hopefully C20 will recognize that and it will no longer be UB.
It was only ever considered UB because very old computers didn't necessarily use twos-complement, so the language could not state with certainty what the result must be, but now it can.

That result (wrap) may not be what you want, in which case add some obvious checks or use 64-bits. On the other hand as ejolson pointed out, there are a number of important cases where it is wanted, such as cryptography and random numbers. The LCG uses it.

Decent hardware might raise an overflow flag, but don't rely on it - ARM only does in limited cases, Intel always does, RISCV does not :)

Heater
Posts: 18344
Joined: Tue Jul 17, 2012 3:02 pm

Re: Schrödinger's Code - Undefined behavior in theory and practice

Thu Jun 10, 2021 7:52 pm

jahboater wrote:
Thu Jun 10, 2021 7:16 pm
There is no integer representation for infinity.
Very true. Unless our machines had some means of representing integer infinity as opposed to any other value. Like we do with floats.

I think we are talking at cross-purposes here.

On the one hand there is the notion on an "integer" in mathematics. Which as far as I understand start at 0 and go though 1, 2, 4, 4 up to some frikken huge number we can't get to, generally called infinity.

On the other hand we have the C language and others that have number type called "int".

Thing is the "int" type does not represent "integers". Despite it's name. Only a teeny-tiny subset of them.

Not only that the "int" type in C does not behave like an integer. As we see when an addition overflows and produces an incorrect integer result.

Yes of course we can define the result of such an addition as wrapping or "undefined" and hence producing a correct "int" result. But then I would say the type should not be called "int" which suggests integer.

It gets worse. What number range does that "int" handle? 16, 32, 64 bits? Other? From reading the code one cannot tell.

Hopefully nobody uses "int" anymore. Rather specify what they mean with int32_t or whatever.
Memory in C++ is a leaky abstraction .

User avatar
Paeryn
Posts: 3305
Joined: Wed Nov 23, 2011 1:10 am
Location: Sheffield, England

Re: Schrödinger's Code - Undefined behavior in theory and practice

Thu Jun 10, 2021 8:03 pm

jahboater wrote:
Thu Jun 10, 2021 7:16 pm

Code: Select all

    x = y / 0;
There is no integer representation for infinity.
It doesn't matter that there is no integer representation of infinity since y / 0 doesn't equal infinity. Assuming 5 / 0 did equal inf. then reversing it you'd add up an infinite number of zeros but you'll find that you're still 5 short. Dividing by zero is mathematically undefined, you can't do it, C gets this right in saying it's UB!
She who travels light — forgot something.
Please note that my name doesn't start with the @ character so can people please stop writing it as if it does!

Return to “C/C++”