Home » C++ » Is it legal for a C++ optimizer to reorder calls to clock()?

Is it legal for a C++ optimizer to reorder calls to clock()?

Posted by: admin November 29, 2017 Leave a comment

Questions:

The C++ Programming Language 4th edition, page 225 reads: A compiler may reorder code to improve performance as long as the result is identical to that of the simple order of execution. Some compilers, e.g. Visual C++ in release mode, will reorder this code:

#include <time.h>
...
auto t0 = clock();
auto r  = veryLongComputation();
auto t1 = clock();

std::cout << r << "  time: " << t1-t0 << endl;

into this form:

auto t0 = clock();
auto t1 = clock();
auto r  = veryLongComputation();

std::cout << r << "  time: " << t1-t0 << endl;

which guarantees different result than original code (zero vs. greater than zero time reported). See my other question for detailed example. Is this behavior compliant with the C++ standard?

Answers:

The compiler cannot exchange the two clock calls. t1 must be set after t0. Both calls are observable side effects. The compiler may reorder anything between those observable effects, and even over an observable side effect, as long as the observations are consistent with possible observations of an abstract machine.

Since the C++ abstract machine is not formally restricted to finite speeds, it could execute veryLongComputation() in zero time. Execution time itself is not defined as an observable effect. Real implementations may match that.

Mind you, a lot of this answer depends on the C++ standard not imposing restrictions on compilers.

Questions:
Answers:

Well, there is something called Subclause 5.1.2.3 of the C Standard [ISO/IEC 9899:2011] which states:

In the abstract machine, all expressions are evaluated as specified by
the semantics. An actual implementation need not evaluate part of an
expression if it can deduce that its value is not used and that no
needed side effects are produced (including any caused by calling a
function or accessing a volatile object).

Therefore I really suspect that this behaviour – the one you described – is compliant with the standard.

Furthermore – the reorganization indeed has an impact on the computation result, but if you look at it from compiler perspective – it lives in the int main() world and when doing time measurements – it peeps out, asks the kernel to give it the current time, and goes back into the main world where the actual time of the outside world doesn’t really matter. The clock() itself won’t affect the program and variables and program behaviour won’t affect that clock() function.

The clocks values are used to calculate difference between them – that is what you asked for. If there is something going on, between the two measuring, is not relevant from compilers perspective since what you asked for was clock difference and the code between the measuring won’t affect the measuring as a process.

This however doesn’t change the fact that the described behaviour is very unpleasant.

Even though inaccurate measurements are unpleasant, it could get much more worse and even dangerous.

Consider the following code taken from this site:

void GetData(char *MFAddr) {
    char pwd[64];
    if (GetPasswordFromUser(pwd, sizeof(pwd))) {
        if (ConnectToMainframe(MFAddr, pwd)) {
              // Interaction with mainframe
        }
    }
    memset(pwd, 0, sizeof(pwd));
}

When compiled normally, everything is OK, but if optimizations are applied, the memset call will be optimized out which may result in a serious security flaw. Why does it get optimized out? It is very simple; the compiler again thinks in its main() world and considers the memset to be a dead store since the variable pwd is not used afterwards and won’t affect the program itself.

Questions:
Answers:

Yes, it is legal – if the compiler can see the entirety of the code that occurs between the clock() calls.

Questions:
Answers:

If veryLongComputation() internally performs any opaque function call, then no, because the compiler cannot guarantee that its side effects would be interchangeable with those of clock().

Otherwise, yes, it is interchangeable.
This is the price you pay for using a language in which time isn’t a first-class entity.

Note that memory allocation (such as new) can fall in this category, as allocation function can be defined in a different translation unit and not compiled until the current translation unit is already compiled. So, if you merely allocate memory, the compiler is forced to treat the allocation and deallocation as worst-case barriers for everything — clock(), memory barriers, and everything else — unless it already has the code for the memory allocator and can prove that this is not necessary. In practice I don’t think any compiler actually looks at the allocator code to try to prove this, so these types of function calls serve as barriers in practice.

Questions:
Answers:

At least by my reading, no, this is not allowed. The requirement from the standard is (§1.9/14):

Every value computation and side effect associated with a full-expression is sequenced before every value computation and side effect associated with the next full-expression to be evaluated.

The degree to which the compiler is free to reorder beyond that is defined by the “as-if” rule (§1.9/1):

This International Standard places no requirement on the structure of conforming implementations.
In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming
implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.

That leaves the question of whether the behavior in question (the output written by cout) is officially observable behavior. The short answer is that yes, it is (§1.9/8):

The least requirements on a conforming implementation are:
[…] — At program termination, all data written into files shall be identical to one of the possible results that execution of the program according to the abstract semantics would have produced.

At least as I read it, that means the calls to clock could be rearranged compared to the execution of your long computation if and only if it still produced identical output to executing the calls in order.

If, however, you wanted to take extra steps to ensure correct behavior, you could take advantage of one other provision (also §1.9/8):

— Access to volatile objects are evaluated strictly according to the rules of the abstract machine.

To take advantage of this, you’d modify your code slightly to become something like:

auto volatile t0 = clock();
auto volatile r  = veryLongComputation();
auto volatile t1 = clock();

Now, instead of having to base the conclusion on three separate sections of the standard, and still having only a fairly certain answer, we can look at exactly one sentence, and have an absolutely certain answer–with this code, re-ordering uses of clock vs., the long computation is clearly prohibited.

Questions:
Answers:

Let’s suppose that the sequence is in a loop, and the veryLongComputation () randomly throws an exception. Then how many t0s and t1s will be calculated? Does it pre-calculate the random variables and reorder based on the precalculation – sometimes reordering and sometimes not?

Is the compiler smart enough to know that just a memory read is a read from shared memory. The read is a measure of how far the control rods have moved in a nuclear reactor. The clock calls are used to control the speed at which they are moved.

Or maybe the timing is controlling the grinding of a Hubble telescope mirror. LOL

Moving clock calls around seems too dangerous to leave to the decisions of compiler writers. So if it is legal, perhaps the standard is flawed.

IMO.

Questions:
Answers:

It is certainly not allowed, since it changes, as you have noted, the observeable behavior (different output) of the program (I won’t go into the hypothetical case that veryLongComputation() might not consume any measurable time — given the function’s name, is presumably not the case. But even if that was the case, it wouldn’t really matter). You wouldn’t expect that it is allowable to reorder fopen and fwrite, would you.

Both t0 and t1 are used in outputting t1-t0. Therefore, the initializer expressions for both t0 and t1 must be executed, and doing so must follow all standard rules. The result of the function is used, so it is not possible to optimize out the function call, though it doesn’t directly depend on t1 or vice versa, so one might naively be inclined to think that it’s legal to move it around, why not. Maybe after the initialization of t1, which doesn’t depend on the calculation?
Indirectly, however, the result of t1 does of course depend on side effects by veryLongComputation() (notably the computation taking time, if nothing else), which is exactly one of the reasons that there exist such a thing as “sequence point”.

There are three “end of expression” sequence points (plus three “end of function” and “end of initializer” SPs), and at every sequence point it is guaranteed that all side effects of previous evaluations will have been performed, and no side effects from subsequent evaluations have yet been performed.
There is no way you can keep this promise if you move around the three statements, since the possible side effects of all functions called are not known. The compiler is only allowed to optimize if it can guarantee that it will keep the promise up. It can’t, since the library functions are opaque, their code isn’t available (nor is the code within veryLongComputation, necessarily known in that translation unit).

Compilers do however sometimes have “special knowledge” about library functions, such as some functions will not return or may return twice (think exit or setjmp).
However, since every non-empty, non-trivial function (and veryLongComputation is quite non-trivial from its name) will consume time, a compiler having “special knowledge” about the otherwise opaque clock library function would in fact have to be explicitly disallowed from reordering calls around this one, knowing that doing so not only may, but will affect the results.

Now the interesting question is why does the compiler do this anyway? I can think of two possibilities. Maybe your code triggers a “looks like benchmark” heuristic and the compiler is trying to cheat, who knows. It wouldn’t be the first time (think SPEC2000/179.art, or SunSpider for two historic examples). The other possibility would be that somewhere inside veryLongComputation(), you inadvertedly invoke undefined behavior. In that case, the compiler’s behavior would even be legal.