What is the difference between different optimization levels in GCC? Assuming I don’t care to have any debug hooks, why wouldn’t I just use the highest level of optimization available to me? does a higher level of optimization necessarily (i.e. provably) generate a faster program?
Yes, a higher level can sometimes mean a better-performing program. However, it can cause problems depending on your code. For example, optimizations enabled at -O1 and up (such as instruction reordering) can break poorly written multithreaded programs by exposing race conditions. The optimizer will decide to do something it considers better than what you literally wrote, which in some cases might not work.
And sometimes, the higher optimizations (-O3) add no reasonable benefit but a lot of extra size. Your own testing can determine if this size tradeoff makes a reasonable performance gain for your system.
As a final note, the GNU project compiles all of their programs at -O2 by default, and -O2 is fairly common elsewhere.
Generally, optimization levels higher than -O2 (just -O3 for gcc, but other compilers have higher ones) include optimizations that can increase the size of your code. This includes things like loop unrolling, lots of inlining, padding for alignment regardless of size, etc. Other compilers offer vectorization and inter-procedural optimization at levels higher than -O3, as well as certain optimizations that can improve speed a lot at the cost of correctness (e.g., using faster, less accurate math routines). Check the docs before you use these things.
As for performance, it’s a tradeoff. In general, compiler designers try to tune these things so that they don’t decrease the performance of your code, so -O3 will usually help (at least in my experience), but your mileage may vary. It’s not always the case that really aggressive size-altering optimizations will improve performance (e.g., really aggressive inlining can get you cache pollution).
I found a web page containing some information about the different optimization levels. One thing I remember hearing somewhere is that optimization might actually break your program, and that can be an issue. But I’m not sure how much of an issue that is any longer. Perhaps today’s compilers are smart enough to avoid those problems.
It’s quite hard to predict exactly which flags are turned on by the global -O directives on the gcc command line for different versions and platforms, and the documentation on the GCC site is likely to become outdated quickly, or doesn’t cover the compiler internals in sufficient detail.
Here is an easy way to check exactly what happens on your particular setup when you use one of the -O flags, other -f flags, and/or combinations thereof:
- Create an empty source file somewhere: touch dummy.c
- Run it through the compiler pass just as you normally would, with all the -m flags you would normally use, but adding -Q -v to the command line:
gcc -c -Q -v dummy.c
- Inspect the generated output, perhaps saving it to compare against a different run.
- Change the command line to your liking, remove the generated object file via rm -f dummy.o, and re-run.
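Put together, the steps above look something like this (the -O1/-O2 pair is just an example; any flag combinations work):

```shell
touch dummy.c                       # an empty source file
gcc -c -O1 -Q -v dummy.c 2> o1.txt  # -Q -v reports options on stderr
rm -f dummy.o
gcc -c -O2 -Q -v dummy.c 2> o2.txt
rm -f dummy.o
diff o1.txt o2.txt || true          # see exactly which options changed
```

On reasonably recent gcc versions, gcc -Q --help=optimizers -O2 prints much the same information more directly, without needing a dummy file.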
Also, always keep in mind that, from a purist point of view, most non-trivial optimizations generate “broken” code (where broken is defined as deviating from the optimal path in corner cases), so choosing whether or not to enable a certain set of optimization mechanisms sometimes boils down to choosing the level of correctness for the compiler output. There have always been (and currently are) bugs in any compiler’s optimizer – just check the GCC mailing list and Bugzilla for some samples. Compiler optimization should only be used after actually performing measurements, since
- gains from using a better algorithm will dwarf any gains from compiler optimization,
- there is no point in optimizing code that will run every once in a blue moon,
- if the optimizer introduces bugs, it’s immaterial how fast your code runs.