Everyone wants faster programs, right? So doing more optimisations with GCC at -O3 instead of -O2 should help. Instead, the program gets quite a bit larger, but... SLOWER. Makes sense, right? Why does -O3 even exist if it generates larger AND slower binaries than -O2?

Ah IC, it's because you use that level only on individual hot functions, not on the full program. How do I do that? Function attribute for optimisation. Cool. Uhm, what is the exact syntax? The fucking GCC documentation doesn't say that. When will devs finally learn to give bloody EXAMPLES?!

Googling around. Ah, with quotes, but without the leading hyphen it seems. Copy/paste. Compile again, tadaa: it's only a little bit but still FUCKING SLOWER than -O2!

GCC's -O3 is like that stupid kid at McD that ate like a damn horse, had to vomit afterwards and was even more hungry than before!

  • 3
    If that's happening, I believe it's because O3 just enables more aggressive optimizations; it doesn't guarantee better performance.

    Well, TIL.
  • 1
    @Fast-Nop do you have a code example which shows this? Are you running the latest GCC? Are you compiling for x86 or ARM? What's the code size (could be missing in caches which would cause the slowdowns) vs the target's memory/icache size? In the O3 assembly does it use fancier instructions that can sometimes be worse because hardware optimizes common cases?
  • 2
    Not only does your post show a severe lack of comprehension of what the O... flags do...

    But even more: You don't seem to understand what compiler optimization means
  • 0
    @IntrusionCM could you explain? It seems to be a fairly straightforward case of "using a different optimization level provides different results".
  • 3
    @RememberMe Can't provide a code example, but the observation that O3 produces both larger and slower binaries seems to be pretty common.

    @IntrusionCM Listen kid, I fucking OBVIOUSLY know what the different O levels do, because the fucking GCC optimisation documentation lists in great fucking detail which optimisations are enabled by default at each O level.

    That doesn't change the fact that "optimisations" that produce both larger AND slower binaries are simply broken and fucking pointless pieces of shit.
  • 3
    @Fast-Nop optimizations are guided by general heuristics; not all of them are guaranteed to help all the time.

    You could try to isolate which optimization pass is making things worse by running gcc at O2 with O3's flags enabled one by one and profiling via a script.

    Though passes could have dependencies so that might not give you much data.

    I've seen this once before where the compiler generated optimized code that was worse for the target's branch predictor but okay in general. So yeah.
  • 2
    @RememberMe And yeah, of course larger binaries are more in danger of fucking up the cache, but only enabling O3 on a few selected hotspots (which I know from profiling) didn't increase the binary size noticeably. It only decreased the performance because it doesn't only produce larger, but actually worse machine code.

    Probably because most folks use O2 for production code, and that's because O3 used to be actually broken, and now that nobody uses O3, the GCC folks don't care either, and I guess it might give faster results on fossilised CPUs found in some ancient Egyptian tombs or so.

    Also, I decided to go the opposite way instead: start from O2, since that was the better baseline, and check which additional options yielded more performance. That makes more sense than starting from a broken baseline.
  • 1
    @RememberMe Yes.


    All O flags are just a combination of several flags.

    "Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program. "

    Simply put: read what the flags do, check your source code, and don't blindly apply O3. And the GCC documentation is imho very precise and clear.

    Most of the time, when you reaaaallly want to try to optimize, you should first take a look at march / mcpu, then enable O2, then look at your source code.

    There is no magic bullet here.

    LTO is worth investigating, too - simply put (and not fully correct): it tries to improve the heuristic guesswork by analyzing the whole program at link time and then optimizing across translation units.
  • 0
    @Fast-Nop that *should* not be the case, because O3 enables some optimizations that can really help (e.g. loop interchange, SLP vectorization).

    It's not for everything and should always be profiled though yeah.
  • 2
    @Fast-Nop sorry, but your rant reads a bit like: didn't know what O3 does, didn't know anything, GCC sucks.
  • 0
    @IntrusionCM At points where I really care, I go as far as to have GCC generate the assembly listing so that I can check what's actually going on.

    And yeah, LTO gives a nice boost, although I would never recommend that for embedded systems.
  • 1
    Have you used -march and/or -mtune?
    -O3 triggers more aggressive optimisation, but if the compiler doesn't know what the code will run on, it can't do its job properly.
    So maybe your code with -O3 is actually faster on a Pentium III
  • 0
    @MagicSowap O3 isn't really used in production because it doesn't work, and because it isn't used, it doesn't get improved. The underlying issue is that it makes the code larger and introduces more branches to save some instruction executions, but that fucks up CPU branch prediction. The whole notion of "more aggressive optimisation" in O3 has become nonsense because it isn't in line with how today's CPUs work.