If that's happening, I believe it's because O3 just enables more aggressive optimizations; it doesn't guarantee better performance.
@Fast-Nop do you have a code example which shows this? Are you running the latest GCC? Are you compiling for x86 or ARM? What's the code size vs the target's memory/icache size (larger code could be missing in caches, which would cause the slowdowns)? In the O3 assembly, does it use fancier instructions that can sometimes be worse because the hardware optimizes for the common cases?
Not only does your post show a severe lack of comprehension of what the O... flags do...
But even more: You don't seem to understand what compiler optimization means
@RememberMe Can't provide a code example, but the observation that O3 produces both larger and slower binaries seems to be pretty common.
@IntrusionCM Listen kid, I fucking OBVIOUSLY know what the different O levels do, because the fucking GCC optimisation documentation lists in high detail which fucking optimisations are enabled by default at which fucking O level.
That doesn't change the fact that "optimisations" that produce both larger AND slower binaries are simply broken and fucking pointless pieces of shit.
@Fast-Nop optimizations are guided by general heuristics, not all of them are guaranteed to work all the time.
You could try to isolate which optimization pass is making things worse by running gcc at O2 with O3's flags enabled one by one and profiling via a script.
Though passes could have dependencies so that might not give you much data.
I've seen this once before where the compiler generated optimized code that was worse for the target's branch predictor but okay in general. So yeah.
@RememberMe And yeah, of course larger binaries are more in danger of fucking up the cache, but only enabling O3 on a few selected hotspots (which I know from profiling) didn't increase the binary size noticeably. It only decreased the performance, because O3 doesn't just produce larger, but actually worse machine code.
Probably because most folks use O2 for production code, and that's because O3 used to be actually broken, and now that nobody uses O3, the GCC folks don't care either, and I guess it might give faster results on fossilised CPUs found in some ancient Egyptian tombs or so.
Also, I decided to go the opposite way instead: start from O2, because that was the better baseline, and check which additional options yielded more performance. Makes more sense than starting from a broken baseline.
All O flags are just a combination of several flags.
"Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program. "
Simply put: read what the flags do, check your source code, and don't blindly apply O3. And the GCC documentation is imho very precise and clear.
Most of the time, when you reaaaallly want to try to optimize, you should first take a look at march / mcpu, then enable O2, then look at your source code.
There is no magic bullet here.
LTO is worth investigating, too - put simply (and not fully correctly), it defers part of the optimization to link time, where the compiler can see across translation units and improve its heuristic guesswork with a whole-program view.
MagicSowap: Have you used -march and/or -mtune?
-O3 triggers more aggressive optimisation, but if the compiler doesn't know on what the code will run it can't do its job properly.
So maybe your code with -O3 is actually faster on a Pentium III
Fast-Nop: @MagicSowap O3 isn't really used in production because it doesn't work, and because it isn't used, it doesn't get improved. The underlying issue is that it makes the code larger and introduces more branches to save some instruction executions, but that fucks up CPU branch prediction. The whole notion of "more aggressive optimisation" in O3 has become nonsense because it isn't in line with how today's CPUs work.