Everyone wants faster programs, right? So doing more optimisations with GCC at -O3 instead of -O2 should help. Instead, the program gets quite a bit larger, but... SLOWER. Makes sense, right? Why does -O3 even exist if it generates larger AND slower binaries than -O2?

Ah IC, it's because you use that level only on individual hot functions, not on the full program. How do I do that? Function attribute for optimisation. Cool. Uhm, what is the exact syntax? The fucking GCC documentation doesn't say that. When will devs finally learn to give bloody EXAMPLES?!

Googling around. Ah, with quotes, but without the leading hyphen it seems. Copy/paste. Compile again, tadaa: it's only a little bit but still FUCKING SLOWER than -O2!

GCC's -O3 is like that stupid kid at McD that ate like a damn horse, had to vomit afterwards and was even more hungry than before!

  • 3
    If that's happening, I believe it's because O3 just enables more aggressive optimizations; it doesn't guarantee better performance.

    Well, TIL.
  • 1
    @Fast-Nop do you have a code example which shows this? Are you running the latest GCC? Are you compiling for x86 or ARM? What's the code size (could be missing in caches which would cause the slowdowns) vs the target's memory/icache size? In the O3 assembly does it use fancier instructions that can sometimes be worse because hardware optimizes common cases?
  • 2
    Not only does your post show a severe lack of comprehension of what the O... flags do...

    But even more: You don't seem to understand what compiler optimization means
  • 0
    @IntrusionCM could you explain? It seems to be a fairly straightforward case of "using a different optimization level provides different results".
  • 3
    @RememberMe Can't provide a code example, but the observation that O3 produces both larger and slower binaries seems to be pretty common.

    @IntrusionCM Listen kid, I fucking OBVIOUSLY know what the different O levels do, because the fucking GCC optimisation documentation lists in great fucking detail which optimisations are enabled by default at each O level.

    That doesn't change the fact that "optimisations" that produce both larger AND slower binaries are simply broken and fucking pointless pieces of shit.
  • 3
    @Fast-Nop optimizations are guided by general heuristics; not all of them are guaranteed to help all the time.

    You could try to isolate which optimization pass is making things worse by running gcc at O2 with O3's flags enabled one by one and profiling via a script.

    Though passes could have dependencies so that might not give you much data.

    I've seen this once before where the compiler generated optimized code that was worse for the target's branch predictor but okay in general. So yeah.
  • 2
    @RememberMe And yeah, of course larger binaries are more in danger of fucking up the cache, but only enabling O3 on a few selected hotspots (which I know from profiling) didn't increase the binary size noticeably. It only decreased the performance because it doesn't only produce larger, but actually worse machine code.

    Probably because most folks use O2 for production code, and that's because O3 used to be actually broken, and now that nobody uses O3, the GCC folks don't care either, and I guess it might give faster results on fossilised CPUs found in some ancient Egyptian tombs or so.

    Also, I decided to go the opposite way instead: start from O2, since that was the better baseline, and check which additional options yielded more performance. That makes more sense than starting from a broken baseline.
  • 1
    @RememberMe Yes.


    All O flags are just a combination of several flags.

    "Turning on optimization flags makes the compiler attempt to improve the performance and/or code size at the expense of compilation time and possibly the ability to debug the program. "

    Simply put: read what the flags do, check your source code, and don't blindly apply O3. And the GCC documentation is imho very precise and clear.

    Most of the time, when you reaaaallly want to try to optimize, you should first take a look at march / mcpu, then enable O2, then look at your source code.

    There is no magic bullet here.

    LTO is worth investigating, too - simply put (and not fully correct): it tries to improve the heuristic guesswork by analyzing the whole program at link time and then optimizing across translation units.
  • 0
    @Fast-Nop that *should* not be the case, because O3 enables some optimizations that can really help (e.g. loop interchange, SLP vectorization).

    It's not for everything and should always be profiled though yeah.
  • 2
    @Fast-Nop sorry, but your rant reads a bit like: didn't know what O3 does, didn't know anything, GCC sucks.
  • 0
    @IntrusionCM At points where I really care, I go as far as to have GCC generate the assembly listing so that I can check what's actually going on.

    And yeah, LTO gives a nice boost, although I would never recommend that for embedded systems.
  • 1
    Have you used -march and/or -mtune?
    -O3 triggers more aggressive optimisation, but if the compiler doesn't know what the code will run on, it can't do its job properly.
    So maybe your code with -O3 is actually faster on a Pentium III
  • 0
    @MagicSowap O3 isn't really used in production because it doesn't work, and because it isn't used, it doesn't get improved. The underlying issue is that it makes the code larger and introduces more branches to save some instruction executions, but that fucks up CPU branch prediction. The whole notion of "more aggressive optimisation" in O3 has become nonsense because it isn't in line with how today's CPUs work.