16
pk76
4y

I spent four days doing a rewrite for a possible performance boost that yielded nothing.

I spent an hour this morning implementing something that boosted parsing of massive files by 22% and eliminated memory allocations during parsing.

Work effort does not translate into gains.

Comments
  • 3
    Come on. Use a profiler ffs!
  • 5
    @netikras I do. You still have to make informed decisions based on the data you gather and on what could possibly improve. What I tried last time was my only "failed attempt" out of 14 optimizations so far.
  • 6
    Reminds me of a famous story from the old days where they used a profiler and even added new CPU instructions for the most heavily run part of the code. Afterwards: no speedup.

    They had optimised the idle loop of the OS.
  • 2
    @Fast-Nop yup. I finally hit the performance ceiling for one of the larger chunks of my code's execution time, so I'm moving on to another chunk.
  • 0
    @pk76 Don't just write new implementations to see whether that spot can squeeze out some more performance :) That's a waste of time.

    Yes, you have to make informed decisions. You have to know what tools you are using and what alternatives you have. You have to see where you might have taken a wrong approach and where the logic can be altered to perform faster.

    profiler, thread monitor, fastthread.io: these are the ABC tools we use as perf engineers :)

    btw, which language?
  • 1
    @netikras you just don't get it no matter how many times it's said, do you? I do those very things you're saying. There's one case out of fourteen where, after gathering insight and making an informed decision, I was wrong. The other thirteen out of fourteen changes made gains.

    I don't know how I can make this any clearer.
  • 0
    @pk76 I do :)

    Okay, I'll just back off reaaalllyyy sloowww, no sudden movements, okay?
    :)
  • 1
    @netikras then if you do understand, your constantly bringing this up is harassing me. You've done this on two separate posts now.
  • 0
    @pk76 yeah, because I got the notifications flowing in from all the threads. Sorry if I offended you somehow. Or made you feel harassed. Or attacked.

    kiss and make up?
  • 1
    If you haven't seen this talk already, you really should (even if you don't work with c++): https://youtu.be/r-TLSBdHe1A
    Watch the whole thing, then get ready to pick up your jaw from the floor.
  • 1
    @endor I actually have, but I'm going to rewatch it anyway because it's excellent. I'm not much of a C++ programmer, but I watch tons of C++ conference presentations because, when it comes to performance, these guys cover it more, and better, than anyone else.
  • 1
    @pk76 I don't know what language you're working with, but perhaps there's a way for you to try something similar to what coz does, since ultimately all code ends up running on the same CPU.
    Making a single measurement of performance in a single environment (with all the variable factors that come with it, including execution path and environment variables) is not enough to establish how much you *actually* sped up the execution. You need more measurements before you can say that your fix made the code run faster, slower, or the same.
    Maybe there's a way to scramble things around in other languages too?
    (Though I realize that building one such tool from scratch is not a realistic expectation for one single guy who's already working on something else; but hey, food for thought!)
  • 1
    @endor oh, I'm not doing single-machine measurements. I've got five here, including two on different architectures. And when I make a claim about an "X increase", I mean the average across those. If there's a massive variance, I'm hesitant to even call it an improvement. Two of these machines are "in regular use" and the other three are clean machines that only ever run benchmarks/tests and get reset after each run. It's not ideal, but it's hardly one single machine.

    This project is in C#.
  • 1
    @endor oh, I should also add: the benchmarks are run on .NET Framework, .NET Core, and .NET Core AoT. I plan on adding Mono as well, but that requires much more setup. They're run on three Windows machines and two Linux machines (no .NET Framework results on Linux, obviously).

    It's not as thorough as Stabilizer, which I would love to be using. But it certainly adds some randomization, so a result is less likely to be just a quirk of one particular setup.
  • 1
    @pk76 props to you for being that thorough!
  • 1
    @endor Thanks man. I do what I can.
  • 0
    @endor so after looking into this, it seems like every .NET runtime reorganizes memory during GC phases, which should (in an ideal world) mean memory layout issues aren't going to happen. AoT would still be subject to them, but I'm not noticing a huge variance there (in fact, the kurtosis is nearly identical to the .NET Core results).

    However, other things coz/Stabilizer check for are still potential pitfalls. I'll look into ways to check for branch mispredictions and other things.

    As a somewhat unrelated note, I did find that JITs in general are really bad at unrolling loops in a way that cooperates with the superscalar pipelines in basically every CPU now. So I can potentially get a boost in a major hot path by manually unrolling to match those pipeline widths.
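The multi-machine averaging, the reluctance to call a high-variance result an improvement, and the kurtosis comparison discussed in this thread can be sketched roughly as follows. This is a minimal Python sketch, not pk76's actual C# harness; the function names (`summarize`, `looks_like_improvement`), the 2-sigma threshold, and the sample timings are hypothetical choices for illustration only.

```python
from statistics import mean, stdev


def excess_kurtosis(samples):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution).

    Assumes the samples are not all identical (m2 must be non-zero).
    """
    mu = mean(samples)
    m2 = sum((x - mu) ** 2 for x in samples) / len(samples)
    m4 = sum((x - mu) ** 4 for x in samples) / len(samples)
    return m4 / (m2 ** 2) - 3.0


def summarize(samples):
    """Mean, sample standard deviation, and excess kurtosis of a list of timings.

    Needs at least two samples, because statistics.stdev does.
    """
    return {
        "mean": mean(samples),
        "stdev": stdev(samples),
        "excess_kurtosis": excess_kurtosis(samples),
    }


def looks_like_improvement(baseline, candidate, sigmas=2.0):
    """Hypothetical rule of thumb: accept a speedup only if the drop in mean
    time exceeds `sigmas` times the larger of the two standard deviations."""
    b, c = summarize(baseline), summarize(candidate)
    noise = max(b["stdev"], c["stdev"])
    return (b["mean"] - c["mean"]) > sigmas * noise


# Hypothetical per-machine timings in milliseconds (not real measurements).
baseline = [120.0, 118.0, 123.0, 121.0, 119.0]
candidate = [95.0, 94.0, 97.0, 96.0, 93.0]
print(summarize(baseline))
print(looks_like_improvement(baseline, candidate))
```

A fixed sigma threshold is only a crude stand-in for a proper significance test, but it captures the idea from the thread: a gain smaller than the run-to-run noise isn't reported as a gain.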
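The manual unrolling mentioned in the last comment is commonly done by splitting a reduction across several independent accumulators, so consecutive iterations don't all wait on one dependency chain and a superscalar core can overlap them. Below is a minimal, language-neutral sketch of that transformation in Python; the unroll factor of 4 is an assumption, not a measured optimum for any CPU. In CPython the interpreter overhead dominates, so this only shows the shape of the code; any real speedup would come from applying the same pattern in C#, where the JIT emits native instructions, and would still need to be confirmed with benchmarks.

```python
def sum_simple(values):
    """Baseline: one accumulator, so every addition depends on the previous one."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unrolled4(values):
    """Manually unrolled by 4 with four independent accumulators.

    Each accumulator forms its own dependency chain, which is what lets an
    out-of-order, superscalar CPU overlap the additions in compiled code.
    """
    acc0 = acc1 = acc2 = acc3 = 0
    n = len(values)
    limit = n - (n % 4)  # largest multiple of 4 that is <= n
    i = 0
    while i < limit:
        acc0 += values[i]
        acc1 += values[i + 1]
        acc2 += values[i + 2]
        acc3 += values[i + 3]
        i += 4
    # Handle the 0-3 leftover elements that don't fill a full group of 4.
    tail = 0
    while i < n:
        tail += values[i]
        i += 1
    return (acc0 + acc1) + (acc2 + acc3) + tail


data = list(range(1, 11))
print(sum_simple(data), sum_unrolled4(data))  # prints: 55 55
```

With integers the two versions always agree. With floating-point data the reordered additions can round differently from the single-accumulator loop, so compare results with a tolerance.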