128

Me: Optimize a sort & match method in backend because users complain it's a bit slow.

Coworker: These algorithms are both O(n), so they're identical *closes PR*

Me: *start zoom call* "Heeeeeeeeeey Iiiiiiiiiii wouuuuuuuld liiiiiiiiike toooooo diiiiiisscuuuuus thaaaaaaaat puuuuuuulllll reeeeeequuuueeest yooooouuuuu cloooooossseeeed"

Coworker: "wtf are you doing, why are you talking so slow"

Me: "No matter whether I talk fast or slow, the information still reaches you in O(n) time, so why are you complaining"

I fucking hate it when people misunderstand the purpose of (or abuse) big O notation. It's an estimate of how an algorithm SCALES once the set increases in size, in which case you leave out both less significant terms and constant factors.

But those terms and factors are important when you're talking about the DIRECT PERFORMANCE of the algorithm on fixed-size sets, instead of SCALING to larger sets.

1n and 10n are both O(n), but 10x performance on a job that used to take 10 minutes is still significant.

Comments
  • 34
    This. Also when people think just throwing more cores at a problem helps indefinitely. Amdahl's Law, folks. Performance scaling vs. other metrics is what you should be looking at.
  • 39
    Quite impressive your coworker could see they where both O(n) and still fail to see the reason, that some very selective brilliance ;)
  • 17
    @RememberMe

    Factorio is amazing at demonstrating things like this.

    It's not just the amount of factories you build... it's also about belt speed, how fast your little robot inserter arms turn around to transfer items, how much fits in your chests, how often trains arrive at the local supply station, how big those trains are, how fast those trains drive, how energy/space efficient your whole factory is, etc.
  • 14
    @Voxera Yeah.

    Many developers have this thing where they learn a new concept, and then over-apply it.

    This guy just posts his expert Big-O analysis advice on absolutely everything, regardless of whether it actually makes sense.
  • 6
    @bittersweet this is the exact reason why I love Factorio and played for large amount of hours
  • 3
    And that's even before we get to the asymptotic part of asymptotic notation. It's no good being O(n) instead of O(n log n) if it no longer fits in sodding cache as a result.
  • 5
    Well, on inspection you can guess that it's O(n), but the computer architecture may implement it as O(n^2). Man some people do not fucking understand that big O is a gross surface level abstraction.
  • 4
    @bittersweet Did you reeeeaaaalllyyy talk slow?! If yes, kudos to you.. I'd probably just yell..
  • 3
    using Zoom eww
  • 12
    If someone just said “nope” and closed my PR without consulting me first I think I would have something to say.
  • 0
    I get your point but in practice, big O is all that matters.
    If you have a significant speed difference with the same O complexity, then your constant factor must be really huge. But how can you have such a huge constant factor?
    That would require some intensive nonsense computations that do not depend on the input size, otherwise it wouldn’t be the same complexity.
    I really can not think of an example where a constant factor would matter.
    Unless, of course, we are talking about tiny input sizes.
  • 9
    @Lensflare In big-O, both O(1000n + 1000) and O(n) are both just O(n).

    So if you have 100 items in a simple, singular for loop which takes 10ms, after which the result is reduced to a result in 100ms, the total operation takes 1.1 seconds. If you can reduce the iterations to 2ms and the reduce to 15ms, so the related user page ends up loading in 215ms, that's a significant improvement.

    Of course it still SCALES at a pretty evil O(n) -- if your dataset becomes 2000 items, your page is 20 times as slow, because O(n) represents a linear relationship.

    But you have sets which scale, like "the set of all registered users on my platform" and you have sets which will predictably never scale (much), such as "the set of all zipcode areas in my country".

    So whether the constant factors (O(2n)) and less significant terms (O(n+10)) matter really depends on the case.

    Sometimes, optimizing the performance of each iteration is very useful regardless of whether the scalability improves.
  • 2
    @bittersweet Just to add to this, there will always be a finite limit to the computational problem in practice, and consequently, that something is O(whatever) doesn't in itself tell you anything, because it is asymptotic.

    However, it usually turns out that low asymptotic complexity corresponds to practical efficiency regardless. O() doesn't actually describe something that is honestly useful in and of itself, but it acts a good proxy for something that _is_ useful.

    Two examples. One: the complexity of a key recovery attack on any fixed-key size encryption scheme is O(1), because the key-size is fixed. It doesn't make it trivial though.

    Another: you can create a sorting network for n inputs with size O(n log n), but in practice you will always use an O(n log n log n) algorithm because for any n that will fit in memory, the coefficient in the AKS network absolutely swamps the extra log n factor in something like bitonic sort..
  • 4
    @halfflat Exactly. Performance and scalability are related, but separate concepts.

    Especially if you can't escape the fact that you need some algorithm which scales terribly (like brute forcing a password), it becomes all the more important to make sure the iterations themselves are optimized.
  • 0
    @bittersweet I just don’t see how you would reduce 10ms to 2ms for each iteration in practice. I mean, what useless crap would the code have to be doing so that it could be optimized that extremely? I think it’s not realistic.

    Of course you can think of any theoretical example where it would matter but in practice, it doesn’t.
  • 2
    @Lensflare You'd be surprised what you find in a Laravel codebase.

    Sometimes, you're sorting user profiles, and they have 10 megabyte selfies attaches to them. Sometimes, you're trying to match two collections of objects, one of which calls a postgres database within every iteration through "magic", simply because you're reading a property. Sometimes you think you're handling some collection of strings, but in reality it's a LazyCollection of Closures which resolve to become strings.

    Depends a lot on how opaque and "helpful" (read: apply complex magic for the sake of making things easy rather than simple) the framework is.
  • 3
    @bittersweet @Lensflare

    What @bittersweet said applies to any language... Small nuances can make a huge difference.

    Eg (scroll to perf for win) this gem...

    https://kate-editor.org/post/2021/...

    Most of the time, be it PHP / Python / C / Java / ... or whatever tickles ya fetish the o notation is a partial lie as you only evaluate what you're aware of.

    What really goes on, down in the assemblied code, is hard to account for.

    Eg. string functions. Do you account for the charset? UTF-8 vs ASCII vs UTF-32... Big difference - yet you might not be aware of it, as you look at the algorithm and not at the internal representation of the string and how the string get's treated.

    There can be quite a lot of shenanigans going on under the hood - O notation most of the time only includes what you are aware of.

    Looking at profiling, fuzzing input, tracing, toolchain changes and so on can open a whole other dimension.

    To sum it up: O notation is an indicator - not a summary and not the full evidence.
  • 2
    @bittersweet I love Factorio and Satisfactory :D
  • 1
    @RememberMe It easier to understand than misconception since usually more is more, but also easier to prove by using easy examples like having a baby, it does not matter how many women there is, a baby still takes about 9 months ;)
  • 0
    @RememberMe so true. this thread reminds me of aggregate and window functions used in relational databases and how slow they can be depending upon design/implementation constraints. an example such as I want to see what's in a warehouse's inventory right this minute in an almost overly relational db model. calculating across zones, lots, pallets for customers and products using specific profiles that apply to any level is expensive. recursively checking if a profile rule applies takes time and recursion itself is slow. the point being is that there's a time and place for optimization and it's usually motivated by the visibility of performance degradation to end users
  • 1
    This!
    On my last project I refactored an endpoint from 5n to 2n.
    The guy was saying it was pointless because "5n is great".
    Fucking Linq monkeys!
    In the meantime users were getting timeouts everywhere and they were increasing them to 4mins in some places.
    What a shithole that place was.
  • 0
    @Lensflare

    Think of iterating over a vector vs a linked list.

    Both O(n), but the vector is going to be sooooo much faster because you have so few cache misses compared with the linked list.

    In fact, in practice, the constants involved in a linked list are so much bigger than those for a vector that you basically just want to use a vector all the time, even when the Big O for the linked list is better.
Add Comment