12

How many of you use the right data structures for the right situations?

As seasoned programmer and mentor Simon Allardice said: "I've met all sorts of programmers, but where the self-taught programmers fell short was knowing when to use the right data structure for the right situation. There are Arrays, ArrayLists, Sets, HashSets, singly linked Lists, doubly linked Lists, Stacks, Queues, Red-Black trees, Binary trees,.. and what the novice programmer does wrong is only use ArrayList for everything".

Most uni students don't have this problem though, for Data Structures is freshman year material. It's dry, complicated and a difficult to pass course, but it's crucial as a toolset for the programmer.

What's important is knowing what data structures are good in what situations and knowing their strengths and weaknesses. If you use an ArrayList to traverse and work with millions of records, it will be ten-fold as inefficient as using a Set. And so on, and so on.

Comments
  • 9
    No. That is very wrong approach to this complicated subject.
    What CS teach is what they know "how" to teach. And they are doing a pretty poor job while at it.

    I have been doing practical prgramming for more then a decade, and this is my take: make it work first, then optimize if required. Thats it.

    Premature optimization is the root of all evil. It is very probable that the list you are using right now is "good enough", and will stay "good enough" forever.

    What CS don't teach is the process: how to breakdown real problems into smaller ones. How to solve those little problems. How to use an IDE. How to use a debugger ffs. How to benchmark your code. How to find the highest time consumer, and only then, optimize it.

    No - they know thier b+- trees. And looooovvvvvve dijkestra graph traversal calculations. Really love them.

    tl;dr No. know how to deliver, not how to choose the "right" data struct.
  • 7
    Your data structures class won't teach you language specific data structures. You'll learn the difference between a linked list and an array and a hash map and a priority queue, but you aren't going to learn how to use a particular core library. That's all gonna be self-taught, on the job.
  • 6
    Agree. Choosing the right data structures is not only about efficiency, it's also about readibility, and therefore about avoiding bugs.

    Linked lists however have been problematic for years already. They trash CPU caches, and the lookup dominates so much that the insertion/deletion speed becomes irrelevant.
  • 9
    Also, it's not premature optimisation - quite the opposite, choosing the right structures and algos is the big picture optimisation that needs to be done at design phase. Anything else is just hacking shit together quick and dirty. Premature optimisation refers to micro-optimisations while not even having a proper design.

    An example what I mean, the list match from the previous rant. Looping through both lists in a twofold nested loop isn't getting stuff done, it's creating a time bomb that scales horribly.

    Using e.g. C's qsort and bsearch for that is the big picture optimisation as design. Premature optimisation would now be to pull out qsort and bsearch from the library to avoid the comparison function call - unless profiling has proven this to be a bottleneck.

    (For non-C-programmers: the library qsort/bsearch take a function pointer as parameter to do all the comparisons and cannot inline these calls.)
  • 7
    @magicMirror My senior colleague doesn't agree with you. You should make your code good enough for every situation, but you should write efficient and properly factored code from the start, not write a mess and then clean it up after. You shouldn't use an ArrayList or a wrong data structure if your company handles billions of records per day, having large, complex queries run on them (for example for Business Intelligence, statistics, forensics, etc.). I'm in the Science sector and it's expected of me to know every optimization possible. I don't work for the average mid-size company where it doesn't matter. The reason why I'm breaking my head every day after work over the best optimized code is because it's a required skill. I have to care about things literally to the nanosecond.
  • 5
    @magicMirror Also, it seems to me from what you're saying that you didn't attend university, for teaching process is also taught via the courses Project Management in freshman year and later the courses Business Processes and Logic theory (work breakdown structure, graphing, etc.). I don't like it when people bash education without having a proper one themselves, it's not right. I've learned a ton, if not 95% of my skills, from university.

    At uni they also teach you how to use an IDE, in fact, tons of IDEs, in freshman year. Not only use, but research it entirely.

    It really appalls me where you get this misinformation from. There are no colleges or universities that don't teach you how to use a debugger. They even ask you to write one, which is also part of a course there regarding debuggers, compilers and embedded programming. Benchmarking is also taught in freshman year and the second year as well, but in greater depth.
  • 3
    Education is of great quality (at least, if you attended the likes of Yale, MIT, Stanford, Harvard,.. I didn't go to those but to one of equal standing and I'm very content of the quality of their education).
    Education shouldn't be bashed baselessly.
  • 3
    @rjedlin Actually, we learned to use those in various languages before we graduated. Universities these days prepare you for the real world, it's not like it's used to be. Education has become a goldmine of quality.
  • 2
    @Fast-Nop I went to a school on that list, and we sure as heck didn't bother with the minutae of language specific implementations of a particular data structure unless it was pertinent to the project we were working on, and we may have spent all of one day on, "How to use an IDE." University tends to focus on abstract concepts and theory in the classroom, with the assumption that students are capable of thinking for themselves and reading documentation when it comes to specifics.

    In other words, I think you may be underselling university if you're getting bogged down with stuff like collecting IDEs 😉
  • 1
    @rjedlin Uhm? Did you intend replying to @CaptainRant?
  • 1
    Like I said, universities have become more modern - and if someone was paying attention: IDE's are freshman year material, meant to introduce you to the working world. We also spent 4 days on 4 different ones. People are quick to judge.
  • 1
  • 3
    @CaptainRant
    You are wrong. on many levels.
    1. CS in most places do not teach those courses. Or how to debug, or how to use an IDE.
    2. I have an under-grad education, but decided not to complete my graduate thesis for exactly the reasons I mentioned.
    3. Are you aware that most Edu backed projects are total shit? look at the Weka java ML kit. Or ImageJ.

    CS problem is wanting to write perfect code each and every time. Not working code - perfect code. That is why you have a headache. I feel for you.

    It is much better to write a multiprocessing golang based tool, that can handles those billions of records using the golang version of ArrayList (list in case you were wondering) - instead of trying to choose the correct red/black tree varient in Java using queues, and threadPools. Or python. gl with python.

    tl;dr there is a reason why it is called software engineneering.
  • 3
    @magicMirror Ok, take your tl;dr elsewhere. I don't have time for people like you.
  • 3
    Classic example of someone working in a scientific field thinking optimization is crucial to their Work vs. someone working a regular job who thinks optimization isn't crucial for their work. Both are right in their context.
    I have a colleague who Made the switch from C++ godlike optimizations for imaging Software to Web dev. At first He wasn't satisfied with how we approached development and realized after a while that it's Just a Very different context. There are simply way less opportunities dir optimizations for regular code in web dev than there are when working with millions of images in a language that allows more control Like C++. That's why webdevs say "don't optimize too early when your goal is to ship the product". When your goal is to optimize the Code then optimize the Code ffs because that's a different goal!
  • 2
    Btw., the most absurd data structure hack I've used so far: what I wanted was something like a sorted read-only map, but with variable length data records. The whole thing ended up with a raw byte array as container because the target computer didn't even have a filesystem.

    The records weren't actually declared structures because there was no alignment, just raw bytes assigned by convention.

    Each record had the lookup key, followed by the length, then the data. On top of that, another array with start entries of the records calculated from the lookup key, similar to a skip list, to speed up the lookup. Just right-shifting the lookup key and using that as index for the second array.

    Also, the lookup key was implemented by abusing CRC-32 as hash function, and I managed to fiddle in an additional CRC-8 distributed over some spare bits of the records.
  • 4
    @don-rager It's also why websites are bloated pieces of shit with thousands of superfluous DOM elements which make DOM traversal slow, presentational CSS classes leaking into the markup making it non-semantic, JS doing HTML's and CSS' job, dozens of superfluous files which run into handshake delays especially on mobile networks, and all of that just hacked together any shitty way.
  • 1
    @Fast-Nop yeah most websites are bloated pieces of shit but the rest of what you said doesn't really make sense. But the people working frontend (from my experience) usually lack any sense of wanting to keep things short and clean.
    And also: webdev is more than Just websites. I didn't even think of websites when writing my previous comment. And don't get me wrong please: Avoiding premature optimization isn't a free Pass for shitty quality!
  • 0
    I'm always learning about new data structures, so I can try and choose the most suitable for any particular task.
  • 1
    @CaptainRant

    > Universities these days prepare you for

    > the real world, it's not like it's used to be.

    Glad to hear that is happening where you are.

    Here, its still like it used to be.. :-(
  • 0
    I'm reminded that in assembler with lots of RAM, I try to write code that doesn't loop for speed, if I want to loop something 10 times, I write it out 10 times in a row. :-)

    As such, often my code will look very simple and child like, but run super fast. :-)

    "Why do you have 32,000 instructions the same.. ?"

    Because its faster than a loop. :-)
  • 3
    @Nanos That's only true as long as the larger code doesn't run into CPU cache limitations.

    It's also subject to law of diminishing returns. If you unroll a loop 8-fold and take care of remaining leftovers, you have already eliminated most of the loop overhead.

    Plus that it's only meaningful for loops with short body because otherwise, the loop overhead doesn't have impact anyway.

    It also makes the code harder to maintain and looks like premature optimisation - unless you had profiled that stuff before and after.
  • 3
    @magicMirror
    > What CS don't teach is the process: how to breakdown real problems into smaller ones.

    Quite the contrary, actually.

    People shitting on education needs to stop. Like what you learn in Computer Science were ivory tower ideas that don't even apply in the real world.

    I quote: "We build systems like the Wright brothers built airplanes — build the whole thing, push it off the cliff, let it crash, and start over again."
    Our profession does have the equivalent of wind tunnel testing and, to some degree already, aerodynamics. And yet there are many devs patting themselves on the back for dismissing those. This needs to become unacceptable in our profession, too. The sooner, the better.
  • 1
    @Fast-Nop Hey, one of my costudents did something like that in their second year and he needed three professors troubleshooting the problem for 72 hours straight. :D Fun times.
  • 1
    @Fast-Nop Tell me about it, websites used to quickly load in about 1 to 2 seconds, but now if you of course inspect what is loaded on load time.. what the ffffffffffffuuuuuuuuuuuuuck. Not mentioning IndexedDb and local storage, hundreds of libraries are loaded fully (!) when loading the main page. Then of course, some websites are so poorly optimized that I have to insert a script to exclude specific scripts from executing or listening. Blergh...
  • 0
    @VaderNT You have a point. But not much of one.
    CS do not use a practical approach to problems - they use the science approach. Which means concentrating on the Novel, and publishable. Belive me - I'm a third or fourth author on a couple of ML themed papers.

    Using "Wind tunnel" approach for software dev will get you a tank. Not a car.

    Don't get me wrong - optimization is important. But not as important as delivery on time, or usefullness. If you have the time/budget - sure, optimize away. But in most cases, you just want to go to buy milk from the shop.
  • 2
    @magicMirror Yeah I once worked at a company where a software supplier had done a similar approach. His stuff worked nicely in the demo with a dozen real time data points. But in reality with many thousands, even loading took an hour.

    My conclusion: there was an n^2 algorithm somewhere that didn't show up in demo testing. And no, it's not much more work to do it properly. What it takes is awareness of time and space complexity. It's not even super optimising, just making sure that it won't scale horrible would have been enough. But nooo, demo runs, ship it already, yeah sure.
  • 3
    @Fast-Nop Are you familiar with the WinXp "windows update" problem?

    https://arstechnica.com/information...
  • 3
    @magicMirror There are many industries where just buying milk from the shop would end up in a catastrophe - defence, automotive, aerial, medical...
    Even in less time critical industries, like games - nobody would play a 15 FPS game.
    In many products, performance is a requirement.
    Optimization can start with data structures and end up in hardware specific hacks. I learned this in uni, and so far have not seen self-taught developers who taught themselves cache optimizations. In general, you don't really see self-taught people in positions related to hardware. So maybe uni is not useless after all.
  • 2
    @NickyBones Oh, ffs.
    I never said optimizations are not required, or that uni education is useless.
    What I did say is that CS approach "everything must be optimized" is not applicable in real life situations.

    And do not talk to me about the automotive industry - where it is ok to cut corners, as long as the lawyers approve, or the medical industry, where it is ok to disable a 250k$ life saving CT machine becuase you cannot upgrade the computer hardware attached to it bc the only driver is for winxp, and the mobo just died. Those industries use the same approach you advocate against.

    This is the real world we are living in - and optimizations have costs. I am also against this approach, but I'm a realist. And I had to learn to optimize myself - bc not one of CS professors bothered to teach me, or my fellow students.
  • 0
    @magicMirror

    Does that also effect Vista updates too ?

    Since I find now I can't really install Vista anymore, since it never seems to update, it just sits there like a lemon waiting forever..

    I used to be able to do it, but it was a complete pain to install and jiggle life into the updates.

    But now, tried it a dozen times, and each time, failure.

    Resorted to Windows 7/8.1 instead..
  • 0
    @magicMirror

    XP is on my list of things to install again for dual boot purposes, since a few things only have XP drivers.
  • 0
    @magicMirror
    You missed what I was trying to say. I wasn't speaking about optimization at all, but in general about our profession.

    To stay with the aerospace analogy: I'm ready to fly passengers over the atlantic. Let's build that plane. And I have a hunch there's something better than propeller-driven airliners. Last time I spoke with someone in research, they were mathematically modeling airflow and talking about this "turbine" thing.

    And yet my coworkers hand me plywood and a handsaw to cut it into roughly rectangular boards. "Does this generate enough lift?" I ask. "Have you tested in the wind tunnel that its wings aren't torn off?" I ask. They just laugh. Because "aerospace ideas don't apply in the real world" and "I don't believe in wind tunnel testing". So they push the thing they built off the cliff, believing this time for sure it'll fly more than a couple hundred meters. "Crossing the atlantic? Haha in your wild aerospace dreams maybe".
  • 2
    @magicMirror
    > I never said (...) uni education is useless.

    I must say, I understood you that way, too. Thanks for clearing that up. 🙂

    > What I did say is that CS approach "everything must be optimized" is not applicable in real life situations.

    At university, I was taught "premature optimization is the root of all evil". I don't agree "the CS approach" is what you claim.
  • 0
    This thread is long and getting a bit out of hand. :P
  • 0
    Not to mention everyone is just talking past each other
  • 2
    @VaderNT hopefully, the gif will be attached now.
Add Comment