Solved a SEGFAULT in a conventionally undebuggable 800k-line C++ project after 10 hours of investigating 🙌

  • 6
    You thrive in hell
  • 1
    What does "conventionally undebuggable" mean? Did the core dump have no stack trace?
  • 1
    @Yamakuzure The curious thing was: all tests ran fine in debug mode, but in release mode I got a segfault locally as well as on the CI server... Since the program is multithreaded, the stack trace was a different one every once in a while. In the end, the program spawned too many threads, which is undefined behavior in this case -.-
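
    A minimal sketch of one common way to keep the thread count bounded, not necessarily the fix that was used here. The names `capped_worker_count` and `parallel_sum` are hypothetical, and the fallback of 2 workers is an arbitrary assumption for when the hardware count is unknown:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical helper: never use more workers than the hardware reports,
// and never more than there are tasks. Always returns at least 1.
unsigned capped_worker_count(std::size_t tasks) {
    unsigned hw = std::thread::hardware_concurrency();  // may return 0 if unknown
    if (hw == 0) hw = 2;                                // assumed fallback
    const std::size_t wanted = std::max<std::size_t>(tasks, 1);
    return static_cast<unsigned>(std::min<std::size_t>(hw, wanted));
}

// Sums `data` with a bounded number of threads instead of one thread per item.
long long parallel_sum(const std::vector<int>& data) {
    const unsigned workers = capped_worker_count(data.size());
    assert(workers > 0);
    std::vector<long long> partial(workers, 0);  // one slot per worker, no sharing
    std::vector<std::thread> pool;
    const std::size_t chunk = (data.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(data.size(), static_cast<std::size_t>(w) * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        pool.emplace_back([&data, &partial, w, begin, end] {
            const auto first = data.begin() + static_cast<std::ptrdiff_t>(begin);
            const auto last = data.begin() + static_cast<std::ptrdiff_t>(end);
            partial[w] = std::accumulate(first, last, 0LL);
        });
    }
    for (auto& t : pool) t.join();  // every thread is joined before returning
    return std::accumulate(partial.begin(), partial.end(), 0LL);
}
```

    Each worker writes only to its own slot in `partial`, so no locking is needed, and the number of live threads stays the same no matter how large the input gets.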
  • 1
    @Yamakuzure I will dive in deeper tomorrow, I just committed, pushed and went home after fixing it ^^
  • 2
    @Snob As you are seemingly on Linux, I would wholeheartedly recommend Helgrind and DRD from the Valgrind suite! They are both incredibly useful for detecting threading problems like race conditions.

    Also have a look at the ThreadSanitizer, available in both clang and gcc.
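
    A minimal sketch of the kind of data race these tools flag, assuming a made-up shared counter (the function names and the file name `race.cpp` are placeholders, not anything from the project discussed here):

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Deliberately broken: `counter` is a plain long shared by all threads with
// no synchronization. That is a data race, i.e. undefined behavior, and it is
// what ThreadSanitizer, Helgrind and DRD are designed to report.
long racy_count(int num_threads, int iterations) {
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) ++counter;  // unsynchronized write
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}

// Fixed: std::atomic makes every increment indivisible.
long atomic_count(int num_threads, int iterations) {
    assert(num_threads >= 0 && iterations >= 0);
    std::atomic<long> counter{0};
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter.load();
}

// With a main() that calls racy_count(4, 100000), for example:
//   g++ -std=c++17 -g -O1 -fsanitize=thread -pthread race.cpp -o race && ./race
//   g++ -std=c++17 -g -pthread race.cpp -o race
//   valgrind --tool=helgrind ./race
//   valgrind --tool=drd ./race
```

    Build the binary for Valgrind without `-fsanitize=thread`; the sanitizer runtime and Valgrind are not meant to be combined.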
  • 2
    Sanitizers are your friend as well :)

    ASan, UBSan and TSan :) for both gcc and clang!
  • 1
    @Yamakuzure Oh thanks! I will have a look :)
  • 5
    Segfaults are the worst
  • 1
    @dfox They are!! Every freaking language tells you where the error occurred or what exception happened where... And then there's C/C++, which gives you a core dump and leaves you alone with it. Ufff
  • 2
    @Snob that has nothing to do with C/C++. Other compiled languages can make the kernel terminate a program, too, if they try to write into memory they aren't allowed to write into.
    At least a core dump can show you exactly what happened when, and with "gdb -tui" you'll even be presented with the source code for every frame of the stack trace.
    Try that on Windows after a "Protection Fault". 😊
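
    A rough sketch of that workflow, assuming a tiny made-up crasher. The helper names and the file name `crash.cpp` are hypothetical, and where the core file ends up depends on the system's core dump settings:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper with a deliberate bug: dereferences `values` without
// checking it. Passing nullptr is undefined behavior and typically ends in
// a segfault, which is the crash to inspect in the core dump.
int first_element(const std::vector<int>* values) {
    return (*values)[0];
}

// Same helper with a debug-only guard. assert() is compiled out when NDEBUG
// is defined, so a release build can crash where a debug build aborts with
// a readable message.
int first_element_checked(const std::vector<int>* values) {
    assert(values != nullptr && !values->empty());
    return (*values)[0];
}

// With a main() that calls first_element(nullptr):
//   g++ -std=c++17 -ggdb3 -O0 crash.cpp -o crash
//   ulimit -c unlimited        # allow core files in this shell
//   ./crash                    # crashes and, if enabled, writes a core file
//   gdb -tui ./crash core      # core file name/location varies by system
// Inside gdb:
//   bt                         # stack trace of the crashing thread
//   thread apply all bt        # stack traces of all threads
//   frame 1                    # select the caller's frame
//   info locals                # local variables in the selected frame
//   print values               # inspect a specific variable
```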
  • 1
    @Yamakuzure That's true, but I'm used to Rust, Python and Java, where you are presented with a detailed error log...
  • 0
    @Snob Sometimes I have to deal with Python tools in the elogind build system, and when they break, the "detailed description" is cryptic as fuck!

    Don't get me wrong, I appreciate how Java, Python and others present you with a trace at once.
    But being able to *inspect* every value through a core dump is by far superior and a much bigger help. (debug build needed, of course.
    -ggdb3 ftw! 😁)

    As for the Python errors, they just tell me enough so I get an idea about where to start the manual single-stepping.
  • 1
    @Yamakuzure Then I must have had a better experience with Python, I never had to investigate longer than 30 minutes on a single bug. Python 3 at least always told me enough to tidy up and fix the bug. But since I'm more of a compiled languages guy, I personally love Rust. Borrow checking at compile time, bounds-checked indexing, no dangling pointers, hardly any memory leaks, no "Null", ... Hard to learn but very easy to debug :)