I've done embedded development as well as developing monitoring systems for high voltage systems, so I have some experience with this.
I'm afraid there's really no guide for this type of stuff that I'm aware of. It's really about building redundancies into your code, testing those redundancies, and then thinking further about other failure conditions that could occur.
You can even go as far as having an entirely separate backup system monitoring your main system and taking over if something goes wrong with the main system - like it freezes up. Be sure to always extensively test any safety systems you put into place.
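To give a rough idea of the "separate system watches the main one" approach, here is a minimal sketch in C. It assumes the main system increments a heartbeat counter that the backup can read; the `supervisor_t` type, the function names and the tick-based timeout are all made up for illustration, and how the counter actually travels between the two units (shared memory, a bus message, a GPIO toggle) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical supervisor state; all names are illustrative only. */
typedef struct {
    uint32_t last_heartbeat;  /* last counter value seen from the main system */
    uint32_t stale_ticks;     /* consecutive monitor ticks without a change */
    uint32_t timeout_ticks;   /* stale ticks allowed before the backup takes over */
    bool     backup_active;   /* latched once the backup has taken over */
} supervisor_t;

void supervisor_init(supervisor_t *s, uint32_t initial_heartbeat,
                     uint32_t timeout_ticks)
{
    s->last_heartbeat = initial_heartbeat;
    s->stale_ticks = 0u;
    s->timeout_ticks = timeout_ticks;
    s->backup_active = false;
}

/* Call once per monitor tick with the main system's current heartbeat
 * counter. Returns true while the backup is in control. */
bool supervisor_tick(supervisor_t *s, uint32_t heartbeat)
{
    if (s->backup_active) {
        return true;  /* latched: control is not handed back automatically */
    }
    if (heartbeat != s->last_heartbeat) {
        /* Main system is alive: remember the new value, reset the timer. */
        s->last_heartbeat = heartbeat;
        s->stale_ticks = 0u;
    } else {
        /* No progress since the last tick: the main system may be frozen. */
        s->stale_ticks++;
        if (s->stale_ticks >= s->timeout_ticks) {
            s->backup_active = true;
        }
    }
    return s->backup_active;
}
```

Latching the takeover is a design choice in this sketch: a main system that freezes and then comes back on its own is usually not something you want to trust again without a reset.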
Your post reminded me of this Stack Overflow answer: https://stackoverflow.com/a/...
The question was similar to yours: how to prevent errors in an extreme environment and how to recover from them. I'm no embedded dev, but it might be useful to someone with your experience.
It was the first time I realized that software can run in that kind of extreme environment and that the environment can actually affect it.
In general, your company will have broken down the DO-178 into checklists that have to be fulfilled, because relying on good coding practice is not enough. You must understand how the A-tables in the DO are broken down into your company's processes and procedures. There should also be some company coding standard; make sure you adhere to it.
If the system is life critical, it will be at least DAL B, if not A. At that level, redundancy can't be achieved in code; it has to be done at system level. You need at least two independent control units with independent power paths, possibly even three of them, and some voting logic.
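For what "voting logic" means with three units, here is a minimal sketch of the decision itself. In a real DAL A/B design the voter sits at system level (often in hardware or in each actuator), not in one of the voted units, so take this only as an illustration of the logic. The function names are my own.

```c
#include <stdbool.h>
#include <stdint.h>

/* 2-out-of-3 majority vote on a discrete signal:
 * the output follows whatever at least two channels agree on. */
bool vote_2oo3(bool a, bool b, bool c)
{
    return (a && b) || (a && c) || (b && c);
}

/* Mid-value select for an analogue quantity: returns the median of the
 * three channels, so a single channel failing high or low is ignored. */
int32_t mid_value_select(int32_t a, int32_t b, int32_t c)
{
    int32_t lo = (a < b) ? a : b;
    int32_t hi = (a < b) ? b : a;

    if (c <= lo) {
        return lo;  /* c is the smallest, so lo is the median */
    }
    if (c >= hi) {
        return hi;  /* c is the largest, so hi is the median */
    }
    return c;       /* c lies between lo and hi */
}
```

With only two units you can detect a disagreement but not tell which unit is wrong, which is why the third one is often added.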
All input must be sanitised, no matter whether analogue (filtering), logical (debouncing) or digital. Be especially cautious if you receive floating point data: checking the range is not enough, because the IEEE format also has special values like NaN and infinities that you need to take care of. Never compare floats for equality, and don't use floats as loop variables.
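A minimal C sketch of the float advice, assuming a C99 toolchain with `<math.h>`. The function names and the idea of an absolute tolerance are my own choices for illustration; your coding standard may prescribe something different. The trap with a plain range check is the rejecting form `if (x < min || x > max)`: every ordered comparison with NaN is false, so NaN sails straight through it.

```c
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Accept a float input only if it is a real number inside [min, max].
 * isfinite() rejects NaN and +/-infinity before the range is checked. */
bool float_input_valid(float x, float min, float max)
{
    if (!isfinite(x)) {
        return false;
    }
    return (x >= min) && (x <= max);
}

/* Compare two floats with an absolute tolerance instead of ==.
 * The tolerance has to come from the requirement, not from guessing. */
bool float_nearly_equal(float a, float b, float tol)
{
    float diff;

    if (!isfinite(a) || !isfinite(b)) {
        return false;
    }
    diff = a - b;
    if (diff < 0.0f) {
        diff = -diff;
    }
    return diff <= tol;
}

/* Integer loop counter; each float value is derived from it, so the
 * number of iterations is exact and rounding errors don't accumulate. */
void fill_ramp(float *out, size_t n, float start, float step)
{
    size_t i;

    for (i = 0u; i < n; i++) {
        out[i] = start + ((float)i * step);
    }
}
```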
Don't write "clever" code, keep it obvious. Since the DO-178 requires that no code exists that isn't backed by a requirement, you need to put some kind of requirement tracking into the code, maybe by comments in some special format. Bugfixes will also have some kind of problem report ID, and that needs to be in the code, too.
Then for the tests, code coverage will be required, so you probably have some tool that can measure it during the verification phase. For DAL A and B, simple statement coverage will not be enough: DAL B also requires decision coverage, and DAL A requires modified condition/decision coverage (MC/DC) on top of that. Boundary values and ranges come into play as well.
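Just to show what such trace comments could look like, here is a small sketch. The `@req` / `@pr` tag format and every ID in it are invented for this example; your company's process will define its own format and the tooling that parses it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tag format: "@req <ID>" links code to a requirement,
 * "@pr <ID>" links a change to a problem report.
 * All IDs below are invented for illustration. */

/* @req SRS-EXAMPLE-012: overtemperature limit is 85.0 degC */
#define OVERTEMP_LIMIT_DECI_C  (850)

/* @req SRS-EXAMPLE-013: flag overtemperature when the measured
 *                       temperature exceeds the limit
 * @pr  PR-EXAMPLE-0457: comparison corrected from >= to > */
bool overtemp_flag(int16_t temperature_deci_c)
{
    return temperature_deci_c > OVERTEMP_LIMIT_DECI_C;
}
```

The point is that a script can later grep the tags and check that every function traces to a requirement and every requirement traces to some code.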
Be sure to do even the engineering tests properly: don't just test whether it works. Try to break the system. When there is any kind of threshold, be it in terms of analogue values or time, beyond which something has to happen, don't just test that it happens once the threshold is crossed. Also test that it doesn't happen right before.
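As an example of testing both sides of a time threshold, here is a sketch with a made-up fault confirmation timer. The 5-tick confirmation time, the type and the function names are assumptions for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed requirement: a fault is confirmed once the condition has been
 * present for 5 consecutive ticks. */
#define CONFIRM_TICKS (5u)

typedef struct {
    uint32_t ticks;  /* consecutive ticks with the condition present */
} fault_timer_t;

/* Unit under test: call once per tick. */
bool fault_confirmed(fault_timer_t *t, bool condition)
{
    if (condition) {
        if (t->ticks < CONFIRM_TICKS) {
            t->ticks++;
        }
    } else {
        t->ticks = 0u;  /* any interruption restarts the confirmation */
    }
    return t->ticks >= CONFIRM_TICKS;
}

/* Boundary test: returns the number of failed checks. */
int test_confirm_boundary(void)
{
    int failures = 0;
    uint32_t i;
    fault_timer_t t = { 0u };

    /* Right before the threshold: must NOT be confirmed yet. */
    for (i = 0u; i < (CONFIRM_TICKS - 1u); i++) {
        if (fault_confirmed(&t, true)) {
            failures++;
        }
    }
    /* Exactly at the threshold: must be confirmed now. */
    if (!fault_confirmed(&t, true)) {
        failures++;
    }
    /* Try to break it: one clean tick must reset the timer... */
    if (fault_confirmed(&t, false)) {
        failures++;
    }
    /* ...so another (threshold - 1) ticks must still not confirm. */
    for (i = 0u; i < (CONFIRM_TICKS - 1u); i++) {
        if (fault_confirmed(&t, true)) {
            failures++;
        }
    }
    return failures;
}
```

A test that only checked "fault present for a long time gives confirmed" would also pass against an implementation that confirms after a single tick, which is exactly the kind of bug the "right before" check catches.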
For aggregate conditions, test each one of them individually, not only all together. Say for an AND, test that the output is TRUE if you put all conditions to TRUE. Then put each of them to false individually, check that you see a TRUE-FALSE transition on the output, set all of them to TRUE again, and repeat that for the next condition. Similarly for OR'ed conditions, just the other way around.
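The AND procedure described above can be sketched like this. The unit under test and the number of conditions are hypothetical; in a real test you would drive the actual inputs of the system rather than an array.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_CONDITIONS 3u

/* Hypothetical unit under test: TRUE only if all conditions are TRUE. */
bool all_conditions_met(const bool cond[NUM_CONDITIONS])
{
    return cond[0] && cond[1] && cond[2];
}

/* Toggle each condition on its own and check the output follows.
 * Returns the number of failed checks. */
int test_and_conditions_individually(void)
{
    int failures = 0;
    size_t i;
    bool cond[NUM_CONDITIONS] = { true, true, true };

    /* Baseline: everything TRUE, so the output must be TRUE. */
    if (!all_conditions_met(cond)) {
        failures++;
    }
    for (i = 0u; i < NUM_CONDITIONS; i++) {
        /* Only condition i goes FALSE: expect a TRUE-FALSE transition. */
        cond[i] = false;
        if (all_conditions_met(cond)) {
            failures++;
        }
        /* Restore it: the output must return to TRUE before the next one. */
        cond[i] = true;
        if (!all_conditions_met(cond)) {
            failures++;
        }
    }
    return failures;
}
```

If someone accidentally drops one term from the AND, the "all TRUE" and "all FALSE" cases still pass, but the individual toggle for that term fails. That is the reason for going through them one by one.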
I also recommend static source code checking tools. Crank up your compiler warnings to the maximum level and treat every warning as an error. Maybe your company uses MISRA (although that comes from automotive). Cppcheck is a very basic, but free and easy tool that I highly recommend. There are also commercial tools like CodeSonar and Coverity, and your company should have one of these high-end analysis tools.