Your biggest dev insecurity?

Ranter

bkwilliams

7188

Comments

6

SortOfTested

19558

5y

I mean, if I fuck up the power grid goes offline and a portion of 3M customers worth of people die due to exposure. But I'm definitely not a doctor. The importance of the system is not my importance.
0

bkwilliams

7188

5y

@SortOfTested programming for medical or military; utilities can be on that list. But do you track the deaths caused by programming issues due to outages?
4

SortOfTested

19558

5y

@bkwilliams
Yes. They're reported as complaints to the PUC and we pay fines and compensate the families for any grid failure. Cases have to be fully documented for each issue and the source problem identified.
1

bkwilliams

7188

5y

@SortOfTested microcontroller code issues for Toyota killed a lot of people, https://safetyresearch.net/blog/... but the list was not meant to be inclusive.
0

SortOfTested

19558

5y

@bkwilliams
Maybe they should also be a regulated utility governed by legislative tariff.
1

bkwilliams

7188

5y

@SortOfTested I learned something today. Thanks!
2

sladuled

7014

5y

@SortOfTested Don't mean to be rude/stupid here, but aren't systems that can make people unalive supposed to have like a gazillion more safety checks in place?! No single point of failure etc.. ?
1

SortOfTested

19558

5y

@sladuled
Who said they don't?
1

sladuled

7014

5y

@SortOfTested Well you wrote that if you fuck up, it can result in deaths..

Which makes me wonder what exactly do you do, and what are the protocols there.

Because I see two options:
a) we/they don't have any other safety measures in place
b) we/they have safety checks/measures but they also failed

There might be option c or d that I do not see and this is why I'm asking for clarification.

Because if it's option a, then this is really stupid (it also doesn't go well with your response, "who said they don't") and if it's option b, then it's not really your fault (well, not solely and directly your fault), a lot of things needed to go wrong to get catastrophic results..

Don't mean to annoy you, I'm just trying to understand all this, because in my head there're only two options
1) if your work 'can kill' - there should be other safety checks to handle your fuckups - no problem unless really really unforeseen stuff happens
2) if your work cannot kill - no problem if you fuck up anyways
5

SortOfTested

19558

5y

@sladuled
The controls are there to prevent it, doesn't mean you can slack on any of them. If you're the source of truth, that means you can lie to downstream systems.

Imagine someone doesn't pay their bill, so their meter gets shut off. If they're a protected class (olds, etc) and it's winter or summer, emergency reactivation is mandatory so their ac or heaters don't cut off. (Old people die in extreme heat or cold faster than young people, see Chicago, Texas)

Now imagine that someone pushes a patch to the system that causes the system to deprioritize that message stream, and it ends up going into the 2 weeks queue instead of the 20 minute queue. Doesn't necessarily cause an alert because it doesn't cause a failure, but it's still critical.

Now imagine you get 180M data updates an hour just from meters, and you have to gate all of that. We have thousands of process controls, as well as software and hardware redundancy, but we still have to deal with Byzantine fault tolerance. If you have a system that makes that trivial, you might want to go collect your Nobel prize.

Or to put it in a way that got the former director of communications fired, "we can build an electrical grid that never fails, but no one would be able to afford it."
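
The silent-misrouting scenario above can be sketched as a check that is kept separate from the routing code itself. This is only an illustration, not the utility's actual system: the queue names, the `Message` type, and every function here are hypothetical, and only the 20-minute and 2-week figures come from the comment.

```python
from dataclasses import dataclass
from datetime import timedelta

# Hypothetical queue names; the latency targets are the ones from the scenario.
QUEUE_SLA = {
    "urgent": timedelta(minutes=20),
    "bulk": timedelta(weeks=2),
}


@dataclass(frozen=True)
class Message:
    kind: str  # e.g. "emergency_reactivation" or "meter_reading"


def route(msg: Message) -> str:
    """Pick a queue. A bad patch here raises no error, it just picks wrong."""
    if msg.kind == "emergency_reactivation":
        return "urgent"
    return "bulk"


def max_allowed_latency(msg: Message) -> timedelta:
    """Business rule, deliberately written independently of route()."""
    if msg.kind == "emergency_reactivation":
        return timedelta(minutes=20)
    return timedelta(weeks=2)


def routing_violations(messages, router=route):
    """Return messages whose chosen queue is slower than their rule allows."""
    return [m for m in messages if QUEUE_SLA[router(m)] > max_allowed_latency(m)]
```

Running `routing_violations` against a sample of traffic with a router that sends everything to `"bulk"` flags the emergency message even though nothing crashed, which is the kind of failure an ordinary error alert would miss.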
1

Maer

1642

5y

@SortOfTested By all means, continue sharing your experience. This is incredibly interesting.

I work in the medical sector, where our software assists in patient treatment. Obviously this requires a lot of safety measures, however all results provided by the software are then verified multiple times by medical personnel.

However we have nowhere near the scaling of what you are describing which is why this is so interesting to read.
3

devphobe

8965

5y

I quit a job in the medical sector when the product owner demanded we deploy a feature that the senior dev said wasn't ready. Doctors weren't paged for critical medical issues that night. Patients wondering if they were having a heart attack weren't called back that night. Worst day of my career.
1

xcodesucks

598

5y

In assembly in the typical missile code, the good news is that > and < do not exist. It is either performing subtraction and testing for greater than zero or a completely separate branch instruction like BGE (branch if greater than or equal) and BLE (branch if less than or equal). So, screwing up a ">" and a "<" in real-time firmware where it could kill someone is much harder than in some bloated Raspberry Pi Unix system. If they "upgraded" the navy missile firmware to Linux bloatware, we are all fucked!
0

sladuled

7014

5y

@SortOfTested That was an extensive explanation, thank you very much!
It shed light on another aspect I hadn't thought of because I was focused on the bigger consumers, like hospitals, homes for the elderly, research labs.. that should have some kind of backup generators at least for the most vital medical equipment.

Makes sense that 'normal' people at home don't have backup generators. I just didn't think about freezing to death due to power failure or being wrongly cut off, because here you either have heating based on oil/wood or you're coupled to a central heating system. We also rarely get life-threatening low temps here and I think it takes 3 or sth bills to be missed to actually cut off power supply to anyone (will not test it though xD).

And also no no no... I never wanted to imply that building a system with the fewest possible points of failure is trivial.. or that people can slack off because there is QA behind them.

It's just if something is really really important, there should be no solo decision making/coding/testing etc.. so in theory one person cannot be held responsible if all those things fail (due to fucked up timing and low probability of failing at same time) and they did what was required of them. At least how I see it.

Personally I'd never dream of not checking whatever I can before handing over the code, even if it's only responsible for setting the sourness of the lemon ice cream..

At the end of the day, I'm happy that my fuckups (+possible failures in safety checks) can only cause huge monetary losses. Well at least I hope people don't get executed for things like this.. o.O

& thanks again for taking the time and replying! //and hopefully not being annoyed by my questions..
0

bkwilliams

7188

5y

@xcodesucks that was one of the things I liked about SAP, it had GT, LT, GTE, LTE, EQ.
0

Maer

1642

5y

@devphobe That does sound horrible. At my workplace such a PO would not have lasted the night.

In our line of work life-threatening scenarios are highly unlikely, but even so, having experienced a few companies I am now with one where the team is amazing and management actually values our opinions and decisions.

Yeah, those actually do exist and I am grateful to be working there.
1

Gregozor2121

5083

5y

If life relies on your system then it shouldn't have a single point of failure... One person shouldn't be able to fuck up the whole system.
0

Wisecrack

9363

5y

@SortOfTested what do you use for that much data? Something like hadoop?
2

bkwilliams

7188

5y

@Gregozor2121 there are processes out there that are not only a single point of failure but also some of the worst hack-and-a-half jobs you can imagine. It depends on the domain and the lawyers.
0

SortOfTested

19558

5y

@Wisecrack
It's older than that, for better or worse. Messages are binary EDI. They're cached in what amounts to an event store, and near real-time consumed into in-memory actors. The events are aggregated and snapshotted daily.
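
The event store, in-memory actors, and daily snapshots described above can be sketched roughly as follows. This is a toy illustration under assumptions, not the real pipeline: `MeterEvent`, `MeterActor`, `consume`, and the `kwh` payload are all hypothetical names, and the real messages are binary EDI rather than Python objects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MeterEvent:
    meter_id: str
    seq: int    # position in that meter's append-only stream
    kwh: float  # hypothetical payload field


class MeterActor:
    """In-memory state for one meter, built by applying events in order."""

    def __init__(self, total_kwh=0.0, last_seq=-1):
        self.total_kwh = total_kwh
        self.last_seq = last_seq

    def apply(self, event):
        if event.seq <= self.last_seq:  # skip events already folded in
            return
        self.total_kwh += event.kwh
        self.last_seq = event.seq

    def snapshot(self):
        """State to persist so a restart need not replay the whole log."""
        return {"total_kwh": self.total_kwh, "last_seq": self.last_seq}


def consume(log, snapshots=None):
    """Fold an event log into per-meter actors, resuming from snapshots."""
    actors = {mid: MeterActor(**snap) for mid, snap in (snapshots or {}).items()}
    for event in log:
        actors.setdefault(event.meter_id, MeterActor()).apply(event)
    return actors
```

Resuming from a snapshot and replaying the same log is safe here because each actor ignores any event at or below its `last_seq`, so only events newer than the snapshot change its state.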
1

NoMad

13490

5y

So the OP is happy about not having the responsibility of someone dying...

Tbh that's one reason why I chose to become a dev anyways. Not because it never had a life/death situation, but because it was easier to avoid. A doctor can't simulate and test easily, but a dev can.

And I also hate that feeling of saying the absolute wrong thing. It's horrible. Especially when you don't know if you've apologized enough or in the right way. Like, it's very tricky. Not sure if it is time related tho. I'm generally anxious about making people feel bad, but I don't care about making them feel good.

rant

wk207

soft skills