If they followed my suggestion and went straight to debugging the server issues they would have been solved it from week 1 and everyone would have thought the migration had a minor performance hiccup. In fact, we have already done such at least twice before and nobody batted an eye.

Instead they self-labelled the migration a failure on first error, setting the stage for apologizing to the client, and put themselves on the spot for a whole staging / production signoff, replication / backup worfklow, almost a blue-green "seamless" deployment reminiscent of DigitalOcean.

Well they're not DigitalOcean, and anyone who has spent any time understanding users knows they will not participate in "new system" tests long enough to find or report issues.

So of course the migration stretched out to almost three months up until the whole reason for the migration - the rapidly escalating risk of the old provider disappearing - hit like a freight train and now they have to go through the problem of debugging the server like I told them to on week 1. Only this time they've set the client mindset against it, lost any chance of reverting, have had grave risk for data loss, and are under pressure to debug other people's code in real-time.

This is why I don't trust devs to do ops. A dev's first solution to any problem is to throw tech at it.

Add Comment