11
SteffTek
305d

Today was a SHIT day!

I work as ops for a customer, and we maintain several tools across different environments. Today was the day my fucking Kubernetes cluster made me rage quit, AGAIN!

We have a MongoDB running on Kubernetes with daily backups. The main node crashed due to a full PVC on the cluster.

Full PVC => Pod doesn't start
Pod doesn't start => You can't get the live data
No live data? => Need Backup
Backup is in S3 => No Credentials
Got Backup from coworker
Restore Backup? => No connection to new MongoDB

3 FUCKING HOURS WASTED FOR NOTHING

Got it working in the end... Now we need to file an incident in the incident management software. Tbh that's the worst part.

And the team responsible for the cluster said monitoring won't be supported because it's unnecessary....
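
For the record, even a dumb check on the volume would have caught this before the pod died. A minimal sketch, not what we actually run: the function names, the 80% threshold, and the path are all made up for illustration, and it assumes the script runs somewhere the volume is mounted (e.g. inside the pod or a sidecar).

```python
import shutil


def classify(used: int, total: int, warn_at: float = 80.0) -> str:
    """Return 'WARNING' if usage is at or above warn_at percent, else 'OK'."""
    percent = used / total * 100
    return "WARNING" if percent >= warn_at else "OK"


def check_volume(path: str, warn_at: float = 80.0) -> str:
    """Check the filesystem that contains `path` and return a status line."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    status = classify(usage.used, usage.total, warn_at)
    percent = usage.used / usage.total * 100
    return f"{status}: {path} is {percent:.1f}% full"


if __name__ == "__main__":
    # Hypothetical path: point this at wherever the PVC is mounted.
    print(check_volume("."))
```

Run that from a cron job and pipe the WARNING lines anywhere a human will see them, and a full PVC stops being a surprise.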

Comments
  • 3
    "monitoring won't be supported because it is unnecessary"

    Wtf who uttered those words!?! That person/team/dept should be fired on the spot for this.
  • 2
    @NeatNerdPrime @jestdotty The old cluster team didn't set up monitoring, and now the cluster is so unstable (with 200 CPUs and 1.5 TB RAM USAGE) that we can't even deploy the Longhorn CSI to get away from the deprecated GlusterFS.... This shit is fucked badly
  • 0
    Later that day I had the same migration for a Redis database, but that was planned at least...