5
cprn
43d

Dumb mistake from when I was still working:

My work laptop’s SSD went haywire, and I/O would spike every 10 minutes or so for ~50 ms. The hardware guy said he could replace the SSD right away, or I could endure it for a few weeks and get a new laptop instead. Obviously, I agreed to wait. The stutter noticeably affected screen rendering, but I didn’t notice any other issues. Little did I know that every time it happened, all input was ignored (as in: not queued). Normally it wouldn’t matter, because hitting a random ~50 ms window is hard. How-the-f×ck-ever…

A few days later — without getting into “why” — I was forced to apply a patch in production. So I opened an SSH session to prod in one terminal, spun up a dev environment in another, copied the database schema from prod to dev, and made sure to test everything. No issues, so I jumped to prod, applied the patch, restarted services, jumped back to dev, and cleaned up the now-unnecessary database. Only to discover that my “jumped back to dev” keystroke didn’t register.

Comments
  • 4
    When I was still working (what a time to be alive!), I had a special background color for production. Worked nicely.
  • 2
    never clean up anything before you're 100% certain you don't need it anymore, ever.
  • 1
    In most cases, it should be possible to write the script so that if some muppet runs it on prod it will be ok.

    E.g. "where db_name()='name-of-dev-db'" etc.
  • 3
    @retoor Oh, I had them set up as well… but I type faster than I notice changes on the screen.

    And for everyone who *doesn't* set up different terminal themes for different environments — do it. It's too easy not to:

    alias prod='tput setaf 128; ssh prod-host; tput reset'
  • 2
    @donkulator Yeah... But what about a scenario where you *want* the databases to have the same name (they're on different hosts anyway)?
  • 3
    tosensei is correct: never delete. I hope there was a backup or something to restore from?
  • 2
    @tosensei @bazmd It doesn't apply. I was 100% certain I didn't need the test db any more. The issue is I called the command on a prod server by accident. And because I was patching stuff, I used a user with write access. I could, I guess, have a user that had access to things like `UPDATE`, but not `TRUNCATE`, etc… but it wasn't my job to manage users and I worked with what I was given.

    Yeah, there was a delayed replication set up. I just wasted time.
  • 2
    @cprn Thanks, I forgot how to do it, and the way I do it now isn't convenient.
  • 2
    @cprn you haven't completely finalised all work on prod (which includes "closing all connections"). aka: you weren't done yet.
  • 2
    If the command was:

    if @@servername='dev-sql-1234' drop database FooDB

    Then you can run it on the wrong box and nothing will happen.
  • 1
    @tosensei Oh, I fully agree I wasn't done — I still had to check a number of boxes on prod. But the initial statement regarded the necessity — I seriously didn't need the test db any more, and it really *was* safe to remove. The issue was rooted in removing the *wrong* db. So I guess… “check twice”, or something like that.
  • 1
    @donkulator I don't disagree, but then there are probably a number of other variables I should've checked as well… At which point, it'd just make more sense not to meddle on production at all. But that fight I lost.
  • 0
    @jestdotty it's only hoarding if you actually never finish any project.
  • 0
    How recent was your backup? Did you manage to patch the delta data?

    That must have been error-prone, tedious, cumbersome, and frustrating.
  • 1
    @asgs The replicated database was 20 minutes behind. I paused the system, switched databases, removed the bad query from the binlog, forced a catch-up, and unpaused. Downtime was way below 10 minutes, including checks (and that time I checked twice).
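
The guard idea from the comments (check which server you're on before doing anything destructive) also works when the databases share a name, since it keys on the host rather than the database. Here's a minimal shell sketch of it, not what anyone in the thread actually ran. The function name `guard_host` and the host `dev-sql-1234` are just illustrative, and the `echo` stands in for whatever the real cleanup command would be.

```shell
#!/bin/sh
# Sketch only: guard_host and the host names are hypothetical.

# guard_host EXPECTED ACTUAL CMD [ARGS...]
# Runs CMD only when ACTUAL equals EXPECTED.
# Otherwise prints a warning to stderr and returns 1 without running CMD.
guard_host() {
    expected=$1
    actual=$2
    shift 2
    if [ "$actual" != "$expected" ]; then
        printf 'refusing to run on %s (expected %s)\n' "$actual" "$expected" >&2
        return 1
    fi
    "$@"
}

# Usage (commented out so that sourcing this file has no side effects):
# guard_host dev-sql-1234 "$(hostname)" echo "cleanup command goes here"
```

Passing the actual host name as an argument, instead of calling `hostname` inside the function, keeps the check easy to try out locally. It still only covers one variable, so it's a seatbelt and not a replacement for staying off prod.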