164

"Don't deploy on Friday" is a public admittance that your company either has no CI/CD pipeline, or that all your devs are retarded rhesus monkeys who only wipe their ass if the product manager wrote it as a spec.

If the saying was: "Don't port your whole API to GraphQL on a Friday", or "Don't switch from MySQL to Postgres on a Friday", I would agree.

But you should be able to do simple deploys all the time.

I've deployed on Christmas & New Year's Eve. I've deployed code while high on LSD, drunk-peeing 2 liters of beer against a tree after a party. I've deployed code from the hospital while my foot was being stitched up. On average, we deploy our main codebase about 194 times a week.

If you can't trust your deploys, maybe instead of posting stupid memes about not deploying on Fridays, you should fix your testing & QA procedures.

Comments
  • 25
    That roast😂
  • 12
    2 liters? That's a quite detailed measurement. How do you know? 🤔
  • 40
    Or... now this may seem like a mind blowing moment.... some devs just enjoy their weekend off.
  • 27
    Goodness, you must be a relatively new tech to not understand where the concept came from originally.

    It's because it's a *bad* thing to be up at 4am or on a weekend having to fix your mistakes, if you're available at all.
  • 29
    Maybe we should change it to "Don't deploy 194 times a week"

    Geez. There are 168 hours in a week. Deploying 194 times means deploying more than once an hour. If shit breaks, how do you track down what broke it? Or when? Or who?
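
For anyone checking the arithmetic in the comment above, here is a quick sketch in Python. It assumes deploys are spread evenly over all 168 hours of the week, which the original post does not claim; the function names are made up for illustration.

```python
# Deploy-rate arithmetic for the "194 deploys a week" figure.
# Assumption: deploys are spread evenly across the whole week.

HOURS_PER_WEEK = 7 * 24  # 168


def deploys_per_hour(deploys_per_week: int) -> float:
    """Average number of deploys in any given hour of the week."""
    return deploys_per_week / HOURS_PER_WEEK


def minutes_between_deploys(deploys_per_week: int) -> float:
    """Average gap between two consecutive deploys, in minutes."""
    return HOURS_PER_WEEK * 60 / deploys_per_week


rate = deploys_per_hour(194)
gap = minutes_between_deploys(194)
print(f"{rate:.2f} deploys/hour, one every {gap:.0f} minutes")
# prints "1.15 deploys/hour, one every 52 minutes"
```

So "more than once an hour" holds, but only just: on average a deploy lands roughly every 52 minutes.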
  • 14
    @stonestorm The question is, as evidenced by the last paragraph in the rant: why didn't you find the mistakes before deploying? What's wrong with your dev, test and QA processes?
  • 8
    Yeah seriously I’m on the same train. Hire a dev with some infrastructure experience or vice versa and invest in CI/CD. Deploys should break infrequently and should break before they hit prod.
  • 26
    @JustThat @stonestorm @C0D4

    You can enjoy the weekend, because you know nothing will go wrong.

    And you should deploy more than once per hour if you have a lot of developers.

    You know where things went wrong because first your tests yell at you, then the PR reviewer yells at you, then QA yells at you, then the bugtracker yells at you. Ideally, one of the "safety layers" catches the mistake before the deploy happens, but even if none of them does, making a revert PR should be easy.

    When your company grows, you can even look into stuff like auto-reverting kubernetes clusters which react to deviations in bugtrackers.

    @Wombat I never leave the house without a borosilicate graduated measuring cylinder. I first pee in the cylinder, note the amount of pee in my logbook, then dump the contents against the tree. I'm not some caveman who just pees directly against a tree without updating my fluid balance charts... duh.
  • 14
    @Fast-Nop because the problem deals with infrastructure, not code base? Because the problem is with the tests themselves? Because the problem is in the underlying packages/environment?

    All of these have NOTHING to do with the codebase, yet they still exist. God, this is why I got stuck bringing a 15-machine cluster back up on a Friday night, because of you people.

    And as for the OP, the idea of CI/CD is to continuously deploy when you have standardized deployment setups. What happens when you don't? What happens when you have a network outage? What happens if your data center is running low on power so you don't get enough CPU cycles? What happens if your OPS personnel are stroking their dicks when they're supposed to be alerting you that the CD pipe is broken because you haven't deployed in 5 days? Also, all of these happen on Friday.

    The whole point of this is to mitigate off-time work. If you can't understand the unpredictability of the world, then have fun drowning.
  • 13
    @arcsector Let's put it this way: everyone has a test environment. It's just that some people also have a separate production environment.

    Also, infrastructure can break at any time, unrelated to code deployment. That's why you have redundancy. Or you don't - if management has decided to save the money and just cross fingers.
  • 3
    @Fast-Nop I get that, but it will make more work for me. If I get a call that my process isn't working, my first look is at my own logs, then at the data, then at the machines, and then at the infrastructure. In this case I've been burning time until I get to the infrastructure. Sure, I could just ask them first, but normally that's bad form.
  • 6
    @arcsector I see, but that's general troubleshooting and unrelated to deployment. If you have proper dev, test and QA processes, the risk for stuff to break shouldn't be higher just because you deploy.

    If it is, then these processes are broken and need to be fixed. Also, the company is bleeding money, because the cost of fixing a bug grows exponentially the later in the project it is found. Catching bugs early is one of the points of such processes.
  • 8
    @arcsector

    > Because the problem deals with infrastructure

    This is why ideally, your infrastructure is defined as code. The version of NodeJS you're using can be defined in a dockerfile, which is kept in a git repository. Upgrading NodeJS consists of a PR which bumps the version number. If the servers start smoking, you revert the PR, and cycle the containers.

    If your Ops peeps are stroking their dicks instead of keeping your CI/CD pipeline working, they should be fired. It's their job to make sure you can deploy on Friday.

    And if Google or AWS is having power outages... I'll start filtering my aforementioned pee, because that would be the apocalypse.

    My argument is not that you should be available for work during the weekend (unless you don't mind and are being paid a fat premium)

    My argument is that if you can't deploy safely without shit breaking all the time, the company is doing something wrong.
  • 16
    Plenty of deployments are safe and go perfectly smooth. But sometimes, something you couldn't have possibly foreseen happens and breaks shit. And sometimes it doesn't break immediately. Sometimes a weird "one in a million" edge case happens and it all falls apart 12 hours after a deploy.

    And that's why I try not to deploy on Fridays.
  • 4
    @bittersweet Bold of you to assume that everyone can afford competent QA and devops teams.

    Also, the "Don't deploy on Friday" memes probably just stemmed from the "Praying to the server gods before the weekend" memes.
  • 6
    You talk like someone who's had a total experience of about 23 days in IT. Use this thing called Google and figure out why people say “don’t deploy on a Friday”.

    You deploy your code base 194 times per week? You have something somewhere that is fucked up lol. I can’t even.
  • 3
    I agree with @bittersweet and others saying deployment should be safe anytime. Offended folks maybe should rethink their infrastructure and try to catch up with modern development.
  • 7
    @Wombat Deploying on a Friday does not necessarily have anything to do with infrastructure. Jesus Christ. Y’all ever worked for real businesses? Most “Don't deploy on Friday” issues don’t have anything to do with the fucking infrastructure. This site has got its share of idiots.
  • 7
    This only really applies if everyone on your team is on the same page on software that was developed and established by that very same team.
    I have abnormally large code bases made by people that I wouldn't even dare call developers. I don't know if they were high when they "designed" them or if they were doing it on purpose to fuck with the current developers in place... but I ain't risking my team's weekend or mine for something that can wait, as a punishment for shit code that they did not write. Sure, for critical implementations, fine, we can fix shit over the weekend, I'll bring the pizza and the beer, whatever.

    But the meme originally came about for exactly this reason: the infrastructure or the codebase can fail for shit that is completely beyond you, and you still have to be there to "fix it".
  • 7
    @hatemyjob Why is 194 times a week too much? In my old company (the new one is getting there), every push went to the CI system, which ran the unit tests, built a new container, attached it to Kubernetes, ran all applicable integration tests, ran the critical E2E tests, deployed the container to live, and routed roughly 10% of traffic through the new deployment. It then monitored errors per minute: if everything went fine, it took down the old deployment; otherwise it blocked the new one from external access and sent a review note to dev and QA. All without any human involvement. We had times where we deployed multiple times per hour, and no critical failures occurred.
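
The promote-or-block step described in the comment above could look roughly like this. It is only a sketch, not the commenter's actual pipeline: `CanaryConfig`, `route`, `decide` and the error threshold are hypothetical names and values picked for illustration; only the "roughly 10%" share and the errors-per-minute signal come from the comment.

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CanaryConfig:
    traffic_share: float = 0.10         # "roughly 10%" from the comment
    max_errors_per_minute: float = 5.0  # hypothetical threshold, not from the thread


def route(config: CanaryConfig, rng: random.Random) -> str:
    """Send one request to the new ("canary") or the old ("stable") deployment."""
    return "canary" if rng.random() < config.traffic_share else "stable"


def decide(errors_per_minute: Sequence[float], config: CanaryConfig) -> str:
    """Return "promote" only if every observed minute stayed under the threshold."""
    if not errors_per_minute:
        return "block"  # no monitoring data: refuse to promote blindly
    if max(errors_per_minute) > config.max_errors_per_minute:
        return "block"  # keep the old deployment, notify dev and QA
    return "promote"    # take the old deployment down


config = CanaryConfig()
print(decide([0.0, 1.0, 2.0], config))  # prints "promote"
print(decide([0.0, 9.0, 1.0], config))  # prints "block"
```

Treating "no data" as a failure is a deliberate choice in this sketch: a canary that reports nothing is more likely broken monitoring than a flawless release.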
  • 4
    Great, we are glad you and your company are so awesome that Murphy's law never applies to you. You can have a team of the best devs in the world, and no matter what you do, something will always go wrong when deploying on Friday evening, just to fuck up everyone's weekend. It might be a roof leak in the server room, but something will happen.
  • 3
    That only works for applications which are not reliant on third-party APIs
  • 2
    @c64forthewin what? Why? How can your deployment be dependent on third party APIs?
  • 3
    @provector Now I'm really curious to know how not deploying on a Friday will keep your roof from leaking.
  • 6
    @provector The company where I work is by no means perfect, but they got three things right:

    1. Testing & QA means you are 99% sure your feature won't disrupt service to your end users.

    2. If a feature contains some horrible bug after all, a revert is one click away.

    3. DevOps are generously paid to be on call during weekends if the infrastructure has problems (but that's generally unrelated to deploying). OpsGenie calls them out of bed when a ping or critical test fails.

    We've had one case where I, as a developer, was needed during the weekend — and it wasn't related to a deployment.

    It was caused by the completely dimwitted idea that time can have breaks in continuity called "daylight savings time", combined with a developer who thought it would be a good idea to write a timezone-converting library himself instead of using the standard one for the language.
  • 1
    @Godisalie Where could I find more information about these processes and tools? You have a load balancer that is somehow configured to know when and where to route about 10% of traffic, something like that? What do you use for config and monitoring?
    A pointer to some article or a term that groups these concepts would be perfect.
    I like this idea, sounds damn good.
  • 2
    I like this thread. I've been quietly following it for days. I agree with the post but I can also relate to the people who disagree because I worked in projects with less modern processes before. In my current project, a weekend is a weekend and a holiday is a holiday. Servers are configured to auto-scale and in the case of failure, traffic can be migrated to the previous working one in just two clicks.

    In my previous projects, rolling back was a tedious process: arguments, approvals, re-deployment, testing, etc.
  • 2
    @nnee You're actually asking the wrong person; everything after it went live was our devops people's baby. I remember it being some Kubernetes endpoint slices magic, but I really don't know how exactly it was done.
  • 2
    @nnee Harness is a pretty good tool for monitoring, building, testing, deploying & rolling back with kubernetes. Free forever for the basic stuff, very pricey for supported/advanced tiers. Gitlab has a pretty complete set of "Auto DevOps" tools as well, free if you're going with community edition, 99/m for supported hosted/selfhosted.
  • 1
    @Wombat Not the deployment itself, but the functionality of the system. You can't always fully trust the API documentation, and things that work on the API's dev system may not work on the API's prod system.
  • 0
    @bittersweet "And you should deploy more than once per hour if you have a lot of developers."

    I disagree with this completely.

    All of the automated systems will check to ensure there aren't bugs which can be caught by any of those systems such as syntax or missing object or permissions issues but they can't check logic errors without first having encountered the error before (unless your CI/CD uses AI/ML).

    How would you back out a change which incorrectly writes data to a database or, worse, doesn't record data from a new field and all of the entries for that deployment now have lost data?

    And you can't seriously tell me that the processes to QC/QA _all_ of the code can really keep up with that velocity _and_ with the demands of the business? Everywhere I've been there have been non-dev stakeholders involved in rollout schedules. And they frequently change their minds or make mistakes.
  • 0
    @Wombat You seriously don't know how an app can rely on 3rd party APIs?

    Must be nice not having to rely on anything but internal code
  • 4
    @JustThat

    Of course our systems aren't bug free, there are inconsistencies, things which work in slightly crooked ways, give timeouts, or render wrong.

    But I have not experienced any catastrophic failures in my time here (4 years), and we deploy continuously, 24/7 from offices around the world.

    To make that work, PRs are always reviewed by senior devs. We use unit tests, although we don't strictly enforce TDD, nor do we enforce 100% coverage -- but we will call each other out with "shouldn't this have a test".

    Each team of 7-14 devs has one dedicated QA, who monkey-tests pretty much all branches except for quick copy/bugfixes. They use both manual actions, and scripted interactions across a range of devices.

    To perform QA, each branch is deployed as a collection of docker containers, hosted in an identical env to production. The setup is approved by QA, after which the branch is merged into master and rolled out to production.
  • 2
    @bittersweet I want your company. That’s the large company dream setup from the DevOps perspective.
  • 0
    @Diactoros I'd rather not have to be on call over the weekends because someone else made a mistake and QC didn't catch it.
  • 1
    Bittersweet, I agree with you in general. There are companies with shitty practices and those that are doing things just right, but I think you are missing the point of the saying. In my opinion it comes from the collective experience of older engineers with Murphy's law and the fact that not many people want to risk working over the weekend ;) Some don't mind.
  • 5
    @provector I used to be one of those older grumpy engineers who believed in Murphy's law.

    Now I'm convinced that Murphy's law only exists because some people don't want to learn from mistakes. They fix bugs, but not the weaknesses in the procedures.

    I merged a feature with missing translations about two years ago. Management was mildly disgruntled. No disaster, just sloppy, end users got to see token strings on some parts of the site for a while.

    I rolled the feature back, and wrote a test a few days later which checks whether the collection of tokens and the collections of translated strings for each language are equal in structure and size — so it can never happen again.

    Grumpy old engineers hate postmortems, because they hurt their fragile egos. They'd rather order a mug with a witty quote on it.

    Good engineers fortify the testing suite after each bug, and good management continuously invests in quality control.
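
A test like the one described in the comment above might be sketched as follows. This is an illustration, not the commenter's real test: it assumes tokens and translations are stored as nested dictionaries, and `key_paths`, `missing_translations` and the sample data are all hypothetical.

```python
def key_paths(tree, prefix=()):
    """Flatten a nested dict into the set of key paths that lead to leaf values."""
    paths = set()
    for key, value in tree.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            paths |= key_paths(value, path)
        else:
            paths.add(path)
    return paths


def missing_translations(tokens, translations):
    """Map each out-of-sync language to (missing paths, unexpected paths)."""
    expected = key_paths(tokens)
    report = {}
    for lang, tree in translations.items():
        actual = key_paths(tree)
        if actual != expected:
            report[lang] = (sorted(expected - actual), sorted(actual - expected))
    return report


# Hypothetical sample data: "nl" is missing the "pay" string.
tokens = {"checkout": {"title": "checkout.title", "pay": "checkout.pay"}}
translations = {
    "en": {"checkout": {"title": "Checkout", "pay": "Pay now"}},
    "nl": {"checkout": {"title": "Afrekenen"}},
}

print(missing_translations(tokens, translations))
# prints {'nl': ([('checkout', 'pay')], [])}
```

In a CI run the check would simply assert that the report is empty, so a branch with missing strings fails before it can be merged.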
  • 0
    Now let's figure out when CI/CD concepts became common practice in software deployments vs. when that saying was commonly used.
  • 0
    @JustThat what do you mean code logic is not testable? Unit tests, integrated tests, functional tests, UI tests, etc.? All these are available and automatable.
  • 0
    @ojrask I never said _code_ logic errors. I meant human logic errors.

    Here's an example:

    Imagine a system that builds a list of accounts that should be updated in one system with data from another.

    This system is complex and has several moving parts, some of which fall under the purview of a different group. It has run fine for several months.

    At some point the other group made a minor change to the values in a field in a database table which was being used as part of the algorithm to detect account updates. This change wasn't in code, it was in data.

    The process continued to run just fine but, it now processes all of the accounts _except_ for the ones it should update.

    No errors were thrown and it was only noticed when the average length of time the process took to run increased dramatically.

    The "logic" change here was not bad code but an obviated assumption based on someone else's change.
  • 0
    @ojrask Perhaps a good test framework would test all of that on every build/deploy. But, given that the process normally takes 90+ minutes to run, having a full test suite test every possible use case each time one of the 194 changes is made... you get the idea.
  • 0
    Ah, that kind of logic, I see what you mean now.

    The more you write integrated tests the longer the test runs take with less benefit. Add in some Robot Framework and you're looking at humongous numbers.
  • 0
    @JustThat 90+ minutes sounds like deploying a whole enterprise.
  • 1
    I trust that my CI/CD pipeline is stable enough. Though regularly, I won't deploy on a Friday after 2pm. And almost never on weekends.

    I like to be awake and motivated to fix infrastructure fuckups. Just because you can "always" deploy doesn't mean you should.
  • 1
    In my case, I am always working with other teams. My team deploys the cloud infrastructure that runs applications developed by other teams. I don't review the other teams' code. I can't guarantee that my infrastructure updates won't break their app. I can't guarantee that their code won't fill a VM's memory with garbage and crash. One application triggered an autoscaling group to scale out from 2 servers to 300 over the course of a single hour due to an error in the scaling policy that would have caused it to keep spinning up new instances if it hadn't hit the account limit. Every prod deployment that should have occurred that morning failed due to the broken app hitting the server limit in a shared account.

    Developers give me bad requirements and discover 5 hours after deployment that they actually need twice as much disk space for a VM, higher memory cap for a Lambda function, six extra servers, whatever. And production is down until that's fixed. We can't predict load well enough.
  • 0
    Thank you for this rant, you are totally right! I will use this rant as an example for the next person that says "don't deploy on Friday".
  • 0
    200 deploys a week? I even cringed bro
  • 3
    I used to live by the "don't deploy on Fridays" rule, but that was 20 years ago when you had to manually copy files, run database scripts, edit config files, etc.

    Now that it is done automatically I have no problem deploying on a Friday.
  • 0
    Well, in my company there's no QA, and that's why after EVERY release (with some late new features added because "it just takes a minute to do it") we have at least 3 more releases to fix the obvious bugs found in the first release.
  • 2
    @ZioCain You work at Microsoft?!
  • 0
    @JustThat LoL no, but I can see the pattern!
    Even Apple does this, btw
  • 2
    @ZioCain Yeah, lots of companies do. Keeps us off the street, I guess.

    Microsoft was just an easy target.
  • 1
    @C0D4 no one should be forced to work on weekends except if you accepted in a contract or you just really like working...

    And I believe the whole point of his post was that even if something fails, it's still not harming anything as the tests would fail
  • 1
    you, you are worth following to the depths of development hell and back again.