First of all, ask yourself this: when you’re dealing with production data, even if you only have a single customer to serve (maybe it’s yourself or your organization), is there ever a time that it’s not mission critical?
It’s always mission critical.
I’ve worked for NUBIC for just over a year now. I’ve botched only a couple of production deploys, and was able to roll back quickly from all but one of them. You can believe that the one that I really messed up haunts me; it’s the kind of mistake that honestly made me question whether I was cut out to be a software developer, and whether or not I’d taken deployments as seriously as I should. Sometimes a reality check like the one I’ve had is important to humble you, to sharpen you, and ultimately to make you better. NUBIC handles a lot of critical medical research data, and if you mess up, lose, corrupt, discard, or fail to properly store and retrieve even a day or two’s worth of critical research data, it’s a major problem.
So, given my painful experience, and given how important it is that I don’t forget to do things right the first time, I’ve developed an exhaustive list of checklist items that I now go through in order to maximize my confidence in an app deployment. Am I ever 100% sure that a deployment will go well? No. I might be 100% confident, but that’s not really the same thing.
The list of things I mentally check off when deploying web apps:
- Deploying successfully is like playing chess. Analyze your position carefully and avoid blunders at every turn. A blunder that is serious enough might cost you the game (your job) and other people their data (possibly years of work). Tread lightly.
- You’ve made sure to capture any questions from yourself and other people into some kind of note that you’ll remember to address?
- You’re NOT in a rush, are you? Do this deploy correctly, and do it once.
- Are you working on a section of the code in which someone else is more expert, and if so have you done a sanity check with them regarding your changes?
- You’ve tested your code branch thoroughly?
- You’ve tested your changes locally?
- You’ve run the test suite locally?
- You’ve addressed all notes/questions/concerns listed in your ticket notes (both from you, and from other people)?
- You’ve pulled in the latest FROM master, and done so in the correct order/direction to make sure integration with other changes should work?
- You’ve tested other recent changes to make sure you didn’t break things for anyone else?
- Is there any testing that is still manual that you can reasonably automate?
- Are there any tests that you didn’t write that should be written? If so, go write them!
- Are there any tests that you didn’t write that you thought maybe you should? If you didn’t write them, can you confidently say why they shouldn’t be written, and that you’re not just taking a shortcut?
- You’ve merged your branch LOCALLY into your LOCAL copy of master, and re-done all your testing?
- Does the git log look correct/as you expect it to look? If anything is unexpected, understand what happened and back out if necessary.
- Is there anything you’ve left unquestioned?
- You’ve proven that the scope of your changes is accurate, that there won’t be unanticipated ripple effects?
- Have you systematically analyzed your approach to try to find any hidden assumptions you’ve made about the code you’ve created/modified/removed? Did you test those assumptions?
- Have you done a sanity check to make sure that you’re operating in reality instead of some version of reality that you’ve concocted for yourself?
- Have you taken a step back and moved out of just the bug/feature/task you’re working on to ask, “Is there anything else that might be affected or that I haven’t thought of? Do I have any remaining concerns or doubts?” If so, STOP, write them down, and systematically address them.
- You’ve stepped back with your anti-tunnel-vision steps and thought long and hard about the scope of your changes? (This isn’t a duplicate; I actually want to check my conclusions about the scope of my changes more than once.)
- ONLY WHEN READY, merge to REMOTE master (in the PROPER order/direction).
- Is the Continuous Integration build passing?
- You’ve deployed and tested your changes in the STAGING environment?
- You’ve run your changes past anyone who needs to see them (e.g. the person who originally opened the ticket, or the client, or both), and gotten sign-off on how things are now working?
- You’ve made a list of any open questions you have about possible ripple effects, and systematically found answers to them, leaving nothing unanswered?
- You’ve asked about any recent problems with deployment that other developers might have had, and if there are any special steps you need to take during deployment?
- You’ve gone over your rollback plan? Are you sure? Does your deploy involve any database migrations?
- You’ve got your deploy scheduled at an appropriate time (e.g. during a maintenance window)?
- Have you notified everyone who relies on your app and who may be affected if there is downtime?
- Did anything go wrong with the previous deployment, and if so did you update this checklist to reflect the lessons learned?
- Are you really sure you’re ready to deploy? It’s okay – it’s better to have a doubt but sleep well at night than fly into a situation and live to regret it.
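
A checklist like this does not have to live only in your head. As a rough illustration, the Python sketch below treats each item as a question paired with a check, evaluates all of them, and reports every unresolved question instead of stopping at the first one. Everything in it is an assumption made for the example: `CheckItem`, `run_checklist`, and the three sample questions are hypothetical names, and the hard-coded answers stand in for whatever real checks (a test runner's exit code, a CI status, a manual confirmation) you would plug in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class CheckItem:
    """One checklist question plus a callable that answers it."""
    question: str
    check: Callable[[], bool]  # returns True when the item is satisfied


def run_checklist(items: List[CheckItem]) -> Tuple[bool, List[str]]:
    """Evaluate every item and return (ready, unresolved_questions).

    All items are evaluated even after a failure, so the complete list
    of open questions is visible in one pass.
    """
    unresolved = []
    for item in items:
        try:
            ok = bool(item.check())
        except Exception:
            # A check that cannot be evaluated counts as unresolved.
            ok = False
        if not ok:
            unresolved.append(item.question)
    return (not unresolved, unresolved)


# Hypothetical answers for illustration only.
example_items = [
    CheckItem("Test suite run locally?", lambda: True),
    CheckItem("Deployed and tested in staging?", lambda: True),
    CheckItem("Rollback plan reviewed?", lambda: False),
]

ready, unresolved = run_checklist(example_items)
if not ready:
    print("Do not deploy yet. Unresolved items:")
    for question in unresolved:
        print(" -", question)
```

Treating a check that raises an error as unresolved is a deliberate choice here: if you cannot answer a question, the safe reading is that the answer is "no".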