How to fix an undeployable main branch
What happened?
Usually, your SCM (source code management) and CI (continuous integration) are tied together pretty tightly. Let's say you have a GitLab server and use GitLab CI in combination with it to automate your production deployment. There are many ways to implement your deployment process, e.g.:
- deploy when a git tag is created
- deploy from a specific branch
- deploy manually via the GitLab UI
- and so on ...
In some of those scenarios we encounter a fundamental problem. One of those scenarios is deploying manually from the main branch.
The Problem
After a production release of your main branch (let's call it release one or R1), the world keeps spinning: devs continue their work and hopefully merge their new features, improvements or planned bugfixes to the main branch as soon as possible. Those changes are part of the next production release (R2), which is scheduled for sometime in the future.
This works fine until a production incident occurs and the main branch already contains changes that are not yet deployable to production. Reasons for undeployable changes can be:
- breaking changes
- dependencies (mostly external services) that are not yet finished
- business cases that must not be live before a critical point in time (e.g. content / functionality related to an external launch date)
- and there are more …
Now your main branch is technically undeployable.
If your CI setup is able to deploy to production from a branch other than main, you can release the hotfix from a feature branch. No problem so far. But if you are only allowed to deploy from main, you cannot release the hotfix for the incident at all.
To illustrate an example, imagine the following git log:
1. ! some feature that must not go live yet
2. ! a breaking change in client code for an external API (external API not yet changed)
3. a long planned bugfix that includes a lot of refactoring
4. release commit R1 (current production version)
5. .... everything before R1
Your incident bugfix needs to be committed on top of commit 1. Commits 1 and 2 are not yet deployable (hard fact) since they would break the application. Commit 3 is theoretically deployable, but I would strongly recommend not deploying it as part of the hotfix release (soft fact). The reasons: commit 3 ...
- might contain new bugs
- increases the complexity on your road to the incident fix
- makes your bugfix experiment harder to evaluate
So now, it's time to revert ;)
A possible solution
In general, you can avoid those problems completely by using a more robust CI setup that allows you to react to production incidents (more on that later). But if you have an incident that needs to be fixed now, you would rather not rework your CI setup, but work around the problem for the moment. One constraint for the workaround is: do not force push to your main branch.
What needs to be done in git?
You basically need two git commits to fix your deadlocked situation of not being able to release your artifact.
First, you need a git commit that reverts every change after the release commit of R1 - we call it the "revert to production version" commit. And after your incident is resolved, you need a git commit that brings you back to the current main state. Here is one way to create such a revert commit. This approach works with any number of commits since R1, including merge commits.
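A minimal sketch of one such approach, run in a throwaway repository so it can be tried safely. The file names, commit messages, the R1 tag and the hotfix branch name are assumptions for illustration only. The idea: a hard reset to R1 gives you R1's files, a soft reset moves the branch pointer back to main without touching those files, and the commit that follows records the difference as a single regular commit on top of main.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository; all names below are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main   # make sure the first branch is called main
git config user.name "Demo"
git config user.email "demo@example.com"

# R1 is the version currently running in production.
echo "v1" > app.txt
echo "timeout=10" > config.txt
git add app.txt config.txt
git commit -q -m "release commit R1"
git tag R1

# Three commits that landed on main after R1 and are not part of the hotfix.
echo "refactored" > app.txt
git commit -q -am "long planned bugfix with refactoring"
echo "new client" > client.txt
git add client.txt
git commit -q -m "breaking change in API client"
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature that must not go live yet"

# Create the "revert to production version" commit on a hotfix branch.
git checkout -q -b hotfix main
git reset -q --hard R1      # index and working tree now match R1
git reset -q --soft main    # branch pointer back to main, files stay at R1
git commit -q -m "revert to production version (R1)"

# The incident fix goes on top of the revert commit.
echo "timeout=30" > config.txt
git commit -q -am "hotfix: raise timeout"

# Check: the revert commit has exactly the same content as R1.
git diff --quiet R1 hotfix~1 && echo "revert commit matches R1"
git log --oneline main..hotfix
```

In a real repository only the four commands of the revert step are needed, with R1 replaced by whatever identifies your current production commit.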
Now you can merge the hotfix into main and deploy it. Note that the history of main does not change. We only append more commits. No force-push, no secrets, no special permissions required.
Let's assume the incident is fixed. Time to restore the reverted features on main.
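A sketch of the restore step, again in a throwaway repository with made-up names. It assumes the "revert to production version" commit is a single regular (non-merge) commit, so reverting it brings back everything it removed while the hotfix stays in place. If the hotfix touches the same lines as the restored changes, git will stop and ask you to resolve the conflicts by hand.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository; all names below are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Setup: R1, one unreleased feature, then revert commit and hotfix merged into main.
echo "timeout=10" > config.txt
git add config.txt
git commit -q -m "release commit R1"
git tag R1
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature that must not go live yet"
git checkout -q -b hotfix main
git reset -q --hard R1
git reset -q --soft main
git commit -q -m "revert to production version (R1)"
revert_commit=$(git rev-parse HEAD)   # remember it for the restore step
echo "timeout=30" > config.txt
git commit -q -am "hotfix: raise timeout"
git checkout -q main
git merge -q --ff-only hotfix         # stands in for the real merge request

# Restore: revert the revert commit on a new branch, then merge it into main.
git checkout -q -b restore-main main
git revert --no-edit "$revert_commit"
git checkout -q main
git merge -q --ff-only restore-main

# main has the feature again and still contains the hotfix.
ls
cat config.txt
```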
The merge of restore-main should bring you back to normal mode :)
Preventing the problem
As said before, there are tons of CI process strategies. Most of them are flexible enough to prevent the problem altogether.
One example: Do not deploy from your main branch, but use a production branch. This branch always points to the commit that is currently deployed in production. Your main branch contains the most recent development state. When you want to do a production deployment, your main branch is merged into the production branch. This either triggers the production deployment pipeline automatically, or the pipeline needs to be started manually.
In this setup, a hotfix for a production incident can be committed directly to the production branch and then merged back to main, so that it is also part of the next regular release.
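The git side of this setup can be sketched as follows, in a throwaway repository. The branch name production, the file names and the commit messages are assumptions for illustration; the deployment pipeline itself is not shown.

```shell
#!/bin/sh
set -eu

# Throwaway demo repository; all names below are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Release R1: production points to the commit that is deployed.
echo "timeout=10" > config.txt
git add config.txt
git commit -q -m "release commit R1"
git branch production

# Development continues on main with a change that is not deployable yet.
echo "feature" > feature.txt
git add feature.txt
git commit -q -m "feature that must not go live yet"

# Incident: commit the hotfix directly on production and deploy from there.
git checkout -q production
echo "timeout=30" > config.txt
git commit -q -am "hotfix: raise timeout"

# Merge the hotfix back so it is part of the next regular release.
git checkout -q main
git merge -q --no-edit production

git log --oneline --graph --all
```

The production branch never sees the unfinished feature, and main never needs a revert commit.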
There are plenty of other strategies that also prevent this problem. I would love to hear your favorites <3