How to fix an undeployable main branch


Eric Kloss, 14.10.2022

What happened?

Usually, your SCM (source code management) and CI (continuous integration) are tied together pretty tightly. Let's say you have a GitLab server and use GitLab CI in combination with it to automate your production deployment. There are a lot of ways to implement your deployment process, e.g.:

deploy when a git tag is created
deploy from a specific branch
deploy manually via the GitLab UI
and so on ...

In some of those scenarios we encounter a fundamental problem. One of those scenarios is deploying manually from the main branch.

Fix production as fast as possible

The Problem

After a production release of your main branch (let's call it release one or R1), the world continues spinning. Devs continue their work and hopefully merge their new features, improvements or planned bugfixes into the main branch asap. Those changes are part of the next production release (R2), which is scheduled for some time in the future.

This works fine until a production incident occurs and the main branch already contains changes that are not yet deployable to production. Reasons for undeployable changes can be:

breaking changes
dependencies (mostly external services) that are not yet finished
business cases that must not be live before a critical point in time (e.g. content / functionality related to an external launch date)
and there are more …

Now your main branch is technically undeployable.

If your CI setup is able to deploy to production from a branch other than main, you can release the hotfix from a feature branch. No problem so far. But if you are only allowed to deploy from main, you cannot release the hotfix for the incident at all.

To illustrate an example, imagine the following git log:

1. ! some feature that must not go live yet
2. ! a breaking change in client code for an external API (external API not yet changed)
3. a long planned bugfix that includes a lot of refactoring
4. release commit R1 (current production version)
5. .... everything before R1

Your incident bugfix needs to be committed on top of commit 1. Commits 1 and 2 are not yet deployable (hard fact), since they would break the application. Commit 3 is theoretically deployable, but I would strongly recommend not deploying it as part of the hotfix release (soft fact). The reasons for that are: commit 3 ...

might contain new bugs
increases the complexity on your road to the incident fix
makes your bugfix experiment harder to evaluate
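Before reverting anything, it helps to see exactly which commits sit between R1 and main. Here is a minimal sketch that rebuilds the git log from above in a throwaway repository and lists the pending commits. It assumes R1 is a git tag on the release commit; the sandbox directory and the empty commits exist only for illustration.

```shell
#!/bin/sh
set -eu

# Throwaway repository, only for illustration
sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Rebuild the example history (empty commits are enough here)
git commit -q --allow-empty -m "release commit R1"
git tag R1   # assumption: the release is marked with a tag named R1
git commit -q --allow-empty -m "a long planned bugfix that includes a lot of refactoring"
git commit -q --allow-empty -m "a breaking change in client code for an external API"
git commit -q --allow-empty -m "some feature that must not go live yet"

# Everything reachable from main but not from R1, newest first
git log --oneline R1..main

# The same range as a plain number
pending_count=$(git rev-list --count R1..main)
echo "commits on main since R1: $pending_count"
```

In a real repository, only the last two commands are needed, with `origin/main` in place of `main` after a `git fetch`.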

So now, it's time to revert ;)

A possible solution

In general, you can avoid those problems completely by using a more robust CI setup that allows you to react to production incidents (more on that later). But if you have an incident that needs to be fixed now, you would rather not rebuild your CI setup, but work around it for the moment. One constraint for the workaround is: do not force push to your main branch.

What needs to be done in git?

You basically need two git commits to fix your deadlocked situation of not being able to release your artifact.

First, you need a git commit that reverts every change after the release commit of R1 - we call it the "revert to production version" commit. And after your incident is resolved, you need a git commit that brings you back to the current main state. Here is one way to create such a revert commit. This approach works with any number of commits since R1, including merge commits.

    ### git commit: revert to production (R1)

    # go to your main branch
    git checkout main

    # be really sure to be up-to-date
    git pull

    # reset local files to R1
    git reset --hard R1

    # set branch pointer back to today
    # without changing the local files
    git reset origin/main

    # let's assume we cannot push to main
    # but must create a feature branch
    git checkout -b hotfix/revert-to-and-fix-R1

    # since the git branch is up-to-date
    # and the files represent R1 the current
    # diff already is the revert to production
    # you just have to create a commit
    # (-A also stages deleted files, no matter
    # which directory you are in)
    git add -A
    git commit -m "revert to R1"

    # double check the revert commit
    git diff HEAD R1
    # git diff should be empty

    # apply the hotfix and add it
    # to the branch
    git commit -am "fixed NRE"

    # create the local branch at origin
    git push -u origin HEAD

Now you can merge the hotfix into main and deploy it. Note that the history of main does not change. We only append more commits. No force-push, no secrets, no special permissions required.
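If you want to try the procedure before touching a real repository, the following sketch replays it in a sandbox. Everything in it is an assumption made for the demo: a local bare repository stands in for the GitLab remote, `commit_file` is a hypothetical helper, and the file names are made up. It checks the two claims from above: the revert commit matches R1 exactly, and the old main is still an ancestor of the hotfix branch, so nothing gets rewritten.

```shell
#!/bin/sh
set -eu

# Sandbox: a bare repository plays the role of "origin"
sandbox=$(mktemp -d)
git init -q --bare "$sandbox/origin.git"
git init -q "$sandbox/work"
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
git remote add origin "$sandbox/origin.git"

# Hypothetical helper: write one file and commit it
commit_file() {
  printf '%s\n' "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file app.txt "v1" "release commit R1"
git tag R1
commit_file refactoring.txt "refactored" "a long planned bugfix"
commit_file api-client.txt "new client" "a breaking change in client code"
commit_file feature.txt "feature" "some feature that must not go live yet"
git push -q -u origin main
main_before=$(git rev-parse origin/main)

# The procedure: files go back to R1, the branch pointer stays at origin/main
git reset -q --hard R1
git reset -q origin/main
git checkout -q -b hotfix/revert-to-and-fix-R1
git add -A
git commit -q -m "revert to R1"

# The revert commit must have exactly the content of R1
git diff --quiet HEAD R1

# Apply the (made-up) hotfix
printf 'v1 + hotfix\n' > app.txt
git commit -q -am "fixed NRE"

# History is append-only: the old main is an ancestor of the hotfix branch
git merge-base --is-ancestor "$main_before" HEAD
git log --oneline "$main_before"..HEAD
```

The final `git log` lists only the two new commits, "revert to R1" and "fixed NRE", on top of the unchanged main history.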

Let's assume the incident is fixed. Time to restore the reverted features on main.

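    ### git commit: "reverting the revert"

    # again: a feature branch
    git checkout -b task/restore-main

    # now you need to create an inverse commit
    # of the revert - basically an un-revert ;)
    # HEAD~1 assumes the revert commit is the
    # direct parent of HEAD; otherwise pass the
    # hash of the "revert to R1" commit instead
    git revert HEAD~1

    # double check the commit
    # (main-before-the-hotfix stands for the commit
    # main pointed to before the hotfix was merged)
    git diff main-before-the-hotfix
    # git diff should only show the hotfix

    # continue as usual
    # push, review and merge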

The merge of restore-main should bring you back to normal mode :)
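The un-revert can be replayed in a sandbox as well. This sketch makes a few assumptions for the demo: there is no remote, so a saved commit hash takes the place of origin/main, and `commit_file` and the file names are hypothetical. It also reverts the revert commit by hash instead of HEAD~1, which keeps working if a merge commit ends up in between.

```shell
#!/bin/sh
set -eu

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: write one file and commit it
commit_file() {
  printf '%s\n' "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file app.txt "v1" "release commit R1"
git tag R1
commit_file feature.txt "feature" "some feature that must not go live yet"
main_before_hotfix=$(git rev-parse HEAD)

# Revert to R1 plus hotfix, same reset trick as in the first snippet
git reset -q --hard R1
git reset -q "$main_before_hotfix"
git add -A
git commit -q -m "revert to R1"
revert_commit=$(git rev-parse HEAD)
printf 'v1 + hotfix\n' > app.txt
git commit -q -am "fixed NRE"

# Incident resolved: revert the revert, addressed by hash
git checkout -q -b task/restore-main
git revert --no-edit "$revert_commit" > /dev/null

# Compared to the old main, only the hotfix file should differ
changed_files=$(git diff --name-only "$main_before_hotfix" HEAD)
echo "$changed_files"
```

The feature file is back after the un-revert, and the diff against the old main lists only the file touched by the hotfix.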

Preventing the problem

As said before, there are tons of CI process strategies. Most of them are flexible enough to prevent the problem entirely.

One example: Do not deploy from your main branch, but use a production branch. This branch always points to the commit that is currently deployed in production. Your main branch contains the most recent development state. When you want to do a production deployment, your main branch is merged into the production branch. This either triggers the production deployment pipeline automatically, or the pipeline needs to be started manually.

In this setup, a hotfix for a production incident can be directly committed to the production branch and eventually merged back to main, to also be part of the next regular release.
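On the git side, this flow could look like the following sandbox sketch. The branch name `production`, the helper `commit_file` and the file names are assumptions for the demo, and the actual deployment trigger (for example a GitLab CI job bound to the production branch) is left out.

```shell
#!/bin/sh
set -eu

sandbox=$(mktemp -d)
cd "$sandbox"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

# Hypothetical helper: write one file and commit it
commit_file() {
  printf '%s\n' "$2" > "$1"
  git add "$1"
  git commit -q -m "$3"
}

commit_file app.txt "v1" "release commit R1"
# production always points to what is currently deployed
git branch production

# Development continues on main with something not yet deployable
commit_file feature.txt "feature" "some feature that must not go live yet"

# Incident: commit the hotfix directly on production
git checkout -q production
printf 'v1 + hotfix\n' > app.txt
git commit -q -am "fixed NRE"
hotfix_release=$(git rev-parse HEAD)   # this commit would be deployed now

# Merge the hotfix back so the next regular release contains it
git checkout -q main
git merge -q --no-edit production

# Next regular release: production catches up with main
git checkout -q production
git merge -q --ff-only main
```

The hotfix release never contains the unfinished feature, and main needs neither a revert nor an un-revert.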

There are plenty of other strategies that also prevent this problem. I would love to hear your favorites <3