
Git: Moving from Master to Main

[This article was first published on The Jumping Rivers Blog, and kindly contributed to R-bloggers.]

In June 2020, GitHub announced that it was moving the default branch name from master to the more neutral name, main. GitLab followed suit a few months later. Tobie Langel makes the salient point on why changing the name is a good thing:

So master is not only racist, it’s also a silly name in the first place.

The purpose of this post is to summarise some of the challenges we faced when moving from master to main, in the hope that if you decide to make the same change, you’ll avoid some of the issues.

Renaming a Single Repository

Renaming a single repository is relatively straightforward. There are five main steps:

  1. Copy the master branch and history to main
  2. Push main to the remote repository, i.e. GitHub / GitLab
  3. Point HEAD to the main branch
  4. Change the default branch to main on the remote
  5. Delete the master branch on the remote repo
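
As a rough sketch, the five steps can be run end to end in a throwaway sandbox. The bare repository remote.git below is only a stand-in for GitHub / GitLab, and the sandbox paths and the "Demo" identity are assumptions made for illustration. On a hosted service, step 4 is normally a change in the project settings rather than a command.

```shell
#!/bin/sh
set -eu

# Throwaway sandbox: a bare repository stands in for GitHub / GitLab.
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"
# Make the stand-in remote start on master, whatever init.defaultBranch says.
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/master

# A working repository with one commit on master, pushed to the remote.
git init --quiet "$sandbox/work"
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/master
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit --quiet --allow-empty -m "Initial commit"
git remote add origin "$sandbox/remote.git"
git push --quiet -u origin master

# 1. Copy the master branch and its history to main (a local rename)
git branch -m master main
# 2. Push main to the remote and track it
git push --quiet -u origin main
# 3. Point the remote-tracking HEAD at main
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main
# 4. Change the default branch on the remote
#    (assumption: done directly on the bare repo here; a settings change on GitHub / GitLab)
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main
# 5. Delete the master branch on the remote
git push --quiet origin --delete master

# List the branches left on the remote.
remote_branches=$(git ls-remote --heads origin | awk '{print $2}')
echo "$remote_branches"
```

The order matters: a remote will typically refuse to delete the branch that is still its default, which is why step 4 comes before step 5.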

There are several nice descriptions of how to change a single repository. For example, Steven Mortimer has a blog post that leads you through the process.

While I’ve read of individuals making the move, I’ve not read about organisations making the change. I’m sure numerous companies have made the move; I’ve just not seen them.

The Jumping Rivers Move

During August 2021, we started renaming our repositories from master to main. We deliberately chose August, because that month is the main school holiday in the UK. That meant most of our clients and team were on holiday, so the impact of any change was reduced.

An Overview of Jumping Rivers Repositories

Protected default branches: At Jumping Rivers our default branch (master / main) is protected. This means that we can’t push directly to the default branch. Instead, we need to create a branch and merge it.

Code owners: all repositories have a named list of repository owners. Depending on the repository, this is usually between two and six people. These are the members of the Jumping Rivers team who have permission to merge a branch onto the default branch (master / main). The person who made the initial merge request cannot merge into main. This has to be an additional team member.

Continuous integration: all repositories have a CI process. The CI ranges from very elaborate pipelines to (relatively) simple checks on the contents of committed files. In order to merge into the default branch, the CI must pass. To avoid copying CI scripts to our repositories, all CI files are templated. A typical CI file looks something like:

include:
  # https://repo-url.com/ci-templates/-/blob/master/templates/r-package.yml
  - project: 'jumpingrivers/tools/ci-templates'
    ref: master
    file: '/templates/r-package.yml'

Note that the word “master” appears twice in this code chunk.
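
A minimal sketch of updating both occurrences in one repository is shown below. The file name .gitlab-ci.yml and the scratch directory are assumptions for illustration; the contents are the example above.

```shell
#!/bin/sh
set -eu

# Hypothetical CI file in a scratch directory; the name .gitlab-ci.yml is assumed.
workdir=$(mktemp -d)
ci_file="$workdir/.gitlab-ci.yml"
cat > "$ci_file" <<'EOF'
include:
  # https://repo-url.com/ci-templates/-/blob/master/templates/r-package.yml
  - project: 'jumpingrivers/tools/ci-templates'
    ref: master
    file: '/templates/r-package.yml'
EOF

# Rewrite the two occurrences: the URL in the comment and the ref: key.
# A temporary file is used because in-place sed flags differ between platforms.
sed -e 's|/blob/master/|/blob/main/|' \
    -e 's|^\([[:space:]]*ref:[[:space:]]*\)master[[:space:]]*$|\1main|' \
    "$ci_file" > "$ci_file.tmp"
mv "$ci_file.tmp" "$ci_file"

# Count the lines that still mention master; grep -c prints 0 when none are left.
remaining=$(grep -c 'master' "$ci_file" || true)
echo "Lines still mentioning master: $remaining"
```

The patterns are deliberately narrow, so that an unrelated use of the word master elsewhere in a CI file is left alone and shows up in the final count instead.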

RStudio Package Manager (RSPM): We use RSPM to manage our R packages. Currently, any package that is tagged on our GitLab server is added to our RSPM. We also have neat scripts that automatically scan for new repositories and add them to RSPM without any user interaction.


RStudio Connect (RSC) & Shiny Servers: We deploy multiple Shiny applications and markdown documents to RStudio Connect. Likewise, we have similar pipelines with Shiny Server, Shiny Proxy, etc.

Web page: This website is hosted on GitLab and deployed via Netlify.

Historical repositories: As with many organisations, our standards have evolved over time. We’ve found that when working as a distributed team, a consistent repository structure helps us work together. The CI now enforces, amongst other things

When we go back to an old project, say from 12 months ago, we often have to spend about ten minutes tightening up the repository. For a single project, this is fine. But for a large number of simultaneous projects, this becomes time-consuming.

Summary: The above points when taken together mean:

Potential Strategies for Moving

The Hybrid Approach

Our initial plan was to take a hybrid approach: alter the template CI process to handle either master or main. Then gradually move repositories across. We decided against this as

For larger organisations / companies this would be the correct strategy. I would estimate that if we were double our current size, we would have taken this route.

Moving sections

For historical reasons, our training materials and associated infrastructure live in a semi-independent repository project. As such, when we did a large update to our course notes around Christmas 2020, the training materials moved to main. Part of this overhaul was also implementing CODEOWNERS and template CI processes.

However, the non-training repositories are all coupled via the CI process, so it wasn’t possible to easily move other sections.

Semi-hybrid Approach

We changed the CI template repository from master to main. As the template is used by every repository, any change to a repository would now fail the CI unless that repository’s CI file was updated to point at main. This provided a natural method for incorporating changes into repositories that were being actively used. We also identified key repositories where we had to take extra care, for example our website.

Next, we encouraged people to rename the protected branches from master to main. To be honest, I’m not sure how many people listened to this encouragement. There’s always something more pressing to do!

About four weeks after the process started, our CI now enforces that the default branch in a repository must be called main. Furthermore, if a master branch exists, it must be deleted. This approach worked for us, as we didn’t have (much) critical infrastructure. If a CI job broke for 10 minutes, then no harm was done.
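
A check of this kind could be sketched as follows. This is not our actual CI job: the function name check_default_branch and the sandbox repository are hypothetical, and the check relies only on git ls-remote.

```shell
#!/bin/sh
set -eu

# check_default_branch REMOTE (hypothetical helper)
# Succeeds only if REMOTE's default branch is main and no master branch exists.
check_default_branch() {
  remote=$1
  # --symref reports which branch the remote HEAD points to,
  # as a line of the form "ref: refs/heads/<name><TAB>HEAD".
  default_ref=$(git ls-remote --symref "$remote" HEAD | awk '$1 == "ref:" {print $2}')
  if [ "$default_ref" != "refs/heads/main" ]; then
    echo "FAIL: default branch is '${default_ref:-unknown}', expected refs/heads/main"
    return 1
  fi
  if [ -n "$(git ls-remote --heads "$remote" refs/heads/master)" ]; then
    echo "FAIL: a master branch still exists and must be deleted"
    return 1
  fi
  echo "OK: default branch is main and master is gone"
}

# Demo against a throwaway bare repository (a stand-in for a GitLab project).
sandbox=$(mktemp -d)
git init --quiet --bare "$sandbox/remote.git"
git init --quiet "$sandbox/work"
cd "$sandbox/work"
git symbolic-ref HEAD refs/heads/main
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit --quiet --allow-empty -m "Initial commit"
# Push the same commit as both main and master, then make main the default.
git push --quiet "$sandbox/remote.git" main main:master
git --git-dir="$sandbox/remote.git" symbolic-ref HEAD refs/heads/main

before=$(check_default_branch "$sandbox/remote.git" || true)
git push --quiet "$sandbox/remote.git" --delete master
after=$(check_default_branch "$sandbox/remote.git")
echo "$before"
echo "$after"
```

In the demo the first call fails because master still exists, and the second passes once it has been deleted.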

Our final stage will take place in a couple of months, when we’ll systematically work through groups of repos and make the change, for example all R packages or all Shiny applications.

Hindsight is a Wonderful Thing

Overall the process wasn’t/isn’t too painful. But with hindsight, we did make things more difficult than they needed to be. To help anyone else looking to make this move, here’s a list of things I wish we had implemented from day 1:

  1. A clear guide for changing from master to main. This should be for your organisation; don’t just point to someone else’s blog post. Ensure you can copy the code and also make the guide easy to find – don’t just stick it in an email.

  2. If anyone queries that guide, update the guide. Avoid the temptation to store the answer in Slack. Maintain and refer to that single point of truth.

  3. Remember to include a step for updating a local repo that has been changed by someone else.

  4. Have a Slack channel for the move and use it. We had the channel, but never really used it.

  5. When you delete and then create a protected branch, you should be clear about the standards within your organisation. We used the GitLab API to set the various options, e.g. the merge method, whether code owner approval is required, and whether a user can force push. However, we had never decided what these default options should be, so this had to be documented (a good thing)!

  6. Initially when updating RSPM / RSC we made a Slack request. This is a bad idea as it’s easy to get lost in the noise of Slack. Instead, create a merge request and assign it to the correct person. That way it’s clear what has to be done.

  7. Most (all?) blog posts concentrate on the view of a single individual. When working in a team, if someone updates the default branch, this obviously impacts everyone else. Once you know this is going to happen, you simply run a few standard commands:

    git fetch --all         # update all remotes
    git checkout main       # checkout the new main
    # update local HEAD
    git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main
    
    git branch -d master    # delete local master
    git branch -rd origin/master      # delete remote master ref
    
  8. Our quietest month is followed by our busiest month! This can make things stressful.

  9. “We also identified key repositories that had to be updated”. I wish we had spent more time thinking about this.
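
For point 5, setting these options through the GitLab REST API might look roughly like the sketch below. The server URL, project ID and chosen option values are placeholders rather than our actual settings, the gitlab_api wrapper is hypothetical, and code_owner_approval_required is only available on some GitLab tiers. By default the script runs in a dry-run mode that prints each request instead of sending it.

```shell
#!/bin/sh
set -eu

# Placeholder values for illustration only.
GITLAB_URL=${GITLAB_URL:-https://gitlab.example.com}
PROJECT_ID=${PROJECT_ID:-123}
DRY_RUN=${DRY_RUN:-1}   # 1 = print the requests instead of sending them

# gitlab_api METHOD PATH [curl options...] (hypothetical wrapper)
gitlab_api() {
  method=$1; path=$2; shift 2
  if [ "$DRY_RUN" = "1" ]; then
    echo "$method $GITLAB_URL/api/v4$path $*"
  else
    # GITLAB_TOKEN must hold a personal access token with API scope.
    curl --fail --silent --show-error --request "$method" \
      --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
      "$@" "$GITLAB_URL/api/v4$path"
  fi
}

# Set the project's default branch and merge method.
gitlab_api PUT "/projects/$PROJECT_ID" \
  --data "default_branch=main" --data "merge_method=merge"

# Remove any existing protection on main; ignore the error if there is none.
gitlab_api DELETE "/projects/$PROJECT_ID/protected_branches/main" || true

# Recreate the protection with explicit, documented options:
# nobody pushes directly (0), maintainers merge (40), no force push.
gitlab_api POST "/projects/$PROJECT_ID/protected_branches" \
  --data "name=main" \
  --data "push_access_level=0" \
  --data "merge_access_level=40" \
  --data "allow_force_push=false" \
  --data "code_owner_approval_required=true"
```

Keeping the options in a script like this doubles as the documentation of what the organisation's defaults are.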

There are also the “unknown unknowns”: things we just didn’t expect. For example, we have an internal R package for accessing the GitLab API. The default branch is master. Should we change the default or leave it alone? We went for changing. Another annoying issue arose whenever an older repository had problems: was this due to the master/main change, or was it something else?

Summary

In our experience, changing from master to main generated around thirty days of additional work. If we had followed our “hindsight” section, this could have been reduced by around fifteen days. Making the change is worthwhile and the correct thing to do. But it will break lots of things. As everything has a CI process and requires a code review at Jumping Rivers, we never actually broke anything (sort of), but it did add an unexpected (one-off) barrier to committing to a project. Another point to note is that the Jumping Rivers team is very technical. Everyone is familiar with Linux, tweaking CI files, and using the command line. This is not true of all teams.

If you are considering moving, then I’m more than happy to chat through our pain points with you. Just drop me an email at colin@jumpingrivers.com

