In June 2020, GitHub announced that it was moving the default branch name from master to the more neutral name, main. GitLab followed suit a few months later. Tobie Langel makes the salient point on why changing the name is a good thing:
So master is not only racist, it’s also a silly name in the first place.
The purpose of this post is to summarise some of the challenges we faced when moving from master to main, with the goal that if you decide to make the same change, you’ll hopefully avoid some of the issues.
Renaming a Single Repository
Renaming a single repository is relatively straightforward. There are five main steps:
- Copy the master branch and history to main
- Push main to the remote repository, i.e. GitHub / GitLab
- Point HEAD to the main branch
- Change the default branch to main on the remote
- Delete the master branch on the remote repo
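As a rough sketch, the five steps map onto the git commands below. To keep the example runnable without touching a real project, a throwaway bare repository stands in for GitHub / GitLab; the temporary directory, the remote name origin and the demo commit identity are all assumptions made for illustration. On a hosted service, step 4 is done in the project settings (or through the service's API) rather than with git symbolic-ref.

```shell
#!/bin/sh
# Sketch: rename master to main, with a local bare repository as the "remote".
set -eu

workdir="$(mktemp -d)"
trap 'rm -rf "$workdir"' EXIT
remote="$workdir/remote.git"   # stand-in for the GitHub / GitLab project
clone="$workdir/clone"         # stand-in for your working copy

# --- Set-up only: a remote whose default branch is master ---
git init --quiet --bare "$remote"
git init --quiet "$clone"
cd "$clone"
git symbolic-ref HEAD refs/heads/master   # start on master regardless of init.defaultBranch
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit --quiet --allow-empty -m "Initial commit"
git remote add origin "$remote"
git push --quiet -u origin master
git -C "$remote" symbolic-ref HEAD refs/heads/master

# --- The five steps ---
# 1. Move master (and its history) to main.
git branch -m master main
# 2. Push main to the remote and track it.
git push --quiet -u origin main
# 3. Point the local record of the remote's HEAD at main.
git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main
# 4. Change the default branch on the remote.
#    (Bare-repo equivalent of the setting in the GitHub / GitLab UI.)
git -C "$remote" symbolic-ref HEAD refs/heads/main
# 5. Delete master on the remote; this is refused while master is still the default.
git push --quiet origin --delete master

git -C "$remote" branch --list   # shows the branches left on the remote
```

The order matters: the remote will normally refuse to delete master while it is still the default branch, which is why step 4 comes before step 5.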
There are several nice descriptions on how to change a single repository. For example, Steven Mortimer has a nice blog post that leads you through the process.
While I’ve read of individuals making the move, I’ve not read about organisations making the change. I’m sure there are numerous companies that have made the move, I’ve just not seen them.
The Jumping Rivers Move
During August 2021, we started renaming our repositories from master to main. We deliberately chose August, because that month is the main school holiday in the UK. That means most of our clients and team are on holiday, so the impact of any change was reduced.
An Overview of Jumping Rivers Repositories
Protected default branches: At Jumping Rivers our default branch (master / main) is protected. This means that we can’t push directly to the default branch. Instead we need to create a branch and merge.
Code owners: all repositories have a named list of repository owners. Depending on the repository, this is usually between two and six people. These are the members of the Jumping Rivers team who have permission to merge a branch into the default branch (master / main). The person who made the initial merge request cannot merge it into main themselves; this has to be done by another team member.
Continuous integration: all repositories have a CI process. The CI ranges from very elaborate pipelines to (relatively) simple checks on the contents of committed files. In order to merge into the default branch the CI must pass. To avoid copying CI scripts to our repositories, all CI files are templated. A typical CI file looks something like:
    include:
      # https://repo-url.com/ci-templates/-/blob/master/templates/r-package.yml
      - project: 'jumpingrivers/tools/ci-templates'
        ref: master
        file: '/templates/r-package.yml'
Note that the word “master” appears twice in this code chunk.
RStudio Package Manager (RSPM): We use RSPM to manage our R packages. Currently, any package that is tagged on our GitLab server is added to our RSPM. We also have neat scripts that automatically scan for new repositories and add them to RSPM without any user interaction.
RStudio Connect (RSC) & Shiny Servers: We deploy multiple Shiny applications and markdown documents to RStudio Connect. Likewise, we have similar pipelines with Shiny Server, Shiny Proxy, etc.
Web page: This website is hosted on GitLab and deployed via Netlify.
Historical repositories: As with many organisations, our standards have evolved over time. We’ve found that, working as a distributed team, a consistent repository structure helps us work together. The CI now enforces, amongst other things:
- a README.md file
- a project description file
- a CODEOWNERS file with at least two names
- a “minimum” .gitignore file
When we go back to an old project, say from 12 months ago, we often have to spend about ten minutes tightening up the repository. For a single project, this is fine. But for a large number of simultaneous projects, this becomes time-consuming.
Summary: The above points when taken together mean:
- Every repo will need to be changed from master to main.
- As every repo has a CI process that depends on a master branch, every repo will need a minor update to the CI file.
- As every repo has a CODEOWNERS file, this means another team member will be required to approve and merge any merge request.
- Changing a repository from master to main will also require a manual change in RSC or RSPM to point to a different branch.
- Changing old projects will incur the wrath of the CI, as other tidying jobs will be required.
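The "minor update to the CI file" is small enough to script. The sketch below is one hedged way to do it: update_ci_ref is a hypothetical helper name, the file name .gitlab-ci.yml is an assumption (the post does not name the file), and the sample content simply mirrors the CI file shown earlier. It changes only the two places where the template branch appears.

```shell
#!/bin/sh
# Sketch: switch the CI template reference in a CI file from master to main.
set -eu

# update_ci_ref is a hypothetical helper, not part of git or GitLab.
update_ci_ref() {
    ci_file="$1"
    tmp="$(mktemp)"
    # Rewrite only the two places the template branch appears:
    #   1. the /blob/master/ path in the comment URL
    #   2. a line of the form "ref: master"
    # A temporary file is used because `sed -i` differs between GNU and BSD sed.
    sed -e 's|/blob/master/|/blob/main/|' \
        -e 's|^\([[:space:]]*ref:[[:space:]]*\)master[[:space:]]*$|\1main|' \
        "$ci_file" > "$tmp"
    cat "$tmp" > "$ci_file"   # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
}

# Demonstration on a copy of the CI file shown earlier (assumed file name).
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
ci_file="$demo_dir/.gitlab-ci.yml"
cat > "$ci_file" <<'EOF'
include:
  # https://repo-url.com/ci-templates/-/blob/master/templates/r-package.yml
  - project: 'jumpingrivers/tools/ci-templates'
    ref: master
    file: '/templates/r-package.yml'
EOF

update_ci_ref "$ci_file"
cat "$ci_file"
```

Because the default branch is protected, a change like this would still go through a branch and a merge request approved by a code owner, as described above.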
Potential Strategies for Moving
The Hybrid Approach
Our initial plan was to take a hybrid approach: alter the template CI process to handle either master or main. Then gradually move repositories across. We decided against this as
- this was going to be a substantial piece of work in itself, which would ultimately be binned.
- we have a number of Shiny apps that generate overviews of repositories; this would also need to go into hybrid mode.
- maintaining both a master & main version would increase work & maintenance
For larger organisations / companies this would be the correct strategy. I would estimate if we were double our current size, then we would have taken this route.
Moving sections
For historical reasons, our training materials and associated infrastructure live in a semi-independent repository project. As such, when we did a large update to our course notes around Christmas 2020, the training materials moved to main. Part of this overhaul was also implementing CODEOWNERS and template CI processes.
However the non-training repositories are all coupled via the CI process, so it wasn’t possible to easily move other sections.
Semi-hybrid Approach
We changed the CI template repository from master to main. As this is templated across every repository, this means that any changes to a repository would now fail the CI, unless the CI template file was updated. This provided a natural method for incorporating changes into repositories that were being actively used. We also identified key repositories, where we had to take extra care, for example our Website.
Next, we encouraged people to rename the protected branches from master to main. To be honest, I’m not sure how many people listened to this encouragement. There’s always something more pressing to do!
About four weeks after the process started, our CI now enforces that the default branch in a repository must be called main. Furthermore, if a master branch exists, it must be deleted. This approach worked for us, as we didn’t have (much) critical infrastructure. If a CI job broke for 10 minutes, then no harm was done.
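The post does not show how the CI enforces this, so the following is only a minimal sketch of what such a check could look like using plain git. The function name check_default_branch and the remote name origin are assumptions; the demonstration builds a throwaway bare repository so the check can be run anywhere.

```shell
#!/bin/sh
# Sketch of a CI-style check: the default branch must be main and master must be gone.
set -eu

# check_default_branch is a hypothetical name; "origin" is an assumed remote.
check_default_branch() {
    remote="${1:-origin}"

    # `git ls-remote --symref` prints a line "ref: refs/heads/<name>  HEAD"
    # describing which branch the remote's HEAD points at.
    default="$(git ls-remote --symref "$remote" HEAD |
        awk '$1 == "ref:" { sub("^refs/heads/", "", $2); print $2 }')"
    if [ "$default" != "main" ]; then
        echo "FAIL: default branch is '$default', expected 'main'" >&2
        return 1
    fi

    # --exit-code makes ls-remote return non-zero when no ref matches.
    if git ls-remote --exit-code "$remote" refs/heads/master >/dev/null 2>&1; then
        echo "FAIL: a master branch still exists on $remote" >&2
        return 1
    fi
    echo "OK: default branch is main and master is gone"
}

# --- Demonstration against a throwaway remote whose default branch is main ---
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
git init --quiet --bare "$demo_dir/remote.git"
git init --quiet "$demo_dir/clone"
cd "$demo_dir/clone"
git symbolic-ref HEAD refs/heads/main
git -c user.name="Demo" -c user.email="demo@example.com" \
    commit --quiet --allow-empty -m "Initial commit"
git remote add origin "$demo_dir/remote.git"
git push --quiet origin main
git -C "$demo_dir/remote.git" symbolic-ref HEAD refs/heads/main

check_default_branch origin
```

In a real pipeline the function would run inside the checked-out repository, and a non-zero return would fail the job.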
Our final stage will take place in a couple of months, when we’ll systematically work through groups of repos and make the change, for example all R packages or all Shiny applications.
Hindsight is a Wonderful Thing
Overall the process wasn’t/isn’t too painful. But with hindsight, we did make things more difficult than they needed to be. To help anyone else looking to make this move, here’s a list of things I wish we had implemented from day 1:
- A clear guide for changing from master to main. This should be for your organisation; don’t just point to someone else’s blog post. Ensure you can copy the code and also make the guide easy to find; don’t just stick it in an email.
- If anyone queries that guide, update the guide. Avoid the temptation to store the answer in Slack. Maintain and refer to that single point of truth.
- Remember to include a step for updating a local repo that has been changed by someone else.
- Have a Slack channel for the move and use it. We had the first, but never really used it.
- When you delete and then create a protected branch, you should be clear about the standards within your organisation. We used the GitLab API to set the various options, e.g. the merge method, whether code owner approval is required, and whether a user can force push. However, we had never decided what these default options should be, so this had to be documented (a good thing)!
- Initially when updating RSPM / RSC we made a Slack request. This is a bad idea as it’s easy to get lost in the noise of Slack. Instead, create a merge request and assign it to the correct person. That way it’s clear what has to be done.
- Most (all?) blog posts concentrate on the view of a single individual. When working in a team, if someone updates the default branch, this obviously impacts everyone else. Once you know this is going to happen, you simply run a few standard commands:
    git fetch --all                 # update all remotes
    git checkout main               # checkout the new main
    # update local HEAD
    git symbolic-ref refs/remotes/origin/HEAD refs/remotes/origin/main
    git branch -d master            # delete local master
    git branch -rd origin/master    # delete remote master ref
- Our quietest month is followed by our busiest month! This can make things stressful.
- “We also identified key repositories that had to be updated”. I wish we had spent more time thinking about this.
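For the protected-branch point in the list above, here is a hedged sketch of setting those options through the GitLab REST API with curl. The post does not show the actual calls, so everything specific is an assumption: the GITLAB_URL, GITLAB_TOKEN and project ID are placeholders, gitlab_api and protect_main are hypothetical helper names, and the option values are examples rather than Jumping Rivers' settings. Some options, such as code owner approval, may depend on your GitLab tier. The script defaults to a dry run that only prints the requests.

```shell
#!/bin/sh
# Sketch: re-create branch protection for main via the GitLab REST API.
set -eu

# Placeholders: point these at your own instance and token.
GITLAB_URL="${GITLAB_URL:-https://gitlab.example.com}"
GITLAB_TOKEN="${GITLAB_TOKEN:-}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the requests, 0 = send them

# gitlab_api METHOD PATH [extra curl arguments...]  (hypothetical helper)
gitlab_api() {
    method="$1"; path="$2"; shift 2
    if [ "$DRY_RUN" = "1" ]; then
        echo "$method $GITLAB_URL/api/v4$path $*"
    else
        curl --fail --silent --show-error --request "$method" \
            --header "PRIVATE-TOKEN: $GITLAB_TOKEN" \
            "$GITLAB_URL/api/v4$path" "$@"
    fi
}

# protect_main PROJECT_ID  (hypothetical helper; values are example defaults)
protect_main() {
    project_id="$1"
    # Make main the default branch and choose the merge method.
    gitlab_api PUT "/projects/$project_id" \
        --data "default_branch=main" --data "merge_method=merge"
    # Remove any existing protection first; ignore "not protected" errors.
    gitlab_api DELETE "/projects/$project_id/protected_branches/main" || true
    # Protect main: no direct pushes (0), maintainers may merge (40),
    # no force pushes, code owner approval required.
    gitlab_api POST "/projects/$project_id/protected_branches" \
        --data "name=main" \
        --data "push_access_level=0" \
        --data "merge_access_level=40" \
        --data "allow_force_push=false" \
        --data "code_owner_approval_required=true"
}

protect_main 42   # 42 is a made-up project ID
```

Writing the chosen values down in one script like this doubles as the documentation of the organisation's defaults that the list item calls for.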
There are also the “unknown unknowns”. Things we just didn’t expect. For example, we have an internal R package for accessing the GitLab API. The default branch is master. Should we change the default or leave it alone? We went for changing. Another annoying issue was that when an older repository had problems, it wasn’t clear whether this was due to the master / main change or something else.
Summary
In our experience, changing from master to main generated around thirty days of additional work. If we had followed our “hindsight” section, this could have been reduced by around fifteen days. Making the change is worthwhile and the correct thing to do. But it will break lots of things. As everything has a CI process and requires a code review at Jumping Rivers, this meant we never actually broke anything (sort of), but it did add an unexpected (one-off) barrier to committing to a project. Another point to note is that the Jumping Rivers team is very technical. Everyone is familiar with Linux, tweaking CI files, and using the command line. This is not true of all teams.
If you are considering moving, then I’m more than happy to chat through our pain points with you. Just drop me an email at colin@jumpingrivers.com