Cleaning up forked GitHub repositories with {gh}
One great thing about using GitHub is the ability to view and contribute to others’ code. Even the code underlying many of our favourite packages is available for us to examine and play around with.
Forking a repository is a great way to create an exact replica of someone else’s project in our own user space. We can then freely make changes to this copy without affecting the original project. If you end up especially proud of your changes, you can then submit a Pull Request to offer them up to the owner of the original repository. However, your fork doesn’t have to end up in a contribution – you can also just keep experimenting with the code forever or use it as a starting point for your own project.
A forking mess
If you are an avid forker of GitHub repos, your original repositories may quickly become crammed in between an endless stream of forked repos. Your user space ends up cluttered, with old forks that you haven’t looked at in years still taking up space. Well, now it’s time for some spring cleaning, and the first task is de-cluttering your repositories by removing forks.
Manually cleaning
You can manually delete repositories using the GitHub interface. Go to the repository you wish to delete, then select Settings at the top of the page.
Then scroll to the bottom of the page and enter the Danger Zone marked by a red box.
From there, you can select Delete this repository which will prompt you to confirm that you are absolutely sure of what you’re doing by typing out the name of the repository. Note that after deleting the repository, the action cannot be undone. Also note that if you are deleting a forked repository, deleting it will only remove it (including any changes you have made to it) from your own GitHub – you won’t accidentally delete the original project (phew).
So it is possible to clean up your GitHub manually, and this might be the most suitable way if you only want to delete one or two repositories. But let’s say you’ve forked over fifty repositories. Manually going into each one, finding the delete button in the settings and typing in the confirmation prompt is not what you want to spend your day doing. As with all manual methods, pointing and clicking does not scale particularly well.
Using the {gh} package
The {gh} package provides an R-user-friendly wrapper around the GitHub API. It lets you interact with GitHub directly from RStudio, e.g. to create new repositories or delete old ones. The package is on CRAN and is installed in the usual way
install.packages("gh")
To use the package, you first need to generate a Personal Access Token (PAT).
Getting a token
Creating a personal access token to be able to use the GitHub API is easy. You can either navigate to the page on GitHub (Settings > Developer Settings > Personal Access tokens > Generate new token), or you can use the handy create_github_token() function from {usethis}, which will open the same page in your browser.
usethis::create_github_token()
From there, you give your token a useful name as well as select what access should be granted by the token. Note: if you want to use {gh} to delete unwanted forked repositories, you will need to select the delete_repo scope. However, be aware that this allows you to delete any repo, not just forked ones. After deciding on the scopes, you generate your token. As the page tells you, you will have to store your token somewhere as you won’t be able to access it again after closing the page. We recommend copying it and storing it in a password manager such as LastPass. Once you have saved your token somewhere secure, you can make it available to your R environment using the set_github_pat() function from the {credentials} package, which will prompt you to enter your PAT, which you did save somewhere… right? If you did not follow our advice and now no longer have access to your PAT, don’t worry, you can delete the old one on GitHub and generate a new one.
OK, now that you’ve definitely got your token ready, you can run the code below
credentials::set_github_pat()
which will prompt you to enter your PAT. Now you can finally get to the cleaning!
Cleaning
We will load the {gh} package, as well as the {magrittr} package to get access to pipes.
library("gh")
library("magrittr")
Step one is to retrieve your repositories
my_username = "your_username_goes_here"
my_repos = gh("GET /users/:owner/repos", owner = my_username, page = 1, per_page = 100)
The GitHub API is paginated. This means it returns results in pages, with at most 100 results per page. If length(my_repos) is less than 100, then you don’t need to worry. If you have more than 100 repositories, you can either choose a page or loop through all pages.
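Looping through all pages can be sketched roughly as follows. This is a minimal illustration, not part of {gh}: fetch_all_pages() and fetch_page() are hypothetical names, and the API is replaced by an offline stand-in (fake_repos, fake_fetch) so the loop can be run without a token.

```r
# Hypothetical helper: collect results from a paginated source.
# `fetch_page(page, per_page)` is an assumed function that returns a list
# holding the results of one page.
fetch_all_pages <- function(fetch_page, per_page = 100) {
  results <- list()
  page <- 1
  repeat {
    batch <- fetch_page(page, per_page)
    results <- c(results, batch)
    # A page shorter than per_page means there is nothing left to fetch
    if (length(batch) < per_page) break
    page <- page + 1
  }
  results
}

# Offline stand-in for the API: 250 made-up repositories served in pages
fake_repos <- lapply(seq_len(250), function(i) list(name = paste0("repo-", i)))
fake_fetch <- function(page, per_page) {
  start <- (page - 1) * per_page + 1
  if (start > length(fake_repos)) return(list())
  fake_repos[start:min(start + per_page - 1, length(fake_repos))]
}

all_repos <- fetch_all_pages(fake_fetch, per_page = 100)
length(all_repos)
```

With a real token, fetch_page could wrap the gh() call from above, passing page and per_page through. The gh() function also has a .limit argument that is intended to follow pages for you, which may make a hand-written loop unnecessary.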
The object my_repos is now a list of repositories. Each element of the list is a particular repository. We are interested in two particular elements, name and fork:
my_repos[[1]]$name
my_repos[[1]]$fork
These elements tell us the name of the repository and whether it was created as a result of forking. Now we just repeat this process for all of our repositories and filter to return only the repositories which are forked.
forked_repos = purrr::map_dfr(my_repos, ~unlist(.x[c("name", "fork")])) %>%
  dplyr::filter(fork == "TRUE") # Here "TRUE" is a character, not a logical
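To see why the comparison is against the string "TRUE", and how the same filter might look in base R, here is a small offline sketch. The mock_repos list is made-up data that only mimics the two fields used here; a real API response contains many more.

```r
# Mock data shaped like the parts of the API response used in this post
# (the repository names are invented for illustration)
mock_repos <- list(
  list(name = "my-own-package", fork = FALSE),
  list(name = "bob-does-tidytuesday", fork = TRUE),
  list(name = "a-random-r-package", fork = TRUE)
)

# unlist() coerces mixed types to character, hence "TRUE" rather than TRUE
flattened <- unlist(mock_repos[[2]][c("name", "fork")])
class(flattened)

# Alternative: keep only the forks by testing the logical value directly
is_fork <- vapply(mock_repos, function(repo) isTRUE(repo$fork), logical(1))
forked_names <- vapply(mock_repos[is_fork], function(repo) repo$name, character(1))
forked_names
```

Testing the logical directly avoids the string comparison, at the cost of a little more code than the {purrr} and {dplyr} pipeline.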
The next step involves manually, and very carefully, selecting the repositories you want to delete. If you want to delete all forked repositories (!), simply set
# You probably don't want to do this!
to_delete = forked_repos$name
Otherwise, create a vector of repositories to delete
to_delete = c("bob-does-tidytuesday", "melindas-cool-project", "a-random-r-package")
Finally we delete using
purrr::map(to_delete, ~gh("DELETE /repos/:owner/:repo", owner = my_username, repo = .x))
And… they’re gone!
Deleting forked repositories like this is an effective way to clean out your GitHub of repositories that you haven’t looked at or touched in a while. However, unlike doing it manually, there is no confirmation where you have to type out a specific repository’s name to confirm that you actually are deleting what you want to be deleting. So, be extremely careful when deleting repositories using {gh} as you don’t want to lose hours of work by accidentally running the wrong line.
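One way to get some of that safety back is a small guard that refuses anything not in your list of forks and defaults to a dry run. The sketch below is only an illustration: delete_forks() and delete_fun are hypothetical names, and the deletion itself is replaced by a stand-in that just records names, so nothing here touches GitHub.

```r
# Hypothetical guard around deletion. `delete_fun` is an assumed function
# that deletes one repository, given its name.
delete_forks <- function(to_delete, forked_names, delete_fun, dry_run = TRUE) {
  # Refuse to continue if any requested name is not a known fork
  not_forks <- setdiff(to_delete, forked_names)
  if (length(not_forks) > 0) {
    stop("Not in the list of forks: ", paste(not_forks, collapse = ", "))
  }
  if (dry_run) {
    message("Dry run, would delete: ", paste(to_delete, collapse = ", "))
    return(invisible(character(0)))
  }
  for (repo in to_delete) delete_fun(repo)
  invisible(to_delete)
}

# Offline demonstration with a stand-in that only records the names
deleted <- character(0)
record_delete <- function(repo) deleted <<- c(deleted, repo)
forks <- c("bob-does-tidytuesday", "a-random-r-package")

# Default is a dry run: prints a message, deletes nothing
delete_forks("bob-does-tidytuesday", forks, record_delete)
# Only an explicit dry_run = FALSE calls delete_fun
delete_forks("bob-does-tidytuesday", forks, record_delete, dry_run = FALSE)
deleted
```

In real use, delete_fun could wrap the gh("DELETE /repos/:owner/:repo", ...) call shown above, and forked_names could be forked_repos$name.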