Deprecating a pkgdown site served via GitHub Pages

[This article was first published on cynkra, and kindly contributed to R-bloggers].

Websites for R packages built with {pkgdown} have become a de-facto standard over the last few years. Many R packages build their site during Continuous Integration (CI) runs, pushing the assets to the special gh-pages branch (even though now any branch can be used to deploy a website).

Photo by Paweł Czerwiński


Sometimes a repository is transferred to a new user or organization, or the package is renamed. While GitHub takes care of redirecting repository URLs, the pkgdown URLs (https://<username>.github.io/<rpackage>) are not redirected. Since some users might have bookmarked certain URLs, or the URLs appear in their browsing history, it would be good if these links did not suddenly start returning a 404.

This blog post proposes several ways to handle this gracefully: redirection, deprecation via CSS, and deprecation via a bulk edit of the HTML files.

All options hinge on the observation that users and organizations can create a user or organization site that will be the source for https://<username>.github.io/<package> after the renaming. The user site will also serve robots.txt that advises crawlers to avoid deprecated contents.

User or organization site

In GitHub, users can create a user repository <username>/<username>.github.io. This repo is served automatically as a web page at https://<username>.github.io/. In this repo, a directory can be created that corresponds to the GitHub Pages site of the original repo. Example: The rpackage/ directory in the <username>/<username>.github.io repository corresponds to https://<username>.github.io/rpackage.

If both the repository <username>/<rpackage> and the directory <rpackage>/ in <username>/<username>.github.io exist, the former takes precedence. This means that you can prepare everything in your user repository <username>/<username>.github.io and it will work right away after you rename your package repository. The following has worked for https://krlmlr.github.io/fledge/, which has moved to https://cynkra.github.io/fledge/:

All of this works the same way for organizations. The munch package was previously located at https://cynkra.github.io/SwissCommunes/. The original pages, with a warning, are defined at cynkra/cynkra.github.io.
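The directory layout described above could be prepared with a small shell helper along these lines. This is only a sketch: the function name snapshot_site and its three arguments are made up for illustration, and it assumes you have a local copy of the last built site (for example pkgdown's docs/ output or a checkout of the gh-pages branch) and a local clone of the user or organization site.

```shell
# Hypothetical helper (not part of pkgdown or GitHub tooling):
# copy a built site into a directory of the user/organization site.
snapshot_site() {
  built_site=$1   # directory containing the last built pkgdown site
  user_site=$2    # local clone of <username>/<username>.github.io
  name=$3         # directory name, i.e. the old repository name

  mkdir -p "$user_site/$name"
  # "dir/." copies the contents of the directory, including dotfiles.
  cp -R "$built_site/." "$user_site/$name/"
}
```

A call such as `snapshot_site rpackage/docs <username>.github.io rpackage`, followed by a commit and push in the user site repository, would then make the snapshot available once the package repository has been renamed or moved.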

Redirection

Basic idea: set up an HTML redirect from https://<username-old>.github.io/<package> to https://<username-new>.github.io/<package>.

To achieve this, create an index.html in <username>/<username>.github.io/rpackage with the following contents:

<meta http-equiv="refresh" content="0; url=<url to redirect to>" />
        

However, redirects like this one are considered bad practice (“Use of refresh is discouraged by the World Wide Web Consortium (W3C).”)[^1]. Also, users might find it sketchy to be redirected shortly after they visit a site. Last, the redirection shown above only works for the root URL of the site. Deeper links like <url>/level1/level2 are not redirected and return a 404.
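For completeness, the index.html described above could be generated from the shell roughly as follows. This is a minimal sketch: the function name write_redirect_stub is hypothetical, and the visible fallback link and the canonical link are additions that are not part of the original snippet. As noted, it only covers the root URL of the old site.

```shell
# Hypothetical helper: write a meta-refresh stub as index.html.
write_redirect_stub() {
  dir=$1      # e.g. the rpackage/ directory of the user site
  target=$2   # URL of the new pkgdown site

  mkdir -p "$dir"
  # The unquoted EOF lets the shell substitute $target in the template.
  cat > "$dir/index.html" <<EOF
<!DOCTYPE html>
<html>
  <head>
    <meta charset="utf-8" />
    <meta http-equiv="refresh" content="0; url=$target" />
    <link rel="canonical" href="$target" />
    <title>Page moved</title>
  </head>
  <body>
    <p>This site has moved to <a href="$target">$target</a>.</p>
  </body>
</html>
EOF
}
```

It would be called from the root of the user site repository, for example as `write_redirect_stub rpackage "<url to redirect to>"`.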

Deprecation via CSS

A better way to deprecate a pkgdown/GitHub Pages site is to serve a static version of the last state before the package was moved and to tell users that the site has moved.

An easy way to achieve this is to include a little CSS snippet. The following adds a colored line of text at the end of the page-header div in the pkgdown site.

.page-header:after {
  content: "You are viewing an outdated page which is not going to be updated anymore. Please go to <https://new-url.com> for the latest version.";
  font-size: 12px;
  font-style: italic;
  color: #f03333;
}
        
Deprecation information in the header via CSS


Place this code in a file named extra.css in the pkgdown/ directory of your package, and it will be picked up automatically the next time the site is built.
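As a sketch, the stylesheet could be written from the shell like this. pkgdown looks for custom styles in pkgdown/extra.css; the function name write_deprecation_css and its arguments are hypothetical. Note that the function overwrites an existing extra.css, so merge by hand if the package already has one.

```shell
# Hypothetical helper: create pkgdown/extra.css with the deprecation notice.
# Caution: this overwrites an existing pkgdown/extra.css.
write_deprecation_css() {
  pkg_dir=$1   # root directory of the package source
  new_url=$2   # URL of the new site to advertise

  mkdir -p "$pkg_dir/pkgdown"
  # The unquoted EOF lets the shell substitute $new_url in the template.
  cat > "$pkg_dir/pkgdown/extra.css" <<EOF
.page-header:after {
  content: "You are viewing an outdated page which is not going to be updated anymore. Please go to <$new_url> for the latest version.";
  font-size: 12px;
  font-style: italic;
  color: #f03333;
}
EOF
}
```

For example, `write_deprecation_css . "https://new-url.com"` run in the package root creates the file; the notice then appears after the next site build.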

Unfortunately, the :after pseudo-element can only insert plain text, not hyperlinks, so the new URL will not be clickable.

Deprecation via bulk edit

For the URL to be clickable, the HTML files must be edited. The find, xargs and sed utilities help automate this.

pkgdown uses the Bootstrap framework, whose alerts serve this purpose. They look best just before the closing </header> tag. The following command line adds an alert to each HTML page, in this case advertising https://cynkra.github.io/munch as the target URL. It must be run in the rpackage directory of <username>/<username>.github.io:

# Insert a Bootstrap alert on the line before each closing </header> tag (GNU sed; run only once).
find . -name "*.html" -print0 |
  xargs -0 sed -i -r 's#(^.*[<]/header[>])#<div class="alert alert-warning" role="alert"><strong>Warning!</strong> This content has moved to <a href="https://cynkra.github.io/munch">https://cynkra.github.io/munch</a>.</div>\n\1#'
        

This assumes GNU sed. macOS users will need to use gsed, or use -i.bak instead of -i and deal with the leftover *.bak files.
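Where GNU sed is not available, the same edit could be sketched with awk instead, which behaves the same on Linux and macOS. This is an alternative to the sed command above, not the method used for the sites mentioned in this post. The function name add_moved_alert is hypothetical, and the sketch assumes file names without newlines. It also skips pages that already contain the notice, so running it twice does not add a second alert.

```shell
# Hypothetical helper: awk-based variant of the sed command above.
add_moved_alert() {
  dir=$1   # directory holding the HTML snapshot
  url=$2   # new location to advertise

  alert="<div class=\"alert alert-warning\" role=\"alert\"><strong>Warning!</strong> This content has moved to <a href=\"$url\">$url</a>.</div>"

  find "$dir" -name "*.html" | while IFS= read -r file; do
    # Skip pages that already carry the notice, so re-runs are harmless.
    if grep -q 'This content has moved to' "$file"; then
      continue
    fi
    # Print the alert on its own line before any line containing </header>.
    awk -v alert="$alert" '/<\/header>/ { print alert } { print }' "$file" > "$file.tmp" &&
      mv "$file.tmp" "$file"
  done
}
```

Run from the root of the user site, `add_moved_alert rpackage "https://cynkra.github.io/munch"` would have the same effect as the command above.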

Deprecation information in the header via editing HTML


Always advertising the new root works well enough, because it is very likely that the structure of the site will eventually change after the repository rename.

Web crawlers

It is a good idea to make the deprecated contents invisible to web crawlers. Add a file robots.txt to the root of <username>/<username>.github.io. The following contents forbid crawling the /SwissCommunes/ directory, which contains the old snapshot with pointers to the new location:

User-agent: *
Disallow: /SwissCommunes/
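If the user site hosts several deprecated snapshots, the file could be generated rather than edited by hand. The following is only a sketch: the function name write_robots_txt is hypothetical, and it overwrites any existing robots.txt in the given directory.

```shell
# Hypothetical helper: write a robots.txt that disallows each given directory.
# Caution: this overwrites an existing robots.txt.
write_robots_txt() {
  site_root=$1   # local clone of <username>/<username>.github.io
  shift          # the remaining arguments are directory names

  {
    printf 'User-agent: *\n'
    for dir in "$@"; do
      printf 'Disallow: /%s/\n' "$dir"
    done
  } > "$site_root/robots.txt"
}
```

For example, `write_robots_txt . SwissCommunes` run in the root of the user site reproduces the file shown above.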
        
