Site icon R-bloggers

Porting and redirecting a Hugo-based blogdown website to an HTTPS-enabled custom domain and how to do it the easy way

[This article was first published on Jozef's Rblog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

As we wrote in Should you start your R blog now?, blogging has probably never been more accessible to the general population, R users included. Usually, the simplest solution is to host your blog via a service that provides it for free, such as Netlify, GitHub or GitLab Pages. But what if you want to host that awesome blog on your own, HTTPS enabled domain?

In this post, we will look at how to port a Hugo-based website, such as a blogdown blog to our own domain, specifically focusing on GitLab Pages. We will also cover setting up SSL certificates, redirects from www to non-www sites and other details that I had to solve when porting my blogdown blog from GitLab’s hosting.

Contents

  1. If you are just starting – there is an easy way
  2. Serving a page on a custom domain via GitLab Pages
  3. Redirecting the gitlab.io address to the custom domain
  4. Redirecting www and non-www URLs to a single address
  5. References

If you are just starting – there is an easy way

This post is mostly a reminder-to-self of what porting this blog from GitLab Pages hosting to a custom domain entailed. The route I took was heavily influenced by the way I was serving the website at the beginning – using GitLab Pages on a project address. Migrating to a new domain I wanted to

  1. Keep all the functionality that a GitLab repository with GitLab CI/CD provides
  2. Serving the content at a custom HTTPS-enabled domain
  3. Redirecting to the new domain with minimal to no content duplication
  4. Making sure that the website works on both www and non-www addresses

If you have your blog hosted on GitLab pages and want to port and redirect it to your own HTTPS-enabled domain, while keeping the functionality that GitLab provides, you might find my journey useful.

Blogdown, Hugo & Let’s Encrypt logos

If you are just starting, it is easier to choose a different approach to publish your blog – here are 2 tips to consider if you want to prevent the pain I went through because of my past decisions:

What would I do if starting today

With the knowledge I gained when investigating the process, I would probably take the following route:

  1. Register a custom domain via CloudFlare. This should make non-www/www redirects and getting SSL certificates seamless
  2. Deploy the pages by connecting the GitLab repository to Netlify, which should be equally easy as using GitLab CI/CD
  3. Setup deployment to the custom domain via Netlify. This should make the redirects to the custom domain seamless and technically sound

Doing it the simplest way

Serving a Hugo-based website can in principle be even simpler – in fact, all that is really necessary is just copying/uploading contents of the public directory generated for example with blogdown::build_site() to the proper place. All that we do around it are processes that make your lives nicer at the cost of extra effort.

Serving a page on a custom domain via GitLab Pages

1. Choose and register a domain name

The first step is to choose a domain name (i.e. the web address) for your brand new website. This is completely up to you and the internet is full of tips like this one. Next, register that domain name with a provider of your choice. I use a local provider for all of my websites for many years, so the choice was easy.

To register your domain name, you can pick from a plethora of providers, each with their of own pros and cons

2. Setup an SSL certificate

Setting up your website such that it can be accessed via HTTPS should be the standard these days, so we also need to set up an SSL certificate. Once again, this should be simple as we can use free Let’s Encrypt certificates to achieve that.

The actual process once again depends on the provider of your domain services – in practice, it should entail just a few clicks in their web UI

3. Setup GitLab to serve your pages to a custom domain

Setting up GitLab Pages to be served to your own domain is well documented in GitLab’s documentation here and even better here.

If you have chosen CloudFlare as your service to manage the DNS, Nick Zeng wrote a detailed guide on how to setup GitLab pages with a custom domain. GitLab also has links to setting up DNS records for other hosting providers.

After these 3 steps, you should see your website served on your new domain and HTTPS should work just fine. Now onto the not-so-simple issues.

Redirecting the gitlab.io address to the custom domain

Server-side redirects are not supported with GitLab pages

Now that we can see our content on our new domain, we may want to take care of the fact that it is now visible on 2 addresses – the new custom domain and the original GitLab pages address. The traditional way of handling this is with server-side redirects – the server would issue an HTTP 301 Moved Permanently redirect to the new domain. The issue with that is that GitLab Pages does not support server-side redirects.

On the other hand, redirects are supported by GitHub pages, which have this feature and by Netlify as well.

JavaScript to the rescue

We can also find a suggestion to use refresh tags, but since using them is not always simple and server-side functionality is not available, we can opt for client-side JavaScript to solve our redirection issues.

It may not seem like a good idea from SEO perspective at first, but looking at some research on how Google handles JavaScript redirects it looks like the JavaScript redirects are quickly followed by Google. From an indexing standpoint, they are interpreted as 301s — the end-state URLs replaced the redirected URLs in Google’s index.

An example of a JavaScript implementation using the window.location object that can be used to get the current page address and to redirect the browser to a new page can look as follows:

function replacePath(path, old_d, new_d) {
path = path.replace(old_d, new_d);
if (path.includes(new_d)) {
// only if really on the new domain
path = path.replace("http:", "https:");
}
return path;
}
newpath = replacePath(
_____,
"://jozefhajnala.gitlab.io/r",
"://jozef.io"
);
// Prevent infinite redirect
if (_____ != newpath){
window.location.replace(newpath);
}

We would obviously replace the mentioned addresses by the desired ones and omit the https replacement if the new domain does not have SSL enabled.

We can test our JavaScript with a very simple function to see if all the URLs will be translated correctly:

// Place all urls into a variable
// this is just an example with a few
var oldLinks = [
"https://jozefhajnala.gitlab.io/r",
"https://jozefhajnala.gitlab.io/r/categories/rcase4base",
"https://jozefhajnala.gitlab.io/r/categories/rstudioaddins",
"https://jozefhajnala.gitlab.io/r/categories/various"
];
const old_d = "://jozefhajnala.gitlab.io/r";
const new_d = "://jozef.io";
// get the new links
var newLinks = oldLinks.map(x => replacePath(x, old_d, new_d));
function getStatus(url) {
var req = new XMLHttpRequest();
req.open("GET", url, false);
req.send(null);
return req.status;
}
// check the response statuses
// we want no 404s here, all 200 would be ideal
var statuses = newLinks.map(getStatus);

Setting canonical links

To be completely sure about duplicate content, if we have several similar versions of the same content, we can choose one version and point the search engines at this version by specifying a canonical URL.

To specify them using Hugo is very simple thanks to the way it provides variables and the partials approach to building themes. Simply add a line like this to your header.html or head.html partial file:

<link rel="canonical" href="{{ .Permalink }}">

The .html files for partials are usually located in the themes/<your_theme>/layouts/partials/ directory.

Redirecting www and non-www URLs to a single address

Another aspect of the move is to make sure that the content is available via both www.example.com and example.com, but not duplicated. Which of those is preferred is once again up to you. One solution would be to tell GitLab to serve the content to both and use the canonical link or use the JavaScript redirect again. However, there is a much nicer solution on offer here since for our own domain we can use server-side redirects.

If your provider of choice is CloudFlare, it seems that this redirect can be done in a few clicks via the web UI

Using .htaccess

One way to create this redirect is by using a .htaccess file. An example content, if you want to redirect to the www address with HTTPS, can look as follows:

RewriteEngine On
RewriteCond %{HTTPS} off
RewriteRule .* https://%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
RewriteCond %{HTTP_HOST} !^www\. [NC]
RewriteRule .* https://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]

Once you have the file ready, upload it to your site through an ftp client. If you host directly via your own server

  • place the .htaccess file into the proper directory, for example, /var/www/example.com/
  • do not forget to activate the apache mod_rewrite module using sudo a2enmod rewrite
  • an extra SSL certificate is likely to be needed for https to work correctly. The generation using Let’s Encrypt is simple using certbot, described for example here

Read more details on using .htaccess here and more details on using Mod_Rewrites for redirects here

References

Using custom domains with GitLab Pages

Redirects and duplicate content

Setting up virtual hosts, securing Apache with SSL, using .htaccess

To leave a comment for the author, please follow the link and comment on their blog: Jozef's Rblog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.