Export WordPress to Hugo RMarkdown or Org Mode with R

Peter Prevos

2 years ago

[This article was first published on Having Fun and Creating Value With the R Language on Lucid Manager, and kindly contributed to R-bloggers.]


I started my first website in 1996 with hand-written HTML. That became a bit of a chore, so about fifteen years ago, WordPress became my friend. I recently returned to a static website using Hugo. I tried the WordPress to Hugo exporter, but it left a lot of HTML artefacts in the Markdown output and placed each file in a separate folder. This article explains how to export a WordPress blog to Hugo and customise it with R code.

WordPress has been great to me, but it is slowly becoming a pain: constantly updating plugins, security issues, slow performance and the annoying block editor. I am also always looking for additional activities I can do with Emacs. Hugo takes away a lot of the pain of managing a site, so you can focus on the content. Emacs provides me with excellent editing functionality.

Convert the content to Markdown or Org Mode

The first step is to export the WordPress posts database to a CSV file. Several plugins are available that help you with this task. Alternatively, you can connect directly to the database and extract the data with the RMySQL package. I used the WP All Export plugin to export the data. We need at least the following fields:

Title
Slug
Date
Content
Categories
Tags
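
Before converting anything, it can help to confirm that the export really contains these six fields. The snippet below is a minimal sketch in base R; the helper name missing_fields and the small example data frame are my own illustrations and not part of the WP All Export plugin.

```r
## Fields that the rest of the conversion relies on
required_fields <- c("Title", "Slug", "Date", "Content", "Categories", "Tags")

## Hypothetical helper: return the required columns absent from the export
missing_fields <- function(data, required = required_fields) {
    setdiff(required, names(data))
}

## Small stand-in for the exported CSV (illustrative values only)
example_posts <- data.frame(Title = "Hello",
                            Slug = "hello",
                            Date = 1594939500,
                            Content = "<p>Hi</p>",
                            Tags = "r|hugo")

## Lists the columns you still need to add to the export
missing_fields(example_posts)
```

In this example the function reports that the Categories column is missing, so the export template needs one more field.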

The content files for Hugo are either Markdown or Org Mode. I prefer Org Mode as it gives me access to the extensive functionality that Emacs has to offer, including writing and evaluating R code. Org Mode is comparable to RMarkdown: you can write and execute code snippets in both. Org Mode has several other advantages because it also includes a fully-featured task and project management system. Emacs also has superior editing options compared to anything that RStudio has to offer. In the code below, you set your preferred file type with the export variable.

Screenshot of Emacs with R through the Emacs Speaks Statistics package.

The Content field in the WordPress database contains HTML code. The code below reads the exported CSV file and saves each content field as an HTML file. The mighty Pandoc software undertakes the conversion from HTML to Org Mode or Markdown, depending on the export variable, using the post slug as the file name. Any draft posts or pages in the export file will have NA as the file name.

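## Export WP to Hugo

## Read exported WP content
library(tibble)
library(readr)
library(dplyr)
library(stringr)

posts <- read_csv("Posts-Export-2020-July-17-2245.csv")

## Convert to Org Mode or Markdown
export <- ".org" # ".org" or ".md"

## Make sure the Hugo content folder exists
dir.create("content/post", recursive = TRUE, showWarnings = FALSE)

for (i in seq_len(nrow(posts))) {
    ## Save the HTML content to a temporary file named after the slug
    filename <- paste0(posts$Slug[i], ".html")
    writeLines(posts$Content[i], filename)
    ## Pandoc infers the output format from the file extension
    pandoc <- paste0("pandoc -o content/post/", posts$Slug[i], export, " ", filename)
    system(pandoc)
}

## Clean folder (pattern is a regular expression, not a glob)
file.remove(list.files(pattern = "\\.html$"))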
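
Because drafts and pages have NA as their slug, the loop above writes all of them to the same NA file. The sketch below shows one way to skip those rows and to pass the input and output formats to Pandoc explicitly. The helpers has_slug and pandoc_args are hypothetical names of my own, and dropping the drafts is an assumption about what you want. The -f, -t and -o flags are standard Pandoc options.

```r
## Hypothetical helper: TRUE for rows with a usable slug
has_slug <- function(slug) !is.na(slug) & nzchar(slug)

## Hypothetical helper: build the Pandoc arguments for one post
pandoc_args <- function(slug, export = ".org",
                        outdir = file.path("content", "post")) {
    to <- if (export == ".org") "org" else "markdown"
    c("-f", "html", "-t", to,
      "-o", file.path(outdir, paste0(slug, export)),
      paste0(slug, ".html"))
}

## Illustrative slugs, including a draft without a slug
slugs <- c("first-post", NA, "second-post")
keep <- slugs[has_slug(slugs)]

## Arguments for the first remaining post
args <- pandoc_args(keep[1])
args

## To run the conversion: system2("pandoc", args)
```

Passing the arguments as a vector to system2 avoids pasting one long command string together, which is a little more robust when a slug contains unusual characters.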

The next step is to add the front matter for Hugo. The front matter for this export contains the title, date, last-modified date, categories and tags, and marks every post as a draft.

Export WordPress to Hugo site

Now that we have some content, we need to provide the context in the front matter so that Hugo can build a site. Hugo supports several front matter formats, namely TOML, YAML, JSON and Org Mode. This code writes Org Mode front matter for Org Mode files or TOML front matter for Markdown files, depending on how you set the export variable.

## Create Org Mode files
baseurl <- "https://lucidmanager.org"

## Create front matter
if(export == ".org") {
    fm <- tibble(title = paste("#+title:", posts$Title),
                 date = paste("#+date:", as.POSIXct(posts$Date, origin = "1970-01-01")),
                 lastmod = paste("#+lastmod:", Sys.Date()),
                 categories = paste("#+categories[]:", str_replace_all(posts$Categories, " ", "-")),
                 tags = paste("#+tags[]:", str_replace_all(posts$Tags, " ", "-")),
                 draft = "#+draft: true") %>%
        mutate(categories = str_replace_all(categories, "\\|", " "),
               tags = str_replace_all(tags, "\\|", " "))
} else {
    fm <- tibble(opening = "+++",
                 title = paste0('title = "', posts$Title, '"'),
                 date = paste0('date = "', as.POSIXct(posts$Date, origin = "1970-01-01"), '"'),
                 lastmod = paste0('lastmod = "', Sys.Date(), '"'),
                 categories = paste0('categories = ["', posts$Categories, '"]'),
                 tags = paste0('tags = ["', posts$Tags, '"]'),
                 draft = "draft = true", # TOML boolean, not a string
                 close = "+++") %>%
        mutate(categories = str_replace_all(categories, "\\|", '", "'),
               tags = str_replace_all(tags, "\\|", '", "'))
}
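
The export separates multiple categories and tags with a pipe, which the code above converts to the list syntax of each front matter format. To make that conversion easier to follow, here is a minimal sketch for single values in base R. The function names toml_array and org_list are my own, and returning an empty list for a missing value is an assumption rather than what the code above does.

```r
## Hypothetical helper: pipe-separated string to a TOML array
toml_array <- function(x) {
    if (is.na(x)) return("[]")
    items <- strsplit(x, "|", fixed = TRUE)[[1]]
    paste0("[", paste0('"', items, '"', collapse = ", "), "]")
}

## Hypothetical helper: pipe-separated string to a space-separated
## Org Mode list, with hyphens replacing spaces inside each item
org_list <- function(x) {
    if (is.na(x)) return("")
    items <- strsplit(x, "|", fixed = TRUE)[[1]]
    paste(gsub(" ", "-", items, fixed = TRUE), collapse = " ")
}

paste("categories =", toml_array("R Language|Hugo"))
paste("#+categories[]:", org_list("R Language|Hugo"))
```

The first call gives categories = ["R Language", "Hugo"] and the second gives #+categories[]: R-Language Hugo.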

## Load Hugo files and prepend front matter
for (f in seq_len(nrow(posts))) {
    filename <- paste0("content/post/", posts$Slug[f], export)
    post <- c(paste(fm[f, ]), "", readLines(filename))
    ## Repoint images
    post <- str_replace_all(post, paste0(baseurl, "/wp-content"), "/images")
    ## R Code highlighting
    ## As written, these two lines strip the code fences that Pandoc produces;
    ## substitute the highlighting markup of your theme if you want to keep it
    post <- str_replace_all(post, "``` \\{.*", "")
    post <- str_replace_all(post, "```", "\n\n")
    ## Remove remaining WordPress artefacts
    post <- str_remove_all(post, ':::|\\{.wp.*|.*\\"\\}')
    ## Write to disk
    writeLines(post, filename)
}
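
To see what the image repointing and the artefact clean-up do, you can try them on a few lines. The sketch below uses base R instead of stringr so that it runs without extra packages. The function clean_lines, the example.org address and the sample lines are illustrations of my own, modelled on the fenced divs that Pandoc tends to produce for WordPress blocks.

```r
## Hypothetical helper: repoint images and strip WordPress artefacts
clean_lines <- function(lines, baseurl) {
    ## Literal replacement of the old upload location
    lines <- gsub(paste0(baseurl, "/wp-content"), "/images", lines,
                  fixed = TRUE)
    ## Same pattern as in the loop above, applied with Perl-style regex
    gsub(':::|\\{.wp.*|.*"\\}', "", lines, perl = TRUE)
}

## Illustrative Pandoc output for a WordPress image block
sample_lines <- c("::: {.wp-block-image}",
                  "![](https://example.org/wp-content/uploads/plot.png)",
                  ":::")

cleaned <- clean_lines(sample_lines, "https://example.org")
trimws(cleaned)
```

After cleaning, only the image line remains, and it now points to /images/uploads/plot.png.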

Finalising and Publishing the new site

All you have to do now is to add a theme to your website, and your blog is fully converted. The Hugo website has a great Quick Start page that will get you going.

If you prefer RMarkdown, you can easily modify this code to use RStudio and the blogdown package.

This new site will not be perfect just yet. To show the images, you need to download your wp-content folder and move it to the static/images folder in Hugo. You will also need to change the permalink settings to ensure that no URL changes when you migrate your blog. There will be other bits and pieces that might not have adequately converted, so do check your pages.
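
Moving the images can also be done from R. The sketch below assumes that you have downloaded wp-content into the project folder. The function copy_wp_content is a hypothetical helper, and the demonstration runs in temporary folders so that it does not touch a real site.

```r
## Hypothetical helper: copy the contents of wp-content to static/images
copy_wp_content <- function(src = "wp-content",
                            dest = file.path("static", "images")) {
    dir.create(dest, recursive = TRUE, showWarnings = FALSE)
    items <- list.files(src, full.names = TRUE)
    ## Folders such as uploads are copied with everything inside them
    file.copy(items, dest, recursive = TRUE)
}

## Demonstration with temporary folders and an empty stand-in image
demo_src <- file.path(tempdir(), "wp-content")
demo_dest <- file.path(tempdir(), "static", "images")
dir.create(file.path(demo_src, "uploads"), recursive = TRUE,
           showWarnings = FALSE)
file.create(file.path(demo_src, "uploads", "plot.png"))

copy_wp_content(demo_src, demo_dest)
file.exists(file.path(demo_dest, "uploads", "plot.png"))
```

Copying the contents rather than the folder itself matches the repointing step, which turns a link to wp-content/uploads into a link to /images/uploads.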
