In part 1 of this series I set
up Vowpal Wabbit to classify newspaper content. Now, let’s use the model to make predictions and
see whether, and how, we can improve it. Then, let’s train the model on the whole dataset.
Step 1: prepare the data
The first step ... [Read more...]
Can I get enough of historical newspaper data? It seems I can’t. I already wrote four
(1,
2,
3 and
4) blog posts, but
there’s still a lot to explore. This blog post uses a new batch of data announced on Twitter:
For all who love to analyse text, the BnL released ... [Read more...]
This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for
free here. This is taken from Chapter 4,
in which I introduce the {stringr} package.
Manipulate strings with {stringr}
{stringr} contains functions to manipulate strings. In Chapter 10, I will teach you about ... [Read more...]
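A quick taste of the kind of functions {stringr} provides (the examples below are mine, for illustration, and not necessarily the ones used in the chapter):

```r
library(stringr)

str_to_upper("hello")             # "HELLO"
str_detect("tidyverse", "verse")  # TRUE
str_replace("grey", "e", "a")     # "gray"
```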
Introduction
I started off this year by exploring a world that was unknown to me: the world of historical newspapers.
I did not know that historical newspaper data was a thing, and I have been thoroughly enjoying myself
exploring the different datasets published by the National Library of Luxembourg. You can ... [Read more...]
I have been playing around with historical newspaper data (see
here and
here). I have extracted the
data from the largest archive available, as described in the previous blog post, and have now created
a Shiny dashboard where it is possible to visualize the most common words per article, as well ... [Read more...]
Last week I wrote a blog post in which I analyzed
one year of newspaper ads from 19th-century newspapers. The data is made available by the
National Library of Luxembourg.
In this blog post, which is part 1 of a 2-part series, I extract data from the 257GB archive, which
contains 10 ... [Read more...]
The National Library of Luxembourg published
some very interesting datasets: scans of historical newspapers! There are several datasets that
you can download, ranging from 250MB up to 257GB. I decided to take a look at the 32GB “ML Starter Pack”.
It contains high-quality scans of one year of ... [Read more...]
This short blog post illustrates how easy it is to use R and Python in the same R Notebook thanks to the
{reticulate} package. For this to work, you might need to upgrade RStudio to the current preview version.
Let’s start by importing {reticulate}:
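The import itself is a one-liner; as a minimal sketch of mixing the two languages (my own example, not taken from the post):

```r
library(reticulate)

# Run Python code from R; py_run_string() evaluates it in an embedded session
py_run_string("squares = [i**2 for i in range(5)]")

# Objects defined on the Python side are available from R through `py`
py$squares
```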
In this short blog post I show you how you can use the {gganimate} package to create animations
from {ggplot2} graphs with data from UNU-WIDER.
WIID data
Just before Christmas, UNU-WIDER released a new edition of their World Income Inequality Database:
*NEW #... [Read more...]
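The core idea can be sketched as follows; the data frame and column names (`wiid`, `year`, `gini`) are assumptions for illustration, not the actual objects from the post:

```r
library(ggplot2)
library(gganimate)

# A regular {ggplot2} graph becomes an animation by adding a transition;
# transition_reveal() draws the line progressively along `year`
p <- ggplot(wiid, aes(x = year, y = gini)) +
  geom_line() +
  transition_reveal(year)

animate(p)
```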
This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for
free here. This is taken from Chapter 2, which explains
the different R objects you can manipulate as well as some functions to get you started.
Objects, types and useful R functions ... [Read more...]
This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for
free here. This is taken from Chapter 5, which presents
the {tidyverse} packages and how to use them to compute descriptive statistics and manipulate data.
In the text below, I show how ... [Read more...]
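For a flavour of the {tidyverse} approach to descriptive statistics, here is a minimal example of my own using the `starwars` data that ships with {dplyr}:

```r
library(dplyr)

# Mean height and group size per species, ignoring missing values
starwars %>%
  group_by(species) %>%
  summarise(mean_height = mean(height, na.rm = TRUE),
            n = n())
```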
This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for
free here. This is taken from Chapter 5, which presents
the {tidyverse} packages and how to use them to compute descriptive statistics and manipulate data.
In the text below, I scrape a ... [Read more...]
This blog post is an excerpt of my ebook Modern R with the tidyverse that you can read for
free here. This is taken from Chapter 7, which deals
with statistical models. In the text below, I explain what hyper-parameters are, and as an example
I run a ridge regression using ... [Read more...]
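As a rough sketch of the idea (the chapter itself may use different data and tooling), here is a ridge regression with {glmnet}, where the penalty strength `lambda` is the hyper-parameter:

```r
library(glmnet)

# Simulated data; alpha = 0 makes glmnet fit a ridge regression
set.seed(1234)
x <- matrix(rnorm(100 * 10), ncol = 10)
y <- rnorm(100)

# cv.glmnet() chooses lambda, the hyper-parameter, by cross-validation
cv_fit <- cv.glmnet(x, y, alpha = 0)
cv_fit$lambda.min
```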
Introduction
This blog post will use several packages from the
{tidymodels} collection of packages, namely
{recipes},
{rsample} and
{parsnip} to train a random forest the tidy way. I will
also use {mlrMBO} to tune the hyper-parameters of the random forest.
Set up
Let’s load the needed packages:
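The setup might look like the following; the model specification is my own illustration (the engine and mode are assumptions), not necessarily the one from the post:

```r
library(recipes)
library(rsample)
library(parsnip)
library(mlrMBO)

# A random forest specification, the {parsnip} way
rf_spec <- rand_forest(mtry = 3, trees = 500) %>%
  set_engine("ranger") %>%
  set_mode("regression")
```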
Inspired by David Schoch’s blog post,
Traveling Beerdrinker Problem.
Check out his blog; he has some amazing posts!
Introduction
Luxembourg, like any proper European country, is full of castles. According to Wikipedia,
“By some optimistic estimates, there are as many as 130 castles in Luxembourg but more realistically
there are ... [Read more...]
Introduction
In this blog post, I’ll use the data that I cleaned in a previous
blog post, which you can download
here. If you want to follow along,
download the monthly data. In my last blog post
I showed how to perform a grid search the “tidy” way. As ... [Read more...]
Introduction
In this blog post, I’ll use the data that I cleaned in a previous
blog post, which you can download
here. If you want to follow along,
download the monthly data.
In the previous blog post, I used the auto.arima() function to very quickly get a “good-enough”
... [Read more...]
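For readers unfamiliar with it, `auto.arima()` from the {forecast} package searches over ARIMA orders automatically; here is a minimal example on the built-in `AirPassengers` series (not the data from the post):

```r
library(forecast)

# Fit an ARIMA model with automatically selected orders
fit <- auto.arima(AirPassengers)

# Forecast the next 12 months
forecast(fit, h = 12)
```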
In this blog post, I will show you how you can quickly and easily forecast a univariate time series.
I am going to use data from the EU Open Data Portal on air passenger transport. You can find the
data here. I downloaded
the data in the TSV format for ... [Read more...]
Link to web scraping the data
Link to Analysis, part 1
Introduction
This is the third blog post that deals with data from the game NetHack, and oh boy, did a lot of
things happen since the last blog post! Here’s a short timeline of the events:
I scraped data from ... [Read more...]