
R code to accompany Real-World Machine Learning (Chapter 3)

[This article was first published on data prone - R, and kindly contributed to R-bloggers].

Abstract

The rwml-R GitHub repo has been updated with R code to accompany Chapter 3 of the book “Real-World Machine Learning” by Henrik Brink, Joseph W. Richards, and Mark Fetherolf.

Survivors on the Titanic

The Titanic Passengers dataset is used to illustrate various processes used to prepare data for modeling, including conversion of factor variables to dummy variables. For example, the code to produce the following table of processed data is provided:

Survived.yes  Pclass  Sex.male  Age  SibSp  Parch  Embarked.Q  Embarked.S  sqrtFare
           0       3         1   22      1      0           0           1  2.692582
           1       1         0   38      1      0           0           0  8.442944
           1       3         0   26      0      0           0           1  2.815138
           1       1         0   35      1      0           0           1  7.286975
           0       3         1   35      0      0           0           1  2.837252
           0       3         1   -1      0      0           1           0  2.908316
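
As a rough illustration of the factor-to-dummy conversion, here is a minimal sketch in base R using model.matrix(). It is not necessarily the approach used in the repo. The toy data frame, the Fare values (back-computed from sqrtFare), and the reading of Age = -1 as a placeholder for a missing age are all assumptions made for this example.

```r
# Toy rows shaped like the Titanic data; they mirror rows 1, 2 and 6 of the table.
# Assumption: Fare values are back-computed by squaring sqrtFare.
raw <- data.frame(
  Survived = factor(c("no", "yes", "no"), levels = c("no", "yes")),
  Pclass   = c(3, 1, 3),
  Sex      = factor(c("male", "female", "male"), levels = c("female", "male")),
  Age      = c(22, 38, NA),
  SibSp    = c(1, 1, 0),
  Parch    = c(0, 0, 0),
  Embarked = factor(c("S", "C", "Q"), levels = c("C", "Q", "S")),
  Fare     = c(7.25, 71.2833, 8.4583)
)

# Assumption: a missing Age is encoded as -1, as the last table row suggests.
raw$Age[is.na(raw$Age)] <- -1

# Replace Fare with its square root.
raw$sqrtFare <- sqrt(raw$Fare)
raw$Fare <- NULL

# model.matrix() expands each factor into 0/1 indicator columns,
# dropping the first level of each factor as the reference level.
mm <- model.matrix(~ ., data = raw)
processed <- as.data.frame(mm[, colnames(mm) != "(Intercept)"])
print(processed)
```

Note that model.matrix() names the indicator columns Survivedyes, Sexmale, EmbarkedQ and EmbarkedS, without the dot separator seen in the table, so a renaming step would be needed to match it exactly.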

I also go “off-script” a bit (doing some things not covered in the book) and demonstrate some useful visualization, modeling, and performance-measurement techniques available with the caret and AppliedPredictiveModeling packages.

MNIST database of handwritten digits

A k-nearest neighbors classifier (from the kknn package) is used to predict the numbers represented in the MNIST database of handwritten digits. The post also shows examples of the types of digits present in the dataset, along with the R code to display them.
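
For readers who want a feel for what displaying a digit involves, here is a minimal base R sketch. It assumes each digit arrives as a vector of 784 grey-scale values stored row by row (a 28 x 28 image), which is a common layout for MNIST in CSV form but is not confirmed by the post. The function names digit_matrix and show_digit, and the synthetic "1" used as input, are made up for this example.

```r
# Convert a flat vector of 784 pixel values (assumed row-major, 28 x 28)
# into a matrix oriented the way image() expects.
digit_matrix <- function(pixels, size = 28) {
  m <- matrix(pixels, nrow = size, byrow = TRUE)
  # image() puts matrix rows along the x axis and draws columns bottom-up,
  # so flip the rows and transpose to keep the digit upright.
  t(apply(m, 2, rev))
}

# Draw one digit with dark ink on a white background.
show_digit <- function(pixels, size = 28) {
  image(digit_matrix(pixels, size),
        col = gray(seq(1, 0, length.out = 256)),
        axes = FALSE, asp = 1)
}

# Synthetic stand-in for a real MNIST row: a vertical bar resembling a "1".
canvas <- matrix(0, nrow = 28, ncol = 28)
canvas[5:24, 14:15] <- 255
flat <- as.vector(t(canvas))  # flatten row by row

z <- digit_matrix(flat)

# Only open a plot window when run interactively.
if (interactive()) show_digit(flat)
```

With real data, flat would be replaced by the pixel columns of a single row of the dataset.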

Auto MPG dataset

As an example of a linear regression analysis, the Auto MPG dataset introduced in Chapter 2 resurfaces, and fuel economy is predicted from origin, year of production, and performance characteristics such as horsepower and engine displacement.
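
A regression of this shape can be sketched with lm() from base R. The data below are simulated purely so the example runs on its own; they are not the Auto MPG data, and the column names (origin, modelyear, horsepower, displacement), the coefficients used to generate mpg, and the 80/20 split are all assumptions made for illustration.

```r
set.seed(42)
n <- 100

# Simulated stand-in for the Auto MPG data (hypothetical column names).
autos <- data.frame(
  origin       = factor(sample(c("USA", "Europe", "Asia"), n, replace = TRUE)),
  modelyear    = sample(70:82, n, replace = TRUE),
  horsepower   = runif(n, 50, 200),
  displacement = runif(n, 70, 450)
)

# Made-up linear relationship plus noise, so lm() has something to recover.
autos$mpg <- 30 - 0.05 * autos$horsepower - 0.01 * autos$displacement +
  0.5 * (autos$modelyear - 70) +
  ifelse(autos$origin == "USA", -2, 0) +
  rnorm(n, sd = 1)

# Hold out 20% of the rows for evaluation.
train_idx <- sample(n, 80)
train   <- autos[train_idx, ]
holdout <- autos[-train_idx, ]

# Fit on the training rows; origin is a factor, so lm() dummy-codes it.
fit <- lm(mpg ~ origin + modelyear + horsepower + displacement, data = train)

# Predict on the held-out rows and compute the root-mean-square error.
pred <- predict(fit, newdata = holdout)
rmse <- sqrt(mean((holdout$mpg - pred)^2))

print(coef(fit))
print(rmse)
```

Evaluating on held-out rows rather than on the training rows gives a fairer picture of how the model would do on cars it has not seen.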

As always, feedback is welcome

As always, I’d love to hear from you if you find the project helpful or if you have any suggestions. Please leave a comment below or use the Tweet button. Also, feel free to fork the rwml-R repo and submit a pull request if you wish to contribute.
