Articles by R on Data & The World

Binary Missing Value Imputation

November 20, 2021 | R on Data & The World

A few datasets that I’ve seen have come with several different columns representing binary responses to questions. Naturally, there are missing values scattered throughout, so some amount of imputation had to occur. I decided to try coding up a w... [Read more...]

Markov Transition (Animated) Plots

November 6, 2021 | R on Data & The World

This is a quick post intended for animating how the transition matrix of a Markov chain changes between larger time steps, as well as showing the probability of the chain being in any specified state over time. This post uses the tidyverse, along with ... [Read more...]

The Four Pipes of magrittr

September 6, 2021 | R on Data & The World

The magrittr package is a part of the extended tidyverse – i.e., not one of the ones normally loaded. It is the one that supplies the pipe operator (%>%), but it turns out that the package actually contains four pipe operators in total. All ar...
[Read more...]
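
As a rough illustration of the teaser above, here is a minimal sketch of four pipe operators exported by magrittr. It assumes the four meant are %>%, %<>%, %T>% and %$%; the variable names are made up for the example and are not taken from the post.

```r
library(magrittr)

x <- c(1, 4, 9)

# %>% : forward pipe, passes the left-hand side as the first argument
piped <- x %>% sqrt() %>% sum()

# %<>% : assignment pipe, pipes y forward and overwrites y with the result
y <- c(1, 4, 9)
y %<>% sqrt()

# %T>% : tee pipe, runs print() for its side effect but passes x itself onward
teed <- x %T>% print() %>% sum()

# %$% : exposition pipe, exposes the columns of the data frame by name
corr <- data.frame(a = 1:5, b = (1:5) * 2) %$% cor(a, b)
```

The tee pipe is mostly useful for side effects such as printing or plotting in the middle of a chain, while the exposition pipe helps with functions that have no data argument.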

Booleans & NAs

May 6, 2021 | R on Data & The World

Missing values are inevitable in data science, and handling them is a constant issue. In the case of Boolean logic, it can behave fairly differently depending on the order of arguments and exactly how it is set up, unlike a lot of other data types. Whether this is useful or ... [Read more...]
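
For a concrete sense of how Boolean logic treats missing values in base R, the following is a small sketch (not taken from the post itself). The result depends on whether the non-missing operand already decides the answer.

```r
# AND: FALSE decides the result regardless of the missing value
and_false <- NA & FALSE    # FALSE
and_true  <- NA & TRUE     # NA, the result depends on the unknown value

# OR: TRUE decides the result regardless of the missing value
or_true  <- NA | TRUE      # TRUE
or_false <- NA | FALSE     # NA

# The same rules carry over to any() and all()
any_with_na <- any(c(NA, TRUE))                 # TRUE
all_with_na <- all(c(NA, TRUE))                 # NA
all_na_rm   <- all(c(NA, TRUE), na.rm = TRUE)   # TRUE
```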

LDA vs QDA vs Logistic Regression

November 28, 2020 | R on Data & The World

There are plenty of methods to choose from for classification problems, all with their own strengths and weaknesses. This post will try to compare three of the more basic ones: linear discriminant analysis (LDA), quadratic discriminant analysis (QDA), ...
[Read more...]

Matrix to LaTeX

August 15, 2020 | R on Data & The World

I recently had to go through some matrix operations in R and then write up the results in LaTeX. Formatting the R output to get it into a form for LaTeX isn’t particularly hard, but it’s tedious and it has a regular structure, so it seemed like it ... [Read more...]

An Example With accumulate()

July 28, 2020 | R on Data & The World

As with most useful (collections of) libraries, the tidyverse has a lot to offer. One interesting bit that I found recently was the accumulate() function in the purrr library, which allows you to apply a function over a succession of values in a vector. This post is a quick example ...
[Read more...]
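
A minimal sketch of what accumulate() does, assuming purrr is installed. These toy inputs are illustrative and are not the example used in the post.

```r
library(purrr)

# Apply `+` cumulatively, keeping every intermediate result
running_total <- accumulate(1:5, `+`)
# 1 3 6 10 15

# .init supplies a starting value, so the output has one extra element
running_product <- accumulate(1:4, `*`, .init = 10)
# 10 10 20 60 240
```

Unlike reduce(), which returns only the final value, accumulate() returns the whole sequence of intermediate values.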

Spotify Cross-Playlist Predictions, Part 2

July 25, 2020 | R on Data & The World

This is a follow-up to the previous post, where the mechanics of making cross-playlist predictions were covered. This post covers the second half of the project: now that we have the analysis method and the important functions worked out in practice, we need to code this functionality into a ...
[Read more...]

Spotify Cross-Playlist Predictions, Part 1

July 11, 2020 | R on Data & The World

This is the first of probably two posts detailing the construction of an RShiny app. The app in question is meant to take data from two Spotify playlists and make recommendations for tracks from one – which I’ll call the “target” playlist – based on the contents of another – the “reference” playlist. ...
[Read more...]

Formatting With ggtext Example

July 3, 2020 | R on Data & The World

This is a quick example regarding the ggtext package. It’s one of the many packages that extends ggplot2, with this one having a focus on adding and formatting text in graphs. The particularly interesting thing for me is that it allows Markdown and other formatting of the labels in ...
[Read more...]

Detecting Streaks in R

June 5, 2020 | R on Data & The World

Inspired by this post, which tries to calculate streaks in Python’s pandas library, I thought I’d give it a try in R, since it’s all just dataframe operations in the Python post. I won’t repeat his analysis, but I will replicate the streak determination and some ...
[Read more...]

538 Dungeons & Dragons Riddler

May 22, 2020 | R on Data & The World

This problem was the Riddler Classic on 538 for May 15, 2020. The problem is as follows: The fifth edition of Dungeons & Dragons introduced a system of “advantage and disadvantage.” When you roll a die “with advantage,” you roll the die twice and keep the higher result. Rolling “with disadvantage” is similar, except ...
[Read more...]
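
To make the "advantage" mechanic concrete, here is a small sketch that enumerates every pair of rolls and keeps the higher one. It assumes a 20-sided die, which the excerpt does not state, and it does not attempt the Riddler question itself.

```r
sides <- 20  # assumption: a d20

# Every equally likely pair of rolls
rolls <- expand.grid(first = 1:sides, second = 1:sides)

# Rolling with advantage keeps the higher of the two rolls
with_advantage <- pmax(rolls$first, rolls$second)

mean_single    <- mean(1:sides)         # 10.5
mean_advantage <- mean(with_advantage)  # 13.825
```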

Looking Normal(ly Distributed)

May 12, 2020 | R on Data & The World

Among all probability distributions, the normal distribution is probably the most well-established and well-characterized. The importance of things like the central limit theorem and the normality assumptions in linear regression highlights this well. One of the more interesting examples is the fact that you can approximate a binomial distribution with ...
[Read more...]
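
Assuming the truncated sentence refers to the usual normal approximation to the binomial, a quick check in base R might look like the following. The values of n, p and the cutoff are arbitrary choices for illustration.

```r
n <- 100
p <- 0.5
cutoff <- 55

# Exact binomial probability P(X <= 55)
exact <- pbinom(cutoff, size = n, prob = p)

# Normal approximation with matching mean and standard deviation,
# using a continuity correction of 0.5
approx <- pnorm(cutoff + 0.5, mean = n * p, sd = sqrt(n * p * (1 - p)))

abs_error <- abs(exact - approx)
```

The approximation tends to be better when n is large and p is not too close to 0 or 1.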

An (Animated) Example of Bayesian Updating

April 11, 2020 | R on Data & The World

Bayesian statistics is centered on constructing certain assumptions about how the probability of an event is distributed, and then adjusting that belief as new information comes in. It can be more involved to construct a Bayesian model as opposed to the “look at many things in aggregate” approach used in ...
[Read more...]

MST3K Episode vs Movie Scores

February 8, 2020 | R on Data & The World

First broadcast in 1988, Mystery Science Theater 3000 is a television show whose nominal story involves a guy being trapped in space by a couple of mad scientist types…which is actually just an excuse to have a few guys make fun of really, really bad movies. This raises a few unusual ...
[Read more...]
