Articles by kjytay

Visualizing the relationship between multiple variables

August 24, 2019 | kjytay

Visualizing the relationship between multiple variables can get messy very quickly. This post is about how the ggpairs() function in the GGally package does this task, as well as my own method for visualizing pairwise relationships when all the variables …

Changing the variable inside an R formula

August 23, 2019 | kjytay

I recently encountered a situation where I wanted to run several linear models, but where the response variables would depend on previous steps in the data analysis pipeline. Let me illustrate using the mtcars dataset: Let’s say I wanted to …
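
The excerpt stops before the post's own example, so what follows is only a minimal sketch of one common way to vary the response in a formula, not necessarily the author's approach. It assumes the response name arrives as a character string (the names `response` and `predictors`, and the columns chosen, are hypothetical) and builds the formula with base R's `reformulate()`.

```r
# Assumption: the response is only known at run time, e.g. set by an earlier step.
response <- "mpg"              # hypothetical choice of response column
predictors <- c("wt", "hp")    # hypothetical predictors from mtcars

# reformulate() builds a formula object from character strings.
fml <- reformulate(predictors, response = response)
fit <- lm(fml, data = mtcars)

print(fml)
print(coef(fit))
```

Swapping the response then only requires changing the string, for example `reformulate(predictors, response = "qsec")`.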

Looking at flood insurance claims with choroplethr

July 14, 2019 | kjytay

I recently learned how to use the choroplethr package through a short tutorial by the package author Ari Lamstein (YouTube link here). To cement what I learned, I thought I would use this package to visualize flood insurance claims. I …

Sampling paths from a Gaussian process

July 7, 2019 | kjytay

Gaussian processes are a widely employed statistical tool because of their flexibility and computational tractability. (For instance, one recent area where Gaussian processes are used is in machine learning for hyperparameter optimization.) A stochastic process is a Gaussian process if …

Probability of winning a best-of-7 series

April 22, 2019 | kjytay

The NBA playoffs are in full swing! A total of 16 teams are competing in a playoff-format competition, with the winner of each best-of-7 series moving on to the next round. In each matchup, two teams play 7 basketball games …
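
As a rough illustration of the kind of calculation the title refers to, here is a minimal sketch. It assumes each game is won independently with the same probability p, which is a simplifying assumption not stated in the excerpt, and the function name `series_win_prob` is hypothetical. Under that assumption, a team wins the series if it loses at most 3 games before its 4th win, which is a negative binomial probability.

```r
# Assumption: games are independent and the team wins each one with probability p.
series_win_prob <- function(p, wins_needed = 4) {
  # pnbinom(q, size, prob) is P(at most q failures before the size-th success),
  # i.e. the chance of reaching wins_needed wins while losing at most wins_needed - 1 games.
  pnbinom(wins_needed - 1, size = wins_needed, prob = p)
}

print(series_win_prob(0.5))   # evenly matched teams
print(series_win_prob(0.6))   # modest per-game edge
```

The same number can be obtained as the probability of at least 4 wins in 7 hypothetical games, `sum(dbinom(4:7, 7, p))`, since playing out the remaining games does not change who won the series.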

The sinh-arcsinh normal distribution

April 15, 2019 | kjytay

This month’s issue of Significance magazine has a very nice summary article of the sinh-arcsinh normal distribution. (Unfortunately, the article seems to be behind a paywall.) This distribution was first introduced by Chris Jones and Arthur Pewsey in 2009 as …

Plots within plots with ggplot2 and ggmap

February 23, 2019 | kjytay

Once in a while, you might find yourself wanting to embed one plot within another plot. ggplot2 makes this really easy with the annotation_custom function. The following example illustrates how you can achieve this. (For all the code in one …

Quantile regression in R

January 31, 2019 | kjytay

Quantile regression: what is it? Let y be some response variable of interest, and let x be a vector of features or predictors that we want to use to model the response. In linear regression, we are trying to estimate the conditional …

pcLasso: a new method for sparse regression

January 13, 2019 | kjytay

I’m excited to announce that my first package has been accepted to CRAN! The package pcLasso implements principal components lasso, a new method for sparse regression which I’ve developed with Rob Tibshirani and Jerry Friedman. In this post, I will …

A deep dive into glmnet: offset

January 9, 2019 | kjytay

I’m writing a series of posts on various options of the glmnet function (from the package of the same name), hoping to give more detail and insight beyond R’s documentation. In this post, we will look at the offset …

Using emojis as scatterplot points

December 27, 2018 | kjytay

Recently I wanted to learn how to use emojis as points in a scatterplot. It seems like the emojifont package is a popular way to do it. However, I couldn’t seem to get it to work on my machine …

All the (NBA) box scores you ever wanted

December 18, 2018 | kjytay

In this previous post, I showed how one can scrape top-level NBA game data from BasketballReference.com. In the post after that, I demonstrated how to scrape play-by-play data for one game. After writing those posts, I thought to myself: why …

Recreating the NBA lead tracker graphic

December 13, 2018 | kjytay

For each NBA game, nba.com has a really nice graphic which tracks the point differential between the two teams throughout the game. Here is the lead tracker graphic for the game between the LA Clippers and the Phoenix Suns on …

Scraping NBA game data from basketball-reference.com

December 11, 2018 | kjytay

I’m a casual NBA fan: I don’t have time to watch the games but enjoy viewing the highlights on Instagram/YouTube (especially Shaqtin’ A Fool!); I sometimes read game articles and analyses (e.g. Blogtable). Apart from the game being an amazing …
