I came across a post recently by a machine learning engineer who made the bold claim that logistic regression is the worst name for an algorithm ever, or something along those lines. Many statisticians of the more old-school type seemed to disagree. T...
Digging through our memory box, we came across a conversation from which we tried to piece together when it all began with rOpenSci.
On July 13, 2011, an email was sent with the idea of a shared blog, a clever domain name, and a way to connect R packa... [Read more...]
How long do wars last, on average? If a war such as that currently under way in Iran has lasted 74 days so far, how long do we expect it to last in total? For all sorts of reasons, inquiring minds are interested. Luckily there are some very well curate...
A high R² can make a regression model look impressively accurate, but this number can be deceptive. If you want to understand why a high R² is not always a sign of a good model, read on! In the post, Learning Data Science: Modelling Basics, we built a simple model to predict ...
Expected goals has become one of the most important concepts in modern football analytics. Instead of judging a team only by goals scored, xG helps us estimate the quality of the chances created. In this tutorial, we will build a practical expected goals model in R using football data, feature ... [Read more...]
Great strides in artificial intelligence development during the last five years have produced agents that are now commonplace at work and home. It is humbling to note that virtually all frontier large language models today trace back to a preprint introducing the transformer neural network architecture – a fifteen-page paper that profoundly ...
1 Introduction
Differencing is one of the most common transformations in time series analysis.
It is also one of the easiest transformations to misunderstand.
In many ARIMA-style workflows, differencing is introduced almost mechanically: i...
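To make the transformation in the teaser above concrete, here is a minimal sketch of differencing in plain Python. The helper name `difference` and its `lag` and `order` parameters are assumptions made for this illustration; they are not taken from the post, which may well use R's built-in tools instead.

```python
def difference(series, lag=1, order=1):
    """Difference a sequence `order` times at the given `lag`.

    Hypothetical helper for illustration: each pass replaces the
    series with y[t] - y[t - lag], which shortens it by `lag` values.
    """
    if lag < 1 or order < 0:
        raise ValueError("lag must be >= 1 and order must be >= 0")
    out = list(series)
    for _ in range(order):
        out = [out[i] - out[i - lag] for i in range(lag, len(out))]
    return out


# A linear trend becomes a constant after one difference.
linear_trend = [3, 5, 7, 9, 11]
print(difference(linear_trend))

# A quadratic trend needs two differences to become constant.
quadratic_trend = [t * t for t in range(6)]
print(difference(quadratic_trend, order=2))

# A period-2 pattern on top of a slow drift: differencing at lag 2
# removes the repeating pattern and leaves the drift per period.
seasonal = [10, 20, 11, 21, 12, 22]
print(difference(seasonal, lag=2))
```

Each pass drops `lag` observations from the front of the series, which is one reason differencing more often than necessary costs information.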
We are excited to introduce the new team of mentors for the rOpenSci 2026 Champions Program! This year we have eleven individuals committed to open science, bringing together a rich diversity of backgrounds and perspectives. The t... [Read more...]
Introduction
Differential Machine Learning (DML), as introduced in the recent arXiv paper (Differential Machine Learning for 0DTE Options with Stochastic Volatility and Jumps), extends supervised learning by incorporating not only function values but also their derivatives. In financial contexts, this often means sensitivities such as Greeks. However, when direct derivatives ...
I tend to write a lot of functions that create specific graphics implemented with ggplot2. Although I try to pick graphic parameters (e.g. colors, text size, etc.) that are reasonable, I will typically define all relevant aesthetics as param... [Read more...]
JAGS 5.0.0-beta is now available from SourceForge. The beta release is for two groups of people: Please send feedback via the JAGS forums or file a bug report. The JAGS library The following packages are available: The rjags package In … Continue reading →
I’m getting more and more into data engineering these days and having used R for
a long time, I’m seeing a lot of problems that look nail-shaped to my R-shaped
hammer. The available tools to solve those problems exist for (presumably) very
good reasons, so I wanted to ...
Have you ever looked at a freshly plotted scatter plot and immediately thought, “Ah, this is clearly a logarithmic curve with some heteroskedastic noise,” without running a single line of modeling code? How do you do that? You don’t perform gradient descent in your head. You use your intuition! ...
Snow in Inwood, New York. Photograph by the author.
Recently I’ve been looking at hourly ridership data from the New York City Subway. Last time we learned that people go to work in the morning and come home in the eve... [Read more...]
A note to myself on survival analysis — KM curves, log-rank tests & Cox models 🧮 If I wrote it the way I understood it, maybe I’ll actually remember it 🤞
Motivations
We see survival analysis or more generally call...
rvflnet is an R package that implements a Random Vector Functional Link (RVFL) network. It is a nonlinear expressive version of glmnet that can be used for regression, classification and survival analysis.
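For readers unfamiliar with the idea, the following is a rough sketch of a generic RVFL regressor in plain Python. It is not the rvflnet implementation or its API: the function names, the tanh activation, the uniform random weights, and the ridge penalty (standing in for the elastic-net fit that glmnet performs) are all assumptions chosen to keep the example small. The core idea it shows is standard for RVFL networks: hidden-layer weights are drawn at random and left fixed, the raw inputs are kept as a direct link, and only the output weights are estimated by penalized least squares.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rvfl_features(x_rows, weights, biases):
    """Intercept and raw inputs (direct link) plus random tanh units."""
    feats = []
    for x in x_rows:
        hidden = [
            math.tanh(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
            for w, b in zip(weights, biases)
        ]
        feats.append([1.0] + list(x) + hidden)
    return feats


def fit_rvfl(x_rows, y, n_hidden=10, ridge=1e-3, seed=0):
    """Draw fixed random hidden weights, then fit output weights by ridge."""
    rng = random.Random(seed)
    n_inputs = len(x_rows[0])
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
               for _ in range(n_hidden)]
    biases = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    h = rvfl_features(x_rows, weights, biases)
    p = len(h[0])
    # Normal equations (H'H + ridge * I) beta = H'y.
    # For simplicity the intercept is penalized as well.
    gram = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
             for j in range(p)] for i in range(p)]
    rhs = [sum(row[i] * y_k for row, y_k in zip(h, y)) for i in range(p)]
    beta = solve_linear_system(gram, rhs)
    return weights, biases, beta


def predict_rvfl(model, x_rows):
    weights, biases, beta = model
    return [sum(b_k * f_k for b_k, f_k in zip(beta, feats))
            for feats in rvfl_features(x_rows, weights, biases)]


def mean_squared_error(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy data: a curved relationship that a purely linear fit cannot capture.
xs = [[-2 + 0.1 * i] for i in range(41)]
ys = [x[0] ** 2 for x in xs]

linear_only = fit_rvfl(xs, ys, n_hidden=0)   # direct link only
with_hidden = fit_rvfl(xs, ys, n_hidden=10)  # direct link + random units

print("linear MSE:", mean_squared_error(ys, predict_rvfl(linear_only, xs)))
print("RVFL MSE:  ", mean_squared_error(ys, predict_rvfl(with_hidden, xs)))
```

Setting `n_hidden=0` reduces the sketch to a penalized linear model, which is one way to read the "nonlinear version of glmnet" description: the random hidden units are simply extra columns added to the design matrix.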
Frank Harrell’s Regression Modeling Strategies online seminar will take place May 14, 15, 18, and 19. This workshop covers principled strategies for building, validating, and interpreting multivariable regression models for a wide range of outcomes, with emphasis on predictive accuracy, avoiding overfitting, and interpreting estimated effects. It explores spline methods, data reduction, benefits ... [Read more...]
Join our workshop on Reactive Shiny Apps and Deployment with Google Cloud Run: Intermediate R Shiny Workshop, which is a part of our workshops for Ukraine series! Here’s some more info: Title: Reactive Shiny Apps and Deployment with Google Cloud Run: Intermediate R Shiny Workshop Date: Thursday, May 21st, 18:00 – 20:00 ... [Read more...]