Articles by arthur charpentier

More neurons in the hidden layer than predictive features in neural nets

October 25, 2024 | arthur charpentier

This week, we talked about neural networks for the first time, and I pointed out that many illustrations of neural networks show a hidden layer with fewer neurons than predictive variables, although it can sometimes make sense to have more neurons in the layer than predictive variables. To ...

The m=√p rule for random forests

October 19, 2024 | arthur charpentier

A couple of days ago, in our lab session, we discussed random forests and, since the session was based on the example in ISLR, we had a quick discussion about the random choice of features, and the “m=√p” rule. Interestingly, on that one, we can play a bit, and try all choices, ...

CASdatasets 1.2.0

October 17, 2024 | arthur charpentier

Nearly ten years ago, Christophe Dutang and I launched a curated collection of datasets featured in Computational Actuarial Science with R, bundled in the CASdatasets R package. Now, this package offers an extensive range of actuarial datasets, serving as a vital resource for students, educators, and researchers alike. We’re ...

Calculating an LOOCV MSE by hand

October 11, 2024 | arthur charpentier

Last week, we had a “mid-term” exam for our introduction to statistical learning course. The question is simple: consider three points, and some linear models estimated by least squares. What would be the leave-one-out cross-validation MSE? I like this exercise since we can compute everything easily, by ...
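To make the "by hand" computation concrete, here is a minimal Python sketch of a leave-one-out cross-validation MSE on three points. The points of the original exercise are not reproduced in this excerpt, so the data below are hypothetical, and the helper names (`fit_constant`, `fit_line`, `loocv_mse`) are chosen only for illustration; the two models compared (a constant and a straight line) are also an assumption about what "some linear models" could be.

```python
# Hypothetical data: the three points of the original exercise are not shown.
points = [(1.0, 1.0), (2.0, 3.0), (3.0, 2.0)]

def fit_constant(train):
    """Least squares fit of y = a, i.e. the mean of the observed y."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Least squares fit of y = a + b x, using the closed-form solution."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def loocv_mse(data, fit):
    """Leave each point out in turn, refit, and average the squared errors."""
    squared_errors = []
    for i, (x_out, y_out) in enumerate(data):
        train = data[:i] + data[i + 1:]   # the two remaining points
        model = fit(train)
        squared_errors.append((y_out - model(x_out)) ** 2)
    return sum(squared_errors) / len(squared_errors)

mse_constant = loocv_mse(points, fit_constant)
mse_line = loocv_mse(points, fit_line)
print(mse_constant, mse_line)
```

With these made-up points, the constant model gets a LOOCV MSE of 1.5 and the straight line 6.75: with only two training points, the line interpolates them exactly and extrapolates poorly to the point left out.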

Some updates about the insurance datasets package (CASdatasets)

July 11, 2024 | arthur charpentier

Ten years ago, Computational Actuarial Science with R was published. With Christophe Dutang, we created at the same time an R package, collecting datasets used in the book. It was mainly to give access to the datasets to reproduce the applications, since functions used in the different chapters were coming ...

Discrimination by proxy (a real case study)

February 15, 2024 | arthur charpentier

Yesterday, with Laurence Barry, we published a blog post “Who benefits from data sharing?” explaining why data sharing, in insurance, could end mutualization. Actually, it can also be bad in the context of discrimination. Consider here the same dataset, with claim occurrence, in a real insurance portfolio, library(InsurFair) library(...

Tweedie regression, or Poisson-Gamma regressions?

February 8, 2024 | arthur charpentier

Yesterday, I was chatting with a young and enthusiastic actuary, who asked a nice (and classical) question: is it the same, or not, to use a Tweedie regression or two regressions (Poisson and Gamma)? For distributions, the two are equivalent, but when we have heterogeneity and explanatory variables, I really ...

Fairness and discrimination, PhD Course, #4 Wasserstein Distances and Optimal Transport

January 21, 2024 | arthur charpentier

For the fourth course, we will discuss Wasserstein distance and Optimal Transport. Last week, we mentioned distances, dissimilarity and divergences. But before talking about Wasserstein, we should mention the Cramér distance. Cramér and Wasserstein distances. The Cramér distance is defined first, and the Wasserstein distance next (for the same values of the order parameter). If we ...

Creating automatically dozens of calendar notifications (with R)

January 14, 2024 | arthur charpentier

In a few days, we will have our annual NSERC-CRSNG meeting for grant reviews. In a nutshell (the process will be the same as last year), we get an Excel file that looks like a calendar, with about 45 slots of 20 minutes, from Monday 8 am till Friday 5 pm. This year, I ...

Model selection, AIC and Tweedie regression

April 16, 2023 | arthur charpentier

Just some simple code to illustrate some points we will discuss this week, for the last course on GLMs, before the final exam. We have mentioned that the Gamma distribution belongs to the exponential family, so we can run a regression, and compute the associated AIC, set.seed(123) test.data = rgamma(...

Snow in Montréal (Canada)

January 29, 2023 | arthur charpentier

Winter started a bit more than one month ago… but we have already experienced many snow storms… there is still a lot of snow in gardens and in the streets. I was wondering if it was that unusual, but apparently not. Compared with last year, it is (for the first months ...

Monty Hall problem, with Thompson sampling

September 7, 2022 | arthur charpentier

We all know the Monty Hall problem. Recently, Jason Rosenhouse published a book on that topic (entitled The Monty Hall Problem, The Remarkable Story of Math’s Most Contentious Brain Teaser). The game is more or less described by the following question: Suppose you’re on a game show, and ...
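As a rough illustration of how Thompson sampling could be applied to the game, here is a minimal Python sketch. It is not taken from the original post: it assumes the two strategies ("stay" and "switch") are treated as the two arms of a bandit, with a Beta(1, 1) prior on the winning probability of each, and the function names `play_round` and `thompson_monty_hall` are hypothetical.

```python
import random

def play_round(switch, rng):
    """Play one Monty Hall game; return True if the car is won."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def thompson_monty_hall(n_rounds=5000, seed=123):
    """Thompson sampling over two arms: 'stay' and 'switch' (an assumption)."""
    rng = random.Random(seed)
    # Beta posterior parameters [alpha, beta] per arm, starting from Beta(1, 1).
    posterior = {"stay": [1, 1], "switch": [1, 1]}
    pulls = {"stay": 0, "switch": 0}
    for _ in range(n_rounds):
        # Draw one winning probability per arm from its current posterior.
        draws = {arm: rng.betavariate(a, b) for arm, (a, b) in posterior.items()}
        arm = max(draws, key=draws.get)       # play the arm with the best draw
        won = play_round(arm == "switch", rng)
        posterior[arm][0 if won else 1] += 1  # update alpha on a win, beta on a loss
        pulls[arm] += 1
    return posterior, pulls

posterior, pulls = thompson_monty_hall()
switch_mean = posterior["switch"][0] / sum(posterior["switch"])
print(pulls, round(switch_mean, 3))
```

Since switching wins with probability 2/3 and staying with probability 1/3, the sampler should end up playing "switch" in most rounds, with a posterior mean for that arm close to 2/3.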

Interpretability and explainability of predictive models

August 26, 2022 | arthur charpentier

In 400 AD, in his Confessiones, Augustine wrote “quid est ergo tempus? si nemo ex me quaerat, scio; si quaerenti explicare velim, nescio”, which can be translated as: What then is time? If no one asks me, I know what it is. If I wish to explain it to him who ...

Could there be incentives to cycle through a red light?

August 13, 2021 | arthur charpentier

This is of course a rhetorical question! Because cyclists must stop when the light is red! … But … there is always that moment, on a bicycle, when you stop, and then you say to yourself the worst part is that the lights are badly regulated, and I know that the next ...

From multinomial regression to binary classification on some Siamese data

March 14, 2021 | arthur charpentier

There are two kinds of people in the world: people who think there are two kinds of people in the world and people who don’t (borrowed from Menand (2018)). Because things are always simpler when we face only binary choices, aren’t they? But consider here the case where multiple ...

Some general thoughts on Partial Dependence Plots with correlated covariates

February 12, 2021 | arthur charpentier

The partial dependence plot is a nice tool to analyse the impact of some explanatory variables when using nonlinear models, such as a random forest, or some gradient boosting. The idea (in dimension 2): given a model for the response, the partial dependence plot for a variable in that model is the function obtained by averaging the model's predictions over the other variable. This ...

3rd Insurance Data Science Conference

January 25, 2021 | arthur charpentier

Registrations and call for abstracts, for the 3rd Insurance Data Science Conference, organised on-line 16 – 18 June 2021 (PM in Europe, AM in America), are now open. See https://insurancedatascience.org/ for more details…

Lilliefors, Kolmogorov-Smirnov and cross-validation

January 5, 2021 | arthur charpentier

In statistics, the Kolmogorov–Smirnov test is a popular procedure to test whether a sample is drawn from a given distribution, usually some parametric distribution. For instance, we can test a specific hypothesis of that form using that test. More specifically, I wanted to discuss p-values today. Let us draw samples of a given size, ...

Insurance Pricing Game

December 18, 2020 | arthur charpentier

Would you like to put your data science skills to the test? Imperial College London, Université du Québec à Montréal (UQAM), actuarial institutes in Singapore, the UK (including the IFoA) and Australia, ASTIN, and the Casualty Actuarial Society are co-organising a global data science competition. Would you like to accurately predict ...

Trees and forests

November 30, 2020 | arthur charpentier

For my ACT6100 weekly quiz, I usually generate some datasets, and then ask students to compare various predictive algorithms. Last week, it was about classification trees and random forests. And students were surprised to see such differences (they had to estimate the probability of having a specific label, for the ...

