Articles by arthur charpentier

The m=√p rule for random forests

October 19, 2024 | arthur charpentier

A couple of days ago, in our lab session, we discussed random forests and, since it was based on the example in ISLR, we had a quick discussion about the random choice of features, and the “m=√p” rule. Interestingly, on that one, we can play a bit, and try all choices, ...
[Read more...]

CASdatasets 1.2.0

October 17, 2024 | arthur charpentier

Nearly ten years ago, Christophe Dutang and I launched a curated collection of datasets featured in Computational Actuarial Science with R, bundled in the CASdatasets R package. Now, this package offers an extensive range of actuarial datasets, serving as a vital resource for students, educators, and researchers alike. We’re ...
[Read more...]

Calculating an LOOCV MSE by hand

October 11, 2024 | arthur charpentier

Last week, we had a “mid-term” exam for our introduction to statistical learning course. The question is simple: consider three points and some linear models, estimated using least squares techniques; what would be the leave-one-out cross-validation MSE? I like this exercise since we can compute everything easily, by ...
[Read more...]
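To make the exercise concrete, here is a minimal sketch in R. The three points of the exam are not shown in this excerpt, so the values of `x` and `y` below are purely hypothetical, and `loocv_mse` is a helper name introduced for illustration. It refits each model without one point, predicts that point, and averages the squared errors; the last lines use the standard leverage shortcut for linear models.

```r
# Hypothetical data: the actual exam points are not given in this excerpt
df <- data.frame(x = c(1, 2, 3), y = c(1, 3, 2))

# Brute-force LOOCV: refit without point i, then predict point i
# (assumes the response column is named y)
loocv_mse <- function(formula, data) {
  errors <- sapply(seq_len(nrow(data)), function(i) {
    fit <- lm(formula, data = data[-i, ])
    data$y[i] - predict(fit, newdata = data[i, , drop = FALSE])
  })
  mean(errors^2)
}

mse_const  <- loocv_mse(y ~ 1, df)  # intercept-only model
mse_linear <- loocv_mse(y ~ x, df)  # simple linear regression

# Shortcut for linear models: e_i / (1 - h_ii), with h_ii the leverage
fit <- lm(y ~ x, data = df)
mse_shortcut <- mean((residuals(fit) / (1 - hatvalues(fit)))^2)
```

With only three points, each refit uses two observations, so the linear model interpolates them exactly; that is what makes the computation easy to do by hand.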

Discrimination by proxy (a real case study)

February 15, 2024 | arthur charpentier

Yesterday, with Laurence Barry, we posted a blog post “Who benefits from data sharing?” explaining why data sharing, in insurance, could end mutualization. Actually, it can also be bad in the context of discrimination. Consider here the same dataset, with claim occurrence, in a real insurance portfolio, library(InsurFair) library(...
[Read more...]

Tweedie regression, or Poisson-Gamma regressions ?

February 8, 2024 | arthur charpentier

Yesterday, I was chatting with a young and enthusiastic actuary, who asked a nice (and classical) question: is it the same, or not, to use a Tweedie regression or two regressions (Poisson and Gamma)? For distributions, the two are equivalent, but when we have heterogeneity and explanatory variables, I really ...
[Read more...]

Model selection, AIC and Tweedie regression

April 16, 2023 | arthur charpentier

Just some simple code to illustrate some points we will discuss this week, for the last course on GLMs, before the final exam. We have mentioned that the Gamma distribution belongs to the exponential family, so we can run a regression, and compute the associated AIC, set.seed(123) test.data = rgamma(...
[Read more...]
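As a rough illustration of the idea (not the code of the post, which is truncated above), one can simulate a Gamma sample, fit an intercept-only Gamma regression, and read off the AIC. The sample size, `shape` and `rate` values below are assumptions chosen only for the sketch.

```r
set.seed(123)
# Hypothetical sample: size, shape and rate are illustrative choices
y <- rgamma(200, shape = 2, rate = 0.5)

# Intercept-only Gamma regression with a log link
fit <- glm(y ~ 1, family = Gamma(link = "log"))
aic_glm <- AIC(fit)

# Same quantity rebuilt from the log-likelihood: -2 logLik + 2 k,
# where k counts the intercept and the dispersion parameter
k <- attr(logLik(fit), "df")
aic_by_hand <- -2 * as.numeric(logLik(fit)) + 2 * k
```

Rebuilding the AIC from `logLik` makes the parameter count explicit, which matters when comparing models from different families.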

Snow in Montréal (Canada)

January 29, 2023 | arthur charpentier

Winter started a bit more than one month ago… but we have already experienced many snow storms… there is still a lot of snow in gardens and in the streets. I was wondering if it was that unusual, but apparently not. Compared with last year, it is (for the first months ...
[Read more...]

Monty Hall problem, with Thompson sampling

September 7, 2022 | arthur charpentier

We all know the Monty Hall problem. Recently, Jason Rosenhouse published a book on that topic (entitled The Monty Hall Problem, The Remarkable Story of Math’s Most Contentious Brain Teaser). The game is more or less described by the following question: Suppose you’re on a game show, and ...
[Read more...]
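For readers who want to experiment, here is one possible way to combine the game with Thompson sampling, written as a sketch and not as the approach of the post. It treats “stay” and “switch” as two arms with Beta(1, 1) priors on their winning probabilities; the helper `play`, the number of rounds and the seed are all assumptions.

```r
set.seed(42)

# One round of the game; the host always opens a goat door,
# so switching wins exactly when the first pick is wrong
play <- function(switch_door) {
  car  <- sample(1:3, 1)
  pick <- sample(1:3, 1)
  if (switch_door) pick != car else pick == car
}

# Win and loss counts per strategy (Beta(1, 1) priors)
wins   <- c(stay = 0, switch = 0)
losses <- c(stay = 0, switch = 0)

for (t in 1:2000) {
  # Thompson sampling: one draw per posterior, play the arm with the largest draw
  draws <- rbeta(2, 1 + wins, 1 + losses)
  arm   <- which.max(draws)
  won   <- play(arm == 2)
  wins[arm]   <- wins[arm] + won
  losses[arm] <- losses[arm] + !won
}

plays <- wins + losses  # number of times each strategy was played
```

After enough rounds, `plays` should show the sampler concentrating on “switch”, the strategy with the higher winning probability.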

Lilliefors, Kolmogorov-Smirnov and cross-validation

January 5, 2021 | arthur charpentier

In statistics, the Kolmogorov–Smirnov test is a popular procedure to test whether a sample is drawn from a given distribution, usually some parametric distribution with specified parameters. More specifically, I wanted to discuss p-values today. Given a sample size, let us draw samples of that size, ...
[Read more...]
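A small simulation can illustrate why p-values deserve attention here. This is a sketch under assumptions (standard normal samples, and values of `n` and `ns` picked for illustration), not the experiment of the post. It compares Kolmogorov–Smirnov p-values when the null distribution is fully specified with p-values obtained when the mean and standard deviation are estimated from the same sample, which is the situation the Lilliefors correction addresses.

```r
set.seed(1)
n  <- 50    # hypothetical sample size
ns <- 1000  # hypothetical number of simulated samples

# p-values when the null distribution N(0, 1) is fully specified
p_known <- replicate(ns, ks.test(rnorm(n), "pnorm", 0, 1)$p.value)

# p-values when mean and sd are estimated from the sample being tested
p_estimated <- replicate(ns, {
  x <- rnorm(n)
  ks.test(x, "pnorm", mean(x), sd(x))$p.value
})

# Empirical rejection rates at the 5% level
rate_known     <- mean(p_known < 0.05)
rate_estimated <- mean(p_estimated < 0.05)
```

With a fully specified null, the rejection rate should be close to 5%; with plugged-in estimates, the test becomes conservative and rejects far less often.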

Insurance Pricing Game

December 18, 2020 | arthur charpentier

Would you like to put your data science skills to the test? Imperial College London, Université du Québec à Montréal (UQAM), and actuarial institutes in Singapore, the UK (including the IFoA), and Australia, as well as ASTIN and the Casualty Actuarial Society, are co-organising a global data science competition. Would you like to accurately predict ...
[Read more...]

Trees and forests

November 30, 2020 | arthur charpentier

For my ACT6100 weekly quiz, I usually generate some datasets, and then ask students to compare various predictive algorithms. Last week, it was about classification trees and random forests. And students were surprised to see such differences (they had to estimate the probability of having a specific label, for the ...
[Read more...]