Articles by arthur charpentier

Clusters of (French) Regions

February 9, 2016 | arthur charpentier

For tomorrow's data science course, I just wanted to post some functions to illustrate cluster analysis. Consider the dataset of the French 2012 elections
elections2012=read.table("http://freakonometrics.free.fr/elections_2012_T1.csv",sep=";",dec=",",header=TRUE)
voix=which(substr(names(elections2012),1,11)=="X..Voix.Exp")
elections2012=elections2012[1:96,]
X=...
[Read more...]
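
As a minimal sketch of the kind of cluster analysis this post refers to, the following runs a hierarchical clustering on R's built-in `USArrests` data. The dataset is only a stand-in for the elections file, and both the Ward linkage and the number of clusters (four) are assumptions made for illustration, not the author's setup.

```r
# Stand-in data: USArrests ships with base R (this is NOT the elections dataset)
X <- scale(USArrests)                  # standardise each variable
d <- dist(X)                           # Euclidean distances between observations
hc <- hclust(d, method = "ward.D2")    # Ward linkage (an illustrative choice)
groups <- cutree(hc, k = 4)            # cut the tree into 4 clusters (arbitrary)

table(groups)                          # cluster sizes
plot(hc, cex = 0.6)                    # dendrogram
```

With regional data, the rows of `X` would be the regions and the columns the vote shares, and `groups` would give one cluster label per region.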

Simple Distributions for Mixtures?

February 3, 2016 | arthur charpentier

The idea of GLMs is that given some covariates $X$, $Y$ has a distribution in the exponential family (Gaussian, Poisson, Gamma, etc). But that does not mean that $Y$ itself, unconditionally, has a similar distribution… so there is no reason to test for a Gamma model for $Y$ before running a Gamma regression, for instance. But ... [Read more...]
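
The point can be illustrated with a small simulation. This is a sketch under assumed values (a binary covariate, a log link, a Gamma shape of 5), none of which come from the post: the response is Gamma given the covariate, yet its unconditional distribution is a two-component mixture.

```r
set.seed(1)
n <- 1000
x <- rbinom(n, 1, 0.5)                 # hypothetical binary covariate
mu <- exp(1 + 2 * x)                   # conditional mean, log link
shape <- 5                             # hypothetical Gamma shape
y <- rgamma(n, shape = shape, rate = shape / mu)   # Y given x is Gamma with mean mu

# The Gamma regression recovers coefficients close to (1, 2) ...
model <- glm(y ~ x, family = Gamma(link = "log"))
coef(model)

# ... although the unconditional distribution of Y is a mixture of two Gammas
hist(log(y), breaks = 40, main = "Marginal distribution of log(Y)")
```

A goodness-of-fit test for a single Gamma distribution on `y` alone would say little about whether the Gamma regression is appropriate.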

Confidence Regions for Parameters in the Simplex

January 18, 2016 | arthur charpentier

Consider here the case where, in some parametric inference problem, the parameter is a point in the simplex. For instance, consider some regression on compositional data,
library(compositions)
data(DiagnosticProb)
Y=DiagnosticProb[,"type"]-1
X=DiagnosticProb[,c("A","B","C")]
model = glm(Y~ilr(X),family=binomial)
b = ilrInv(coef(model)[... [Read more...]

Inter-relationships in a matrix

December 1, 2015 | arthur charpentier

Last week, I wanted to display inter-relationships between data in a matrix. My friend Fleur, from AXA, mentioned an interesting possible application, in car accidents. In car-against-car accidents, it might be interesting to see which parts of the cars were involved. On https://www.data.gouv.fr/fr/, ...
[Read more...]

Profile Likelihood

November 16, 2015 | arthur charpentier

Consider some simulated data
set.seed(1)
x=exp(rnorm(100))
Assume that those data are observations of i.i.d. random variables with some parametric distribution. The natural idea is to consider the maximum likelihood estimator. For instance, consider some maximum likelihood estimator,
library(MASS)
(F=fitdistr(x,"gamma"))
     shape        rate
  1.4214497   0.8619969
 (0.1822570) (0.1320717)
F$estimate[1]+c(... [Read more...]
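
A profile likelihood for the shape parameter of this Gamma fit can be sketched in base R as follows. This is not the code of the post: it relies on the standard fact that, for a fixed shape, the likelihood is maximised at rate = shape / mean(x). The search interval and the grid bounds are arbitrary choices.

```r
set.seed(1)
x <- exp(rnorm(100))                   # same simulated data as in the excerpt

# Profile log-likelihood of the shape: for a fixed shape a, the rate that
# maximises the Gamma likelihood is a / mean(x)
prof_loglik <- function(a) {
  sum(dgamma(x, shape = a, rate = a / mean(x), log = TRUE))
}

opt <- optimize(prof_loglik, interval = c(0.1, 10), maximum = TRUE)
shape_hat <- opt$maximum

# Approximate 95% interval: shapes whose profile log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum
grid <- seq(0.5, 3, by = 0.001)
pl <- sapply(grid, prof_loglik)
ci <- range(grid[pl >= opt$objective - qchisq(0.95, df = 1) / 2])

shape_hat
ci
```

Unlike the symmetric interval built from the standard error, the interval obtained this way does not have to be symmetric around the estimate.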

Variable Importance with Correlated Features

November 6, 2015 | arthur charpentier

Variable importance graphs are a great tool to see, in a model, which variables are interesting. Since we usually use them with random forests, it looks like they work well with (very) large datasets. The problem with large datasets is that a lot of features are ‘correlated’, and in that ... [Read more...]

Applications of Chi-Square Tests

November 3, 2015 | arthur charpentier

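This morning, in our mathematical statistics class, we saw the use of the chi-square test. The first one was related to the goodness of fit of a multinomial distribution. Assume that $(N_1,\dots,N_k)\sim\mathcal{M}(n,\boldsymbol{p})$. In order to test $H_0:\boldsymbol{p}=\boldsymbol{p}_0$ against $H_1:\boldsymbol{p}\neq\boldsymbol{p}_0$, use the statistic $Q=\sum_{i=1}^k \frac{(N_i-np_{0,i})^2}{np_{0,i}}$. Under $H_0$, $Q$ is approximately $\chi^2(k-1)$ distributed. For instance, we have the number of weddings, in ... [Read more...]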
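
The goodness-of-fit test for a multinomial distribution can be sketched with Pearson's statistic, the sum over categories of (observed - expected)^2 / expected, compared with a chi-square distribution with k - 1 degrees of freedom. The counts and null probabilities below are hypothetical (they are not the weddings data of the post).

```r
counts <- c(18, 22, 25, 35)            # hypothetical observed counts
p0 <- rep(1 / 4, 4)                    # hypothetical null probabilities
n <- sum(counts)

Q <- sum((counts - n * p0)^2 / (n * p0))            # Pearson statistic
p_value <- 1 - pchisq(Q, df = length(counts) - 1)   # chi-square with k - 1 df

# Base R's chisq.test() computes the same statistic and p-value
test <- chisq.test(counts, p = p0)
c(Q = Q, p_value = p_value)
test
```

Computing the statistic by hand and checking it against `chisq.test()` is a convenient way to make sure the null probabilities were passed in the intended order.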

Tests, Power and Significance

October 14, 2015 | arthur charpentier

In the mathematical statistics course today, we started talking about tests, and decision rules. To illustrate all the concepts introduced today, we considered the case where we have a sample  with . And we want to test   against  In the course, we’ve seen that we could use a test based ... [Read more...]
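
Significance and power can be sketched numerically. The setting of the post is not visible in this excerpt, so the following assumes (loudly) a Gaussian sample with known standard deviation and a one-sided test of the mean. The function name `power_z` and all default values are hypothetical.

```r
# Assumptions (not from the post): X_1, ..., X_n i.i.d. N(mu, sigma^2) with
# sigma known, testing H0: mu = mu0 against H1: mu > mu0 at level alpha
power_z <- function(mu, mu0 = 0, sigma = 1, n = 25, alpha = 0.05) {
  crit <- qnorm(1 - alpha)                       # reject when Z > crit
  1 - pnorm(crit - (mu - mu0) * sqrt(n) / sigma) # P(reject) when the mean is mu
}

power_z(0)      # under H0, the rejection probability equals alpha
power_z(0.5)    # power against the alternative mu = 0.5
curve(power_z(x), from = -0.5, to = 1.5, xlab = "mu", ylab = "power")
```

The curve passes through alpha at mu0 and increases towards 1 as the true mean moves away from the null in the direction of the alternative.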

Visualising a Circular Density

October 7, 2015 | arthur charpentier

This afternoon, Jean-Luc asked me for some help with an old post I published, minuit, l’heure du crime, and with some graphs published a few days later, in another post, where I used a different visualisation. The idea is that the hour can be seen as circular, in the sense ... [Read more...]
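
One simple way to estimate a density for circular data such as hours, sketched here in base R, is to replicate the sample one period to the left and one to the right before running a standard kernel estimator. This is a common device and not necessarily the one used in the posts mentioned above. The simulated hours and the bandwidth are hypothetical.

```r
set.seed(1)
# Hypothetical times of day in [0, 24), concentrated around midnight
h <- rnorm(500, mean = 0, sd = 2) %% 24

# Replicate the sample one period to the left and one to the right, so the
# kernel estimate no longer sees artificial boundaries at 0 and 24
h3 <- c(h - 24, h, h + 24)
d <- density(h3, bw = 1, from = 0, to = 24)
d$y <- 3 * d$y                         # rescale: only a third of the mass lies in [0, 24]

plot(d, main = "Circular kernel density estimate", xlab = "hour")
```

With this construction the estimate takes (almost) the same value at 0 and at 24, which a plain kernel estimate on [0, 24] would not guarantee.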

Playing with Leaflet (and Radar locations)

September 30, 2015 | arthur charpentier

Yesterday, my friend Fleur showed me some interesting features of the leaflet package, in R.
library(leaflet)
In order to illustrate, consider locations of (fixed) radars, in several European countries. To get the data, use
download.file("http://carte-gps-gratuite.fr/radars/zones-de-danger-destinator.zip","radar.zip")
unzip("radar.zip")
ext_...
[Read more...]

Computational Time of Predictive Models

September 25, 2015 | arthur charpentier

Tuesday, at the end of my 5-hour crash course on machine learning for actuaries, Pierre asked me an interesting question about the computational time of different techniques. I had presented the philosophy of various algorithms, but I forgot to mention computational time. I wanted to try several classification algorithms on ...
[Read more...]

Convergence and Asymptotic Results

September 24, 2015 | arthur charpentier

Last week, in our mathematical statistics course, we saw the law of large numbers (proven in the probability course), which states that, given a collection $X_1,\dots,X_n$ of i.i.d. random variables with $\mathbb{E}[X_i]=\mu$, the sample mean $\overline{X}_n$ converges to $\mu$ as $n\to\infty$. To visualize that convergence, we can use
m=100
mean_samples=function(n=10){
X=matrix(rnorm(n*... [Read more...]
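
The convergence of the sample mean can also be visualised with a running mean along a single simulated path. This is a minimal sketch, distinct from the truncated `mean_samples` function of the post, and the Gaussian draws with mean 0 are an illustrative choice.

```r
set.seed(1)
n <- 10000
x <- rnorm(n, mean = 0)                # i.i.d. draws with expected value 0 (illustrative)
running_mean <- cumsum(x) / seq_len(n) # sample mean after 1, 2, ..., n observations

plot(running_mean, type = "l", log = "x",
     xlab = "n", ylab = "sample mean")
abline(h = 0, lty = 2)                 # the limit
```

The logarithmic horizontal axis makes the early, noisy part of the path visible alongside the later stabilisation around the expected value.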

Minimalist Maps

September 5, 2015 | arthur charpentier

This week, I mentioned a series of maps on Twitter: "some minimalist maps http://t.co/YCNPf3AR9n (poke @visionscarto) pic.twitter.com/Ip9Tylsbkv" (Arthur Charpentier, @freakonometrics, September 2, 2015). Friday evening, just before leaving the office to pick up the kids after their first week back in class, Matthew Champion (...
[Read more...]

On NCDF Climate Datasets

September 3, 2015 | arthur charpentier

In mid-November, a nice workshop on big data and environment will be organized in Argentina. We will talk a lot about climate models, and I wanted to play a little bit with those data, stored on http://dods.ipsl.jussieu.fr/mc2ipsl/. Since Ewen (aka @3wen) has been working ...
[Read more...]

“A 99% TVaR is generally a 99.6% VaR”

August 29, 2015 | arthur charpentier

Almost 6 years ago, I posted a brief comment on a sentence I found surprising at the time, discovered in a report claiming that "the expected shortfall […] at the 99% level corresponds quite closely to the […] value-at-risk at a 99.6% level", which was inspired by a remark in the Swiss Experience report, "expected shortfall […] ... [Read more...]
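
The quoted rule of thumb is easy to check in one specific case. The sketch below assumes (this is an assumption, not something stated in the report) that losses follow a standard Gaussian distribution, for which the 99% TVaR has the closed form dnorm(qnorm(0.99)) / 0.01. The corresponding value-at-risk level then comes out close to 99.6%.

```r
# Assumption (not from the report): losses follow a standard Gaussian distribution
alpha <- 0.99
var_99  <- qnorm(alpha)                        # 99% value-at-risk
tvar_99 <- dnorm(qnorm(alpha)) / (1 - alpha)   # 99% TVaR, closed form for N(0, 1)

# Level at which the value-at-risk equals the 99% TVaR
equiv_level <- pnorm(tvar_99)
c(VaR = var_99, TVaR = tvar_99, level = equiv_level)

# Numerical check of the closed form: mean loss beyond the 99% quantile
tvar_num <- integrate(function(z) z * dnorm(z),
                      lower = var_99, upper = Inf)$value / (1 - alpha)
tvar_num
```

For other loss distributions, in particular heavy-tailed ones, the equivalent level can differ, so the 99.6% figure should be read as specific to this Gaussian assumption.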

Pricing Game

August 22, 2015 | arthur charpentier

In November, with Romuald Elie and Jérémie Jakubowicz, we will organize a session during the 100% Actuaires day, in Paris, based on a “pricing game”. We provide two datasets (motor insurance, third-party claims), with 2 years of experience and 100,000 policies. Each ‘team’ has to submit a premium proposal for 36,000 potential ... [Read more...]