Articles by arthur charpentier

Clusters of (French) Regions

February 9, 2016 | arthur charpentier

For tomorrow's data science course, I just wanted to post some functions to illustrate cluster analysis. Consider the dataset of the French 2012 elections,

    elections2012=read.table("http://freakonometrics.free.fr/elections_2012_T1.csv",sep=";",dec=",",header=TRUE)
    voix=which(substr(names(elections2012),1,11)=="X..Voix.Exp")
    elections2012=elections2012[1:96,]
    X=...

[Read more...]

Simple Distributions for Mixtures?

February 3, 2016 | arthur charpentier

The idea of GLMs is that, given some covariates X, the response Y has a distribution in the exponential family (Gaussian, Poisson, Gamma, etc.). But that does not mean that Y itself has a similar distribution… so there is no reason to test for a Gamma model for Y before running a Gamma regression, for instance. But ... [Read more...]
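To make the distinction concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (a binary covariate, a log link, the coefficients 1 and 2, a common shape of 5); none of it is taken from the post. The response is Gamma given the covariate, so the Gamma regression is well specified, yet the unconditional response is a two-component mixture and a single Gamma fits it poorly. The Kolmogorov-Smirnov p-value is only indicative, since the parameters are estimated from the same sample.

```r
set.seed(1)
n <- 5000
x <- rbinom(n, size = 1, prob = 0.5)     # hypothetical binary covariate
mu <- exp(1 + 2 * x)                     # assumed conditional mean (log link)
shape <- 5                               # assumed common shape parameter
y <- rgamma(n, shape = shape, rate = shape / mu)

# The Gamma regression recovers the conditional structure
fit <- glm(y ~ x, family = Gamma(link = "log"))
coef(fit)

# Marginally, Y is a mixture of two Gamma distributions.
# Fit a single Gamma by the method of moments and compare.
shape_mm <- mean(y)^2 / var(y)
rate_mm <- mean(y) / var(y)
ks <- ks.test(y, "pgamma", shape = shape_mm, rate = rate_mm)
ks$p.value                               # very small: a single Gamma is rejected
```

In this sketch, testing the marginal sample for "Gamma-ness" would wrongly discourage a Gamma regression that is in fact appropriate.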

Confidence Regions for Parameters in the Simplex

January 18, 2016 | arthur charpentier

Consider here the case where, in some parametric inference problem, the parameter is a point in the simplex. For instance, consider some regression on compositional data,

    library(compositions)
    data(DiagnosticProb)
    Y=DiagnosticProb[,"type"]-1
    X=DiagnosticProb[,c("A","B","C")]
    model = glm(Y~ilr(X),family=binomial)
    b = ilrInv(coef(model)[...

[Read more...]

Regression with Splines: Should we care about Non-Significant Components?

January 4, 2016 | arthur charpentier

Following this morning's course, I got a very interesting question from a student of mine. The question was about having non-significant components in a spline regression. Should we consider a model with a small number of knots and all components significant, or one with a (much) larger number of ...

[Read more...]

How Could Classification Trees Be So Fast on Categorical Variables?

December 8, 2015 | arthur charpentier

I think that over the past months, I have been saying incorrect things about classification with categorical covariates, because I never took the time to look at it carefully. Consider some simulated dataset, with a logistic regression,

    n=1e3
    set.seed(1)
    X1=runif(n)
    q=quantile(X1,(0:26)/26)
    q[1]=0
    X2=cut(X1,...

[Read more...]

Inter-relationships in a matrix

December 1, 2015 | arthur charpentier

Last week, I wanted to display inter-relationships between data in a matrix. My friend Fleur, from AXA, mentioned an interesting possible application, in car accidents. In car-against-car accidents, it might be interesting to see which parts of the cars were involved. On https://www.data.gouv.fr/fr/, ...

[Read more...]

Additional thoughts about ‘Lorenz curves’ to compare models

November 28, 2015 | arthur charpentier

A few months ago, I mentioned a graph of some so-called Lorenz curves to compare regression models, see e.g. Progressive’s slides (thanks Guillaume for the reference). The idea is simple. Consider some model for the pure premium (in insurance, it is the quantity that we like to ...

[Read more...]

Profile Likelihood

November 16, 2015 | arthur charpentier

Consider some simulated data

    set.seed(1)
    x=exp(rnorm(100))

Assume that those data are observations of i.i.d. random variables with some parametric distribution. The natural idea is to consider the maximum likelihood estimator. For instance, for a Gamma distribution,

    library(MASS)
    (F=fitdistr(x,"gamma"))
          shape         rate
      1.4214497   0.8619969
     (0.1822570) (0.1320717)
    F$estimate[1]+c(...

[Read more...]
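As a rough illustration of the idea in the title, the sketch below profiles the Gamma log-likelihood over the shape parameter for the same simulated sample. It is not the code of the post: the function names `loglik` and `profile_loglik`, the search interval, and the 95% level are choices made here for illustration. It relies on the fact that, for a fixed shape, the rate maximising the Gamma likelihood is the shape divided by the sample mean.

```r
set.seed(1)
x <- exp(rnorm(100))

# Gamma log-likelihood in (shape, rate)
loglik <- function(shape, rate) {
  sum(dgamma(x, shape = shape, rate = rate, log = TRUE))
}

# Profile log-likelihood: for a fixed shape, plug in the optimal rate
profile_loglik <- function(shape) loglik(shape, rate = shape / mean(x))

# Maximise the profile over an assumed search interval for the shape
opt <- optimize(profile_loglik, interval = c(0.1, 10), maximum = TRUE)
shape_hat <- opt$maximum

# 95% confidence interval: shapes whose profile log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum
cutoff <- opt$objective - qchisq(0.95, df = 1) / 2
g <- function(shape) profile_loglik(shape) - cutoff
ci <- c(uniroot(g, c(0.1, shape_hat))$root,
        uniroot(g, c(shape_hat, 10))$root)
shape_hat
ci
```

Unlike an interval built from the estimate plus or minus a multiple of its standard error, this interval need not be symmetric around the estimate.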

Variable Importance with Correlated Features

November 6, 2015 | arthur charpentier

Variable importance graphs are a great tool to see, in a model, which variables are interesting. Since we usually use them with random forests, it looks like they work well with (very) large datasets. The problem with large datasets is that a lot of features are ‘correlated’, and in that ... [Read more...]

Applications of Chi-Square Tests

November 3, 2015 | arthur charpentier

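This morning, in our mathematical statistics class, we saw how to use the chi-square test. The first application was a goodness-of-fit test for a multinomial distribution. Assume that the observed counts follow a multinomial distribution. To test a hypothesised vector of probabilities against the alternative that the probabilities differ, use Pearson's statistic, which under the null hypothesis asymptotically follows a chi-square distribution. For instance, we have the number of weddings, in ... [Read more...]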
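A minimal sketch of the goodness-of-fit computation follows. The counts and the uniform null probabilities are hypothetical (they are not the wedding data mentioned in the post). The statistic is the sum over categories of (observed - expected)^2 / expected, compared with a chi-square distribution with one degree of freedom less than the number of categories; base R's `chisq.test` is used as a cross-check.

```r
# Hypothetical counts over 4 categories (not the data from the post)
counts <- c(18, 25, 30, 27)
p0 <- rep(1 / 4, 4)                  # assumed null probabilities
n <- sum(counts)
expected <- n * p0

# Pearson's statistic and its asymptotic p-value
Q <- sum((counts - expected)^2 / expected)
p_value <- pchisq(Q, df = length(counts) - 1, lower.tail = FALSE)

# Cross-check with the built-in test
builtin <- chisq.test(counts, p = p0)
c(manual = Q, builtin = unname(builtin$statistic))
c(manual = p_value, builtin = builtin$p.value)
```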

Statistical Tests: Asymptotic, Exact, or based on Simulations?

October 20, 2015 | arthur charpentier

This morning, in our mathematical statistics course, we discussed the ‘proportion test’: given a sample of Bernoulli trials with an unknown probability of success, we want to test a hypothesised value of that probability against an alternative. A natural test (which can be related to the maximum likelihood ratio test) is based on a statistic computed from the sample. The test function is ... [Read more...]
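The three approaches named in the title can be sketched side by side on a toy example. The sample size, the observed number of successes and the null value 0.5 are all hypothetical, and a two-sided alternative is assumed; this is an illustration, not the example used in the course.

```r
set.seed(1)
n <- 20          # hypothetical number of Bernoulli trials
x <- 14          # hypothetical number of successes
p0 <- 0.5        # assumed null value of the success probability

# 1. Asymptotic: Gaussian approximation of the standardised proportion
z <- (x / n - p0) / sqrt(p0 * (1 - p0) / n)
p_asym <- 2 * pnorm(-abs(z))

# 2. Exact: binomial distribution under the null
p_exact <- binom.test(x, n, p = p0)$p.value

# 3. Simulation: draw counts under the null and compare distances to n * p0
sims <- rbinom(1e5, size = n, prob = p0)
p_sim <- mean(abs(sims - n * p0) >= abs(x - n * p0))

c(asymptotic = p_asym, exact = p_exact, simulated = p_sim)
```

With such a small sample, the simulated p-value should track the exact one closely, while the asymptotic one can differ noticeably.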

Tests, Power and Significance

October 14, 2015 | arthur charpentier

In the mathematical statistics course today, we started talking about tests and decision rules. To illustrate all the concepts introduced today, we considered the case where we have a sample from some distribution, and we want to test a null hypothesis against an alternative. In the course, we’ve seen that we could use a test based ... [Read more...]

Visualising a Circular Density

October 7, 2015 | arthur charpentier

This afternoon, Jean-Luc asked me for some help with an old post I published, minuit, l’heure du crime, and with some graphs published a few days later in another post, where I used a different visualisation. The idea is that the hour can be seen as circular, in the sense ... [Read more...]

Playing with Leaflet (and Radar locations)

September 30, 2015 | arthur charpentier

Yesterday, my friend Fleur showed me some interesting features of the leaflet package, in R.

    library(leaflet)

In order to illustrate, consider locations of (fixed) radars, in several European countries. To get the data, use

    download.file("http://carte-gps-gratuite.fr/radars/zones-de-danger-destinator.zip","radar.zip")
    unzip("radar.zip")
    ext_...

[Read more...]

Computational Time of Predictive Models

September 25, 2015 | arthur charpentier

Tuesday, at the end of my 5-hour crash course on machine learning for actuaries, Pierre asked me an interesting question about the computational time of different techniques. I had presented the philosophy of various algorithms, but I forgot to mention computational time. I wanted to try several classification algorithms on ...

[Read more...]

Convergence and Asymptotic Results

September 24, 2015 | arthur charpentier

Last week, in our mathematical statistics course, we saw the law of large numbers (which was proven in the probability course), stating that, given a collection of i.i.d. random variables with finite mean, the sample mean converges to that mean. To visualize that convergence, we can use

    m=100
    mean_samples=function(n=10){
    X=matrix(rnorm(n*...

[Read more...]
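Since the code of the post is truncated here, the following is a separate, simpler sketch of the same convergence, not a reconstruction of `mean_samples`. It assumes Gaussian draws with mean 2 and standard deviation 1 (both values are arbitrary) and plots the running sample mean against the sample size.

```r
set.seed(1)
n <- 10000
mu <- 2                                  # assumed true mean
x <- rnorm(n, mean = mu, sd = 1)         # hypothetical i.i.d. sample

# Sample mean of the first k observations, for k = 1, ..., n
running_mean <- cumsum(x) / seq_len(n)

plot(running_mean, type = "l", log = "x",
     xlab = "sample size", ylab = "sample mean")
abline(h = mu, lty = 2)                  # the limit given by the law of large numbers
```

The logarithmic horizontal axis makes the early fluctuations visible before the curve settles around the dashed line.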

Minimalist Maps

September 5, 2015 | arthur charpentier

This week, I mentioned a series of minimalist maps on Twitter (http://t.co/YCNPf3AR9n, poke @visionscarto, September 2, 2015). Friday evening, just before leaving the office to pick up the kids after their first week back in class, Matthew Champion (...

[Read more...]

On NCDF Climate Datasets

September 3, 2015 | arthur charpentier

In mid-November, a nice workshop on big data and environment will be organized in Argentina. We will talk a lot about climate models, and I wanted to play a little bit with those data, stored on http://dods.ipsl.jussieu.fr/mc2ipsl/. Since Ewen (aka @3wen) has been working ...

[Read more...]

“A 99% TVaR is generally a 99.6% VaR”

August 29, 2015 | arthur charpentier

Almost 6 years ago, I posted a brief comment on a sentence I found surprising at the time, discovered in a report claiming that “the expected shortfall […] at the 99 % level corresponds quite closely to the […] value-at-risk at a 99.6% level”, which was inspired by a remark in the Swiss Experience report: expected shortfall […] ... [Read more...]
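The claim can be checked quickly in one special case. The sketch below assumes standard Gaussian losses, which is an assumption made here and not something stated in the excerpt; for other loss distributions the equivalent level will differ. For a standard Gaussian, the expected shortfall (TVaR) at level alpha is the density at the alpha-quantile divided by 1 - alpha.

```r
alpha <- 0.99

# Value-at-risk at 99% for a standard Gaussian loss (assumed distribution)
var_alpha <- qnorm(alpha)

# Expected shortfall (TVaR) at 99%: E[Z | Z > VaR] = dnorm(VaR) / (1 - alpha)
tvar_alpha <- dnorm(var_alpha) / (1 - alpha)

# Level at which the value-at-risk equals that TVaR
equiv_level <- pnorm(tvar_alpha)
equiv_level
```

Under this Gaussian assumption the equivalent level comes out close to 99.6%, in line with the quoted sentence.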

Pricing Game

August 22, 2015 | arthur charpentier

In November, with Romuald Elie and Jérémie Jakubowicz, we will organize a session during the 100% Actuaires day, in Paris, based on a “pricing game”. We provide two datasets (motor insurance, third party claims), with 2 years of experience, and 100,000 policies. Each ‘team’ has to submit premium proposals for 36,000 potential ... [Read more...]
