Articles by arthur charpentier

Computing AIC on a Validation Sample

July 29, 2015 | arthur charpentier

This afternoon, we’ve seen in the training on data science that it was possible to use AIC criteria for model selection. __ library(splines) __ AIC(glm(dist ~ speed, data=train_cars, family=poisson(link="log"))) [1] 438.6314 __ AIC(glm(dist ~ speed, data=train_cars, family=poisson(link="identity"))) [1] 436.3997 __ AIC(glm(dist ~ bs(...
[Read more...]

Choosing a Classifier

July 21, 2015 | arthur charpentier

In order to illustrate the problem of choosing a classification model, consider some simulated data, __ n = 500 __ set.seed(1) __ X = rnorm(n) __ ma = 10-(X+1.5)^2*2 __ mb = -10+(X-1.5)^2*2 __ M = cbind(ma,mb) __ set.seed(1) __ Z = sample(1:2,size=n,replace=TRUE) __ Y = ma*(Z==1)+mb*(Z==2)+rnorm(n)*5 __ df = data.frame(Z=...
[Read more...]

An Update on Boosting with Splines

July 2, 2015 | arthur charpentier

In my previous post, An Attempt to Understand Boosting Algorithm(s), I was puzzled by the boosting convergence when I was using some spline functions (more specifically linear by parts and continuous regression functions). I was using __ library(splines) __ fit=lm(y~bs(x,degree=1,df=3),data=df) The problem ... [Read more...]

An Attempt to Understand Boosting Algorithm(s)

June 26, 2015 | arthur charpentier

Tuesday, at the annual meeting of the French Economic Association, I was having lunch with Alfred, and while we were chatting about modeling issues (econometric models against machine learning prediction), he asked me what boosting was. Since I could not be very specific, we looked at the Wikipedia page. Boosting ... [Read more...]

‘Variable Importance Plot’ and Variable Selection

June 17, 2015 | arthur charpentier

Classification trees are nice. They provide an interesting alternative to a logistic regression.  I started to include them in my courses maybe 7 or 8 years ago. The question is nice (how to get an optimal partition), the algorithmic procedure is nice (the trick of splitting according to one variable, and only ...
[Read more...]

p-hacking, or cheating on a p-value

June 11, 2015 | arthur charpentier

Yesterday evening, I discovered some interesting slides on False-Positives, p-Hacking, Statistical Power, and Evidential Value, via @UCBITSS’s post on Twitter. More precisely, there was this slide on how to cheat (because that’s basically what it is) to get a ‘good’ model (by targeting the p-value). As mentioned by @david_...
[Read more...]

Data Science: from Small to Big Data

May 29, 2015 | arthur charpentier

This Tuesday, I will be in Leuven (in Belgium) at the ACP meeting to give a talk on Data Science: from Small to Big Data. The talk will take place in the Faculty Club from 6 till 8 pm. Slides can be found online (with animated pictures). As usual, comments are welcome.
[Read more...]

Copulas and Financial Time Series

May 12, 2015 | arthur charpentier

I was recently asked to write a survey on copulas for financial time series. The paper is, so far, unfortunately, in French, and is available on https://hal.archives-ouvertes.fr/. There is a description of various models, including some graphs and statistical outputs, obtained from real data. To illustrate, I’... [Read more...]

Another Interactive Map for the Cholera Dataset

March 31, 2015 | arthur charpentier

Following my previous post, François (aka @FrancoisKeck) posted a comment mentioning another package I could use to get an interactive map, the rleafmap package. And here the heatmap was easy to include. This time, we do not use openstreetmap. The first part is still the same, to get the ... [Read more...]

Interactive Maps for John Snow’s Cholera Data

March 28, 2015 | arthur charpentier

This week, in Istanbul, for the second training on data science, we’ve been discussing classification and regression models, but also visualisation. Including maps. And we did have a brief introduction to the leaflet package, devtools::install_github("rstudio/leaflet") require(leaflet) To see what can be done with that ... [Read more...]

Splitting a Node in a Tree

March 23, 2015 | arthur charpentier

If we grow a tree with standard functions in R, on the same dataset used to introduce classification trees in a previous post, __ MYOCARDE=read.table( + "http://freakonometrics.free.fr/saporta.csv", + head=TRUE,sep=";") __ library(rpart) __ cart library(rpart.plot) __ library(rattle) __ prp(cart,type=2,extra=1) The first step ... [Read more...]
