Articles by arthur charpentier

Forecast, Automatic Routines vs. Experience

March 18, 2015 | arthur charpentier

This morning, in our Time Series course, we played with some data I got from google.ca/trends/. Actually, we had been working with an older version, downloaded 18 months ago (discussed in a previous post, in French). urls = "http://freakonometrics.free.fr/report-headphones-2015.csv" report=read.table(urls,... [Read more...]

Growing some Trees

March 18, 2015 | arthur charpentier

Consider here the dataset used in a previous post, about visualising a classification (with more than 2 features): MYOCARDE=read.table("http://freakonometrics.free.fr/saporta.csv", header=TRUE,sep=";") The default classification tree is arbre = rpart(factor(PRONO)~.,data=MYOCARDE) rpart.plot(arbre,type=4,extra=6) We can change the options ... [Read more...]

Some More Results on the Theory of Statistical Learning

March 8, 2015 | arthur charpentier

Yesterday, I mentioned a popular graph discussed when studying theoretical foundations of statistical learning. But there is usually another one, which is the following. Let us get back to the underlying formulas. On the training sample, we have some empirical risk, defined as for some loss function . Why is ... [Read more...]

Some Intuition About the Theory of Statistical Learning

March 7, 2015 | arthur charpentier

While I was working on the Theory of Statistical Learning, and the concept of consistency, I found the following popular graph (e.g. from those slides, here in French). The curve below is the error on the training sample, as a function of the size of the training sample. Above, ... [Read more...]

Visualising a Classification in High Dimension

March 6, 2015 | arthur charpentier

So far, when discussing classification, we’ve been playing with my toy dataset (actually, I should not claim it’s mine, it is inspired by the one used in the introduction of Boosting, by Robert Schapire and Yoav Freund). But in real life, there are more observations, and more explanatory variables.... [Read more...]

Supervised Classification, beyond the logistic

March 5, 2015 | arthur charpentier

In our data-science class, after discussing limitations of the logistic regression, e.g. the fact that the decision boundary is a straight line, we mentioned possible natural extensions. Let us consider our (now) standard dataset clr1 [Read more...]

Supervised Classification, discriminant analysis

March 3, 2015 | arthur charpentier

Another popular technique for classification (or at least, one which used to be popular) is the (linear) discriminant analysis, introduced by Ronald Fisher in 1936. Consider the same dataset as in our previous post: clr1 x y z df plot(x,y,pch=19,cex=2,col=clr1[z+1]) The main interest of that ... [Read more...]

Supervised Classification, Logistic and Multinomial

March 2, 2015 | arthur charpentier

We will start, in our Data Science course, to discuss classification techniques (in the context of supervised models). Consider the following case, with 10 points, and two classes (red and blue): clr1 clr2 x y z df plot(x,y,pch=19,cex=2,col=clr1[z+1]) To get a prediction, i.e. ... [Read more...]

John Snow, and Google Maps

February 27, 2015 | arthur charpentier

In my previous post, I discussed how to use OpenStreetMaps (and standard plotting functions of R) to visualize John Snow’s dataset. But it is also possible to use Google Maps (and ggplot2 types of graphs). library(ggmap) get_london [Read more...]

John Snow, and OpenStreetMap

February 27, 2015 | arthur charpentier

While I was preparing a training session on data visualization, I wanted to get a nice visual for John Snow’s cholera dataset. This dataset can actually be found in a great package of famous historical datasets. library(HistData) data(Snow.deaths) data(Snow.streets) One can easily visualize the ... [Read more...]

Visualizing Clusters

February 24, 2015 | arthur charpentier

Consider the following dataset, with (only) ten points x=c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85) y=c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3) plot(x,y,pch=19,cex=2) We want to get – say – two clusters. Or more specifically, two sets of observations, each of them sharing some similarities. Since the number of observations is rather small, it is actually possible to ... [Read more...]

k-means clustering and Voronoi sets

February 22, 2015 | arthur charpentier

In the context of k-means, we want to partition the space of our observations into k classes. Each observation belongs to the cluster with the nearest mean. Here “nearest” is in the sense of some norm, usually the Euclidean norm. Consider the case where we have 2 classes. The means being respectively ... [Read more...]
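The nearest-mean rule described in this excerpt can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the ten points are borrowed from the "Visualizing Clusters" post (the k-means post itself may use other data), and the names pts, nearest and side are hypothetical. With two means, the Voronoi boundary is the perpendicular bisector of the segment joining them, so both ways of assigning points should agree.

```r
# Ten points borrowed from the "Visualizing Clusters" post (an assumption:
# the k-means post may rely on different data)
x <- c(.4, .55, .65, .9, .1, .35, .5, .15, .2, .85)
y <- c(.85, .95, .8, .87, .5, .55, .5, .2, .1, .3)
pts <- cbind(x, y)

set.seed(1)                                  # k-means starts from random centres
km <- kmeans(pts, centers = 2, nstart = 10)
m1 <- km$centers[1, ]
m2 <- km$centers[2, ]

# Squared Euclidean distance from every point to each of the two means
d1 <- rowSums(sweep(pts, 2, m1)^2)
d2 <- rowSums(sweep(pts, 2, m2)^2)
nearest <- ifelse(d1 <= d2, 1L, 2L)

# Same assignment read off the Voronoi boundary: a point belongs to cell 1
# when its projection on (m2 - m1) falls before the midpoint of the means
mid  <- (m1 + m2) / 2
side <- ifelse(as.vector(sweep(pts, 2, mid) %*% (m2 - m1)) <= 0, 1L, 2L)

# Both rules should reproduce the clusters returned by kmeans()
table(kmeans = km$cluster, nearest = nearest)
```

Plotting the points with col=km$cluster and adding the bisector through mid would give the two Voronoi cells.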

Inequalities and Quantile Regression

February 6, 2015 | arthur charpentier

In the course on inequality measures, we've seen how to compute various (standard) inequality indices, based on some sample of incomes (that can be binned, in various categories). On Thursday, we discussed the fact that incomes can be related to different variables (e.g. experience), and that comparing income inequalities ... [Read more...]

Modeling Incomes and Inequalities

January 17, 2015 | arthur charpentier

Last week, in our Inequality course, we've been looking at data. We started with some simulated data, only a few observations: library("ineq") load(url("http://freakonometrics.free.fr/income_5.RData")) (income=sort(income)) [1] 19233 23707 53297 61667 218662 How could we say that there is inequality in this sample? If we look at ... [Read more...]
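One possible way to quantify the inequality in those five incomes is the Gini index. The sketch below is not necessarily what the post does: it uses base R only (so it does not rely on the ineq package or on downloading the .RData file), types in the five values printed above, and the names gini_sorted, gini_mad and lorenz are hypothetical.

```r
# The five simulated incomes printed in the excerpt above
income <- sort(c(19233, 23707, 53297, 61667, 218662))
n <- length(income)

# Gini index from the sorted sample:
#   G = sum_i (2i - n - 1) x_(i) / (n * sum_i x_i)
gini_sorted <- sum((2 * seq_len(n) - n - 1) * income) / (n * sum(income))

# Equivalent definition: mean absolute difference between all pairs,
# divided by twice the mean income
gini_mad <- mean(abs(outer(income, income, "-"))) / (2 * mean(income))

# Lorenz curve ordinates: share of total income held by the poorest i/n
lorenz <- c(0, cumsum(income) / sum(income))

round(c(gini_sorted = gini_sorted, gini_mad = gini_mad), 3)
```

Both formulas give a Gini index of about 0.46 here, and the Lorenz ordinates show that the richest of the five individuals holds about 58% of the total income.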

Automatic Detection of the Language of a Tweet

January 5, 2015 | arthur charpentier

Two days ago, in my post on automatically extracting my own tweets and generating an HTML list, I mentioned that it would be great if there were a function that could be used to distinguish tweets in English, and tweets in French (usually, I tweet in one of those ... [Read more...]

An automatic code to extract tweets (and to produce the “Somewhere else” review)

January 3, 2015 | arthur charpentier

A few weeks ago, I asked in a post the (simple) question "dear reader, who are you?" just to know more about the readers of my blog. I found that extremely interesting (even if - to be honest - I was expecting more answers to start a more serious sociological ... [Read more...]

Names in the U.S., from James Smith to Jose Rodriguez

December 7, 2014 | arthur charpentier

Two weeks ago, @mona published an interesting post on her blog, about a difficult question, What’s The Most Common Name In America? There were stats about first names, in the U.S., and last names, too. That information is - somehow - easy to get. But usually, it is ... [Read more...]

Subjective Ways of Cutting a Continuous Variables

December 2, 2014 | arthur charpentier

You have probably seen @coulmont's maps. If you haven't, you should probably go and spend some time on his blog (but please, come back afterwards, I still have my story to tell you). Consider for instance the maps we obtained for a post published in Monkey Cage, a few months ago, ... [Read more...]

Confidence vs. Credibility Intervals

November 26, 2014 | arthur charpentier

Tomorrow, for the final lecture of the Mathematical Statistics course, I will try to illustrate - using Monte Carlo simulations - the difference between classical statistics, and the Bayesian approach. The (simple) way I see it is the following, for frequentists, a probability is a measure of the frequency ... [Read more...]
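A Monte Carlo comparison of the two kinds of intervals could look like the sketch below. Everything specific in it is an assumption made for illustration, not taken from the post: a binomial model with n = 50 and a true proportion of 0.3, a Clopper-Pearson confidence interval (as returned by binom.test), and an equal-tailed credible interval under a flat Beta(1, 1) prior.

```r
set.seed(123)
n      <- 50      # sample size (hypothetical)
p_true <- 0.3     # true proportion, fixed and unknown to the statistician
n_sim  <- 2000    # number of simulated samples

covers <- replicate(n_sim, {
  k <- rbinom(1, n, p_true)                 # number of successes in one sample
  # Frequentist: Clopper-Pearson 95% confidence interval
  ci <- binom.test(k, n)$conf.int
  # Bayesian: 95% equal-tailed credible interval; with a Beta(1, 1) prior
  # the posterior is Beta(1 + k, 1 + n - k)
  cr <- qbeta(c(.025, .975), 1 + k, 1 + n - k)
  c(confidence = ci[1] <= p_true && p_true <= ci[2],
    credible   = cr[1] <= p_true && p_true <= cr[2])
})

# Share of simulated samples whose interval contains the true proportion
coverage <- rowMeans(covers)
coverage
```

The frequentist guarantee is about the procedure: over repeated samples, at least 95% of the confidence intervals should contain p_true. The credible interval answers a different question (where the parameter lies, given one sample and the prior), so its frequency of coverage is not guaranteed, even if it is often close to 95% with a flat prior.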

Reinterpreting Lee-Carter Mortality Model

November 18, 2014 | arthur charpentier

Last week, while I was giving my crash course on R for insurance, we discussed possible extensions of the Lee & Carter (1992) model. If we look at the seminal paper, the model is defined as follows Hence, it means that This would be a (non)linear model on the logarithm ... [Read more...]

R-bloggers

R news and tutorials contributed by hundreds of R bloggers
