Articles by Allan Engelhardt

Feature selection: Using the caret package

November 16, 2010 | Allan Engelhardt

Feature selection is an important step for practical commercial data mining which is often characterised by data sets with far too many variables for model building. In a previous post we looked at all-relevant feature selection using the Boruta package while in this post we consider the same (artificial, toy) ... [Read more...]

Big data for R

August 5, 2010 | Allan Engelhardt

Revolution Analytics recently announced their "big data" solution for R. This is great news and a lovely piece of work by the team at Revolution. However, if you want to replicate their analysis in standard R, then you can absolutely do so and we show you how. [Read more...]

Area Plots with Intensity Coloring

July 13, 2010 | Allan Engelhardt

I am not sure apeescape’s ggplot2 area plot with intensity colouring is really the best way of presenting the information, but it had me intrigued enough to replicate it using base R graphics. The key technique is to draw a gradient line which R does not support natively so ... [Read more...]

Faster R through better BLAS

June 15, 2010 | Allan Engelhardt

Can we make our analysis using the R statistical computing and analysis platform run faster? Usually the answer is yes, and the best way is to improve your algorithm and variable selection. But recently David Smith was suggesting that a big benefit of their (commercial) version of R was that ... [Read more...]

Beautiful Data

July 27, 2009 | Allan Engelhardt

O'Reilly's recent publication Beautiful Data has a chapter by Jeff Jonas which is enough reason in itself for me to recommend it. The chapter, Data Finds Data, is also available as a PDF download. [Read more...]

Massively parallel database for analytics

July 22, 2009 | Allan Engelhardt

This is by far the best description of why traditional parallel databases (like Teradata, Greenplum et al.) are an evolutionary dead end. But much more than a theoretical discussion, they have built a solution which they call HadoopDB. It is based on Hadoop, PostgreSQL, and Hive and is completely Open ... [Read more...]

The Knapsack Problem

July 10, 2009 | Allan Engelhardt

David posts a question about how to solve this knapsack problem using the R statistical computing and analysis platform. My reply in the comments seems to have disappeared for a while so here is my proposed solution: [Read more...]

OECD Statistics

July 2, 2009 | Allan Engelhardt

I am a sucker for good quality data. I wrote about data.gov, the US Government data site, before, and now I find OECD Statistics, which has some 300 data sets, many of which seem to be readily accessible (though some may require subscription). [Read more...]

R tips: Installing Rmpi on Fedora Linux

June 12, 2009 | Allan Engelhardt

Somebody on the R-help mailing list asked how to get Rmpi working on his Fedora Linux machine so he could do high-performance computing on a cluster of machines (or a single multicore machine) using the R statistical computing and analysis platform. Since it is unusually painful to get working, I ... [Read more...]

Data Mashups in R from O’Reilly

June 9, 2009 | Allan Engelhardt

O’Reilly has published Data Mashups in R as a $4.99 PDF download in their Short Cut series. In 27 pages it takes you through an example of how to combine foreclosure information with maps and geographical information to produce plots like the one here. This is all done with the R ... [Read more...]
