Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
“…[W]e are suspicious of rapid cognition. We live in a world that assumes that the quality of a decision is directly related to the time and effort that went into making it” Malcom Gladwell, Blink
In his book Blink, Malcolm Gladwell summarises a common misconception about good decision making. According to folk wisdom, the more time, information, and effort you put into a decision, the better it gets. In other words, “More is better.” If you are a doctor making a diagnosis, more medical tests are always better. If you are trying to decide if your new Tinder match is worthy of a date, try to find them on Facebook, Instagram and Snapchat first. If you deciding how to invest in the stock market, get as much data as you can and build a statistical model so complex that it describes the past perfectly.
However, decades of research in cognitive science and machine learning have shown that the “More is better” theory is, in many real-world decisions, flat wrong. In contrast, there are many cases where, as Dr. Gerd Gigerenzer has put it, “Less is more.” Why? For two key reasons. The first reason is statistical: Complex decision models with many parameters can lead to overfitting. In essence, overfitting occurs when a model is very good at describing one specific past dataset, but fails in predicting new, unseen data (Gigerenzer & Brighton, 2009). For example, a complex economic model might describe changes in past stock prices very well, but is largely unable to predict future changes. This is why, as Burton Malkiel shows, it is so hard for complex trading algorithms to outperform simple index funds in the stock market (Malkiel, 1999). The second reason is psychological: Even if a complex decision model is good at predicting new data, if a person can’t understand it, or easily implement it in a real-world decision, like a doctor trying to use logistic regression in an emergency room, they won’t use it.
What simple decision rules can people use to make good decisions? One popular class of simple decision rules are Fast and Frugal Trees (FFTs, Gigerenzer & Todd, 1999). Fast and frugal trees make very fast decisions based on a few (usually 1 to 5) pieces of information and ignore all other information. In other words, Fast and frugal trees are noncompensatory, meaning that once they make a decision based on a few pieces of information, no additional information can ever change the decision. Because they are so simple to use, they have been used in many real-world decision tasks from making coronary artery disease diagnoses (Green & Mehr, 1997), to detecting depression (Jenny, Pachur, Williams, Becker & Margraf, 2013). However, lest you think that fast and frugal trees are only useful when time is limited, research has shown that fast and frugal trees can out-predict more complex models in decidedly non-human simulations (Gigerenzer, Czerlinski & Martignon, 1999).
While fast and frugal trees have shown promise, there are currently no off-the-shelf methods to create them. How can you create your own fast and frugal decision trees for your own dataset? Starting today, you can use the FFTrees R package available on CRAN. The main function in the package is fft(), which takes a standard formula and data argument, and returns a fast and frugal tree (fft) object. From this object, you can view its underlying trees, along with many standard classification statistics (e.g.; hit-rate, false alarm rate, AUC) applied to both training and test (i.e.; prediction) datasets. Finally, the function has two alternative classification algorithms, logistic regression and CART, built in, so you can always compare the accuracy of your fast and frugal trees to two gold-standards in the classification literature. If you’re like me, you’ll be amazed at how well simple, transparent fast and frugal trees perform relative to these gold-standards, especially in predicting new data!
The FFTrees package in action
You can install and load the FFTrees package from CRAN:
install.packages("FFTrees") library("FFTrees")
Once you’ve installed the package, you can view the overview vignette by running the code FFTrees.guide(). However, for this blog post I’ll show you how to create fast and frugal trees for predicting breast cancer. The data we’ll use comes from the Wisconsin Breast Cancer Database (data source). The data is stored as a dataframe with 699 rows, representing 699 patients, and 10 columns. The 10 columns represent 9 physiological measurements, from cell sizes to cell shapes, and 1 binary variable (diagnosis) indicating whether the woman truly does, or does not have breast cancer. Here is how the first few rows of the dataframe look:
thickness | cellsize.unif | cellshape.unif | adhesion | epithelial | nuclei.bare | chromatin | nucleoli | mitoses | diagnosis |
---|---|---|---|---|---|---|---|---|---|
5 | 1 | 1 | 1 | 2 | 1 | 3 | 1 | 1 | FALSE |
5 | 4 | 4 | 5 | 7 | 10 | 3 | 2 | 1 | FALSE |
3 | 1 | 1 | 1 | 2 | 2 | 3 | 1 | 1 | FALSE |
6 | 8 | 8 | 1 | 3 | 4 | 3 | 7 | 1 | FALSE |
4 | 1 | 1 | 3 | 2 | 1 | 3 | 1 | 1 | FALSE |
8 | 10 | 10 | 8 | 7 | 10 | 9 | 7 | 1 | TRUE |
To create a fast and frugal tree from the dataset, we’ll use the fft() function, entering formula = diagnosis ~., meaning that we want to predict diagnosis as a function of (potentially), all other variables, and data = breastcancer. We’ll assign the result to a new object of class fft called breastcancer.fft
breastcancer.fft <- fft(formula = diagnosis ~., data = breastcancer)
Now that we’ve created the object, we can print it to the console to get basic information
breastcancer.fft # "An fft object containing 6 trees using 4 cues {cellsize.unif,cellshape.unif,nuclei.bare,epithelial} out of an original 9" # "Data were trained on 683 exemplars. There were no test data" # "FFT AUC: (Train = 0.98, Test = NA)" # "My favorite tree is #3 [Training: HR = 0.93, FAR = 0.05], [Testing: HR = NA, FAR = NA]"
The printout tells us that the final fft object contains 6 different trees, and the largest tree only uses 4 of the original 9 cues. To see the best tree, we can simply plot the fft object:
plot(breastcancer.fft, main = "Breastcancer FFT", decision.names = c("Healthy", "Cancer"))
There’s one of our fast and frugal trees! In the top section of the plot, we see that the data had 444 true healthy cases, and 239 true cancer cases. In the middle section, we see the actual tree. The tree then starts by looking at the cue cellsize.u. If the value is less than 3, the tree decides that the person is healthy. If the value is not less than 3, then the tree looks at cellshape. If the cellshape. <= 2, the tree decides the patient is healthy. If cellshape. > 2, the tree decides that the person does have cancer. That’s the whole decision algorithm! Now isn’t that a lot easier to interpret than something like logistic regression? Again, imagine giving this logistic regression equation to anyone without a degree in statistics.
Performance
Ok, so our fast and frugal tree easy to understand and use, but how well does it perform? The bottom section of the above plot shows a series of performance statistics. On the bottom left hand corner, we can see a classification table, which shows how the tree’s decisions compare to the truth. Entries on the main diagonal (Cor Rej and Hit) correspond to correct decisions, while the other entries correspond to incorrect decisions. As you can see, the tree performed exceptionally well: it made correct diagnoses in 646 (424 + 222) out of all 683 cases (95% correct). Additional performance statistics, including specificity (1 – false alarm rate), hit rate, d-prime, AUC (area under the curve) are also displayed. Finally, in the bottom right plot, you can see an ROC curve which compares the performance of the trees to CART (in red) and logistic regression (in blue).
Viewing other trees
While this tree did well, it still made some errors in both detecting true cancer cases (i.e.; hit rate) and in correctly rejecting true healthy cases (i.e.; specificity). Now, what if you want a tree that rarely misses true cases, at the cost of additional false alarms? As Luan, Schooler, & Gigerenzer (2011) have shown, you can easily shift the balance of errors in a fast and frugal tree by adjusting the decisions it makes at each level of a tree. The fft function automatically builds several versions of the same general tree that make different error trade-offs. We can see the performance of each of these trees in the bottom-right ROC curve. Looking at the ROC curve, we can see that tree number 5 has a very high hit-rate, but a smaller specificity. We can look at this tree by adding the tree = 5 argument to plot():
plot(breastcancer.fft, main = "Breastcancer FFT", decision.names = c("Healthy", "Cancer"), tree = 5)
Here is the resulting tree. As you can see, this tree uses an extra cue called nuclei.bar. This tree has a perfect hit rate of 100% (just as we wanted), but at a cost of a lower specificity of 80%.
Additional arguments
– Cross-validation: The fft() function allows you to easily create trees from a training dataset, and test the performance of the trees with a test dataset (aka, cross-validation). You can do this by either entering an explicit test dataset in the data.test argument, or by randomly splitting the main dataset into a separate training and testing sample with train.p. For example, train.p = .5 will randomly split the data into a 50% training set, which will be used to build the trees, and a 50% test set, which will be used to evaluate their prediction performance.
# Create a 50% training and 50% testing dataset with train.p = .5 breastcancer.test.fft <- fft(formula = diagnosis ~ , data = breastcancer, train.p = .5)
– Restricting trees: If you want to explicitly decide which cues you want in the tree, you can specify this in the formula argument. For example, the following code will generate a tree from the breast cancer data, but only using cues thickness, mitosis, and adhesion.
# Only use 3 cues in the trees breastcancer.r.fft <- fft(formula = diagnosis ~ thickness + mitoses + adhesion, data = breastcancer)
Summary
The FFTrees package contains lots of other functions for visualising and comparing trees. To see all the details, be sure to check out the package vignettes either in the package or on CRAN (here). For all you judgment and decision making researchers out there, I will also be presenting the package at the annual meeting of the Society for Judgment and Decision Making (SJDM) in Boston in November 2016
The package is also very much in development, so I am grateful for any recommendations, bug-reports, or criticisms. You can post bug-reports at www.github.com/ndphillips/FFTrees/Issues, or email me directly at Nathaniel.D.Phillips.is@gmail.com
Coauthors
This package was developed in collaboration with Dr. Hansjoerg Neth and Dr. Wolfgang Gaissmaier at the University of Konstanz, and Dr. Jan Woike at the Max Planck Institute for Human Development in Berlin.
References
Gigerenzer, G., & Brighton, H. (2009). Homo heuristicus: Why biased minds make better inferences. Topics in Cognitive Science, 1(1), 107–143.
Gigerenzer, G., & Todd, P. M. (1999). Fast and frugal heuristics: The adaptive toolbox. In Simple heuristics that make us smart (pp. 3–34). Oxford University Press.
Gigerenzer, G., Czerlinski, J., & Martignon, L. (1999). How good are fast and frugal heuristics? In Decision science and technology (pp. 81–103). Springer.
Gladwell, M. (2007). Blink: The power of thinking without thinking. Back Bay Books.
Green, L., & Mehr, D. R. (1997). What alters physicians’ decisions to admit to the coronary care unit? Journal of Family Practice, 45(3), 219–226.
Jenny, M. A., Pachur, T., Williams, S. L., Becker, E., & Margraf, J. (2013). Simple rules for detecting depression. Journal of Applied Research in Memory and Cognition, 2(3), 149–157.
Malkiel, B. G. (1999). A random walk down Wall Street: including a life-cycle guide to personal investing. WW Norton & Company.
Marewski, J. N., & Gigerenzer, G. (2012). Heuristic decision making in medicine. Dialogues Clin Neurosci, 14(1), 77–89.
Luan, S., Schooler, L. J., & Gigerenzer, G. (2011). A signal-detection analysis of fast-and-frugal trees. Psychological Review, 118(2), 316.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.