An R Users Guide to JSM 2019
If you are like me, and rather last minute about making a plan to get the most out of a large conference, you are just starting to think about JSM 2019 which will begin in just a few days. My plans always begin with an attempt to sleuth out the R-related sessions. While in the past it took quite a bit of work to identify talks that were likely backed by R-based calculations, this is clearly no longer the case. In fact, because Stanford Professor Trevor Hastie will be delivering the prestigious Wald Lectures this year, R-backed work will be front and center.
Professor Hastie has made numerous important contributions to statistical learning, machine learning, data science and statistical computing. Among the latter is the glmnet package he co-authored with Jerome Friedman, Rob Tibshirani, Noah Simon, Balasubramanian Narasimhan and Junyang Qian, which has become a fundamental resource.
The Wald Lectures will be delivered over three days in room CC Four Seasons 1 according to the following schedule:
* Lecture 1: Mon, 7/29/2019, 10:30 AM – 12:20 PM
* Lecture 2: Tue, 7/30/2019, 2:00 PM – 3:50 PM
* Lecture 3: Wed, 7/31/2019, 10:30 AM – 12:20 PM
If you want to do some preparation for the lectures, you might have a look at the book Statistical Learning with Sparsity: The Lasso and Generalizations by Hastie, Tibshirani and Wainwright.
The rest of this post lists some R-related talks that can help you fill your days at JSM! I am sure my list is not complete. Please feel free to add anything I may have missed to the comments section following this post.
Sunday, July 28, 2019
Findings from Analysis and Visualization of the New York City Housing and Vacancy Survey Data – CC 501 – 3:20 PM – Nels Grevstad, Metropolitan State University of Denver; Rachel Rosebrook, Metropolitan State University of Denver; Lance Barto, Metropolitan State University of Denver; Gil Leibovich, Metropolitan State University of Denver; Elizabeth Foster, Metropolitan State University of Denver; ThienNgo Le, Metropolitan State University of Denver; Kelsey Smith, Metropolitan State University of Denver; Nathanael Whitney, Metropolitan State University of Denver; Zoe Girkin, Metropolitan State University of Denver; Ahern Nelson, Metropolitan State University of Denver; Karan Bhargava, Metropolitan State University of Denver; Alex Whalen-Wagner, Metropolitan State University of Denver; Gemma Hoeppner, Metropolitan State University of Denver; Larry Breeden, Metropolitan State University of Denver; Ayako Zrust, Metropolitan State University of Denver; Travis Rebhan, Metropolitan State University of Denver; Anayeli Ochoa, Metropolitan State University of Denver
Bayesian Uncertainty Estimation Under Complex Sampling – Speed: CC 502 – 3:00 PM – Matthew Williams, National Science Foundation; Terrance Savitsky, Bureau of Labor Statistics
Measuring Gentrification Over Time with the NYCHVS – Poster: CC Hall C – 4:00 PM to 4:45 PM – Robert Montgomery, NORC; Quentin Brummet, NORC; Nola du Toit, NORC at the University of Chicago; Peter Herman, NORC at the University of Chicago; Edward Mulrow, NORC at the University of Chicago
A SHINY Markov Machine for Decision-Making in Major League Baseball – Part 1: CC 105 – 2:45 PM and Part 2: CC Hall C – 4:00 PM to 4:45 PM – Jason Osborne, North Carolina State University
Measuring Gentrification Over Time with the NYCHVS – CC 501 – 2:55 PM – Robert Montgomery, NORC; Quentin Brummet, NORC; Nola du Toit, NORC at the University of Chicago; Peter Herman, NORC at the University of Chicago; Edward Mulrow, NORC at the University of Chicago
A New Tidy Data Structure to Support Exploration and Modeling of Temporal Data – CC 301 – 3:25 PM – Earo Wang, Monash University; Dianne Cook, Monash University; Rob J Hyndman, Monash University
TensorFlow Versus H2O, Predicting the S&P 500 – CC 504 – 4:50 PM – Kenneth Davis
Model-Based Clustering Using Adjacent-Categories Logit Models via Finite Mixture Model – CC 504 – 5:05 PM – Lingyu Li, Victoria University of Wellington; Ivy Liu, Victoria University of Wellington; Richard Arnold, Victoria University of Wellington
The Estimable Luke Tierney – and Estimability in R – CC 501 – 5:20 PM – Russell V. Lenth, University of Iowa
Monday, July 29, 2019
Training Students Concurrently in Data Science and Team Science: Results and Lessons Learned from Multi-Institutional Interdisciplinary Student-Led Research Teams 2012-2018 – Poster: CC Hall C – 2:00 PM to 3:50 PM – Brent Ladd, Purdue University; Mark Ward, Purdue University
A Natural Language Processing Algorithm for Medication Extraction from Electronic Health Records Using the R Programming Language: MedExtractR – Poster: CC Hall C – 2:00 PM to 3:50 PM – Hannah L Weeks, Vanderbilt University; Cole Beck, Vanderbilt University Medical Center; Elizabeth McNeer, Vanderbilt University; Joshua C Denny, Vanderbilt University; Cosmin A Bejan, Vanderbilt University; Leena Choi, Vanderbilt University Medical Center
Conditional Probability and SQL for Data Science – Poster: CC Hall C – 10:30 AM to 12:20 PM – Eric Suess, CSU East Bay
R Markdown: a Software Ecosystem for Reproducible Publications – CC 107 – 11:55 AM – Yihui Xie, RStudio, Inc.
Infusing Bayesian Strategies for Pharmaceutical Manufacturing and Development – CC 109 – 12:05 PM – Bill Pikounis, Johnson & Johnson; Dwaine Banton, Janssen R&D; John Oleynick, Johnson & Johnson; Jyh-Ming Shoung, Janssen R&D
Tuesday, July 30, 2019
Controlling the False Discovery Proportion: a Simulation Study – Poster: CC Hall C – 10:30 AM to 12:20 PM – Harlan McCaffery, University of Michigan; Chi Chang, Michigan State University
Give Your Statistician Colleague Iris Bulbs for Their House Warming! – CC 605 – 11:05 AM – Dianne Cook, Monash University
From Prediction Models to Shiny App: Creating a Tool for Contaminated Food Source Prediction in Salmonella and STEC Outbreaks – CC Hall C – 11:35 AM to 12:20 PM – Caroline Ledbetter, University of Colorado; Alice White, Colorado School of Public Health; Elaine Scallan Walter, Colorado School of Public Health; David Weitzenkamp, Colorado School of Public Health
Stats for Data Science – H-Centennial Ballroom G-H – Round Table: 12:30 PM to 1:50 PM – Daniel Kaplan, Macalester College
Experiences with Incorporating R into a Second-Level Biostatistics Course for MPH Students – CC Hall C – 2:00 PM to 2:45 PM – Christine Mauro, Columbia University; Nicholas Williams, Columbia University; Anjile An, Columbia University
From Prediction Models to Shiny App: Creating a Tool for Contaminated Food Source Prediction in Salmonella and STEC Outbreaks – CC 501 – 8:40 AM – Caroline Ledbetter, University of Colorado; Alice White, Colorado School of Public Health; Elaine Scallan Walter, Colorado School of Public Health; David Weitzenkamp, Colorado School of Public Health
Tools for Evaluating Quality of State and Local Administrative Data – CC 708 – 9:15 AM – Zachary H Seeskin, NORC at the University of Chicago; Gabriel Ugarte, NORC at the University of Chicago; Rupa Datta, NORC at the University of Chicago
Wednesday, July 31, 2019
Ggvoronoi: Voronoi Tessellations in R – CC 105 – 11:20 AM – Thomas J Fisher, Miami University; Robert C Garrett, Miami University; Karsten Maurer, Miami University
Using R to Conduct Retrospective Analyses of EHR and Imaging Data: a Case Study in MS – Poster: CC Hall C – 10:30 AM – 12:20 PM – Melissa Martin, University of Pennsylvania; Russell Shinohara, University of Pennsylvania
Generalized Causal Mediation and Path Analysis and Its R Package gmediation – Talk: CC 501 – 8:45 AM and Poster: CC Hall C – 11:35 AM – 12:20 PM – Jang Ik Cho, Eli Lilly and Company; Jeffrey M Albert, Case Western Reserve University
Tidi_MIBI: a Tidy Pipeline for Microbiome Analysis and Visualization in R – Speed Talk: CC 501 – 10:15 AM and Poster: CC Hall C – 11:35 AM – 12:20 PM – Charlie Carpenter, University of Colorado-Biostatistics
Incorporating Spatial Statistics into Routine Analysis of Agricultural Field Trials – CC Hall C – 11:35 AM – 12:20 PM – Julia Piaskowski, University of Idaho; Chad Jackson, University of Idaho; Juliet Marshall, University of Idaho; William J Price, University of Idaho
Incorporating Spatial Statistics into Routine Analysis of Agricultural Field Trials – CC 501 – 10:05 AM – Julia Piaskowski, University of Idaho; Chad Jackson, University of Idaho; Juliet Marshall, University of Idaho; William J Price, University of Idaho
DemoR: Tools for Teaching and Presenting R Code – CC 302 – 10:35 AM – Kelly Bodwin, California Polytechnic State University; Hunter Glanz, California Polytechnic State University
Ghclass: An R Package for Managing Classes with GitHub – CC 302 – 10:50 AM – Colin Rundel, Duke University
Using and Building Shiny Apps for Teaching Introductory Biostatistics – CC 504 – 11:05 AM – Adam Ciarleglio, The George Washington University
Using GitHub and RStudio to Facilitate Authentic Learning Experiences in a Regression Analysis Course – CC 302 – 11:05 AM – Maria Tackett, Duke University
A Generalized Additive Cox Model with L1-Penalty for Heart Failure Time-To-Event Outcomes and Comparison to Other Machine Learning Approaches – CC 712 – 3:20 PM – Matthias Kormaksson
Thursday, August 1, 2019
A Journey Teaching Applied Statistics for Health Sciences in an Asynchronous Team Based Learning Format Using Data Science Ideas – CC 110 – 8:50 AM – Ben Barnard
Noncentral Algorithm Assessments – CC 104 – 9:20 AM – Jerry Lewis, Biogen Idec
Supplementary Code
In case you are wondering how I produced the plot above, here is the code, which uses the cranly and dlstats packages to investigate CRAN.
library(tidyverse)
library(cranly)
library(dlstats)

# Get clean copy of CRAN
p_db <- tools::CRAN_package_db()
package_db <- clean_CRAN_db(p_db)

# Build package network
package_network <- build_network(package_db)

# Find Hastie packages
pkgs <- package_by(package_network, "Trevor Hastie")

# Find most downloaded Hastie packages
dstats <- cran_stats(pkgs)
topdown <- group_by(dstats, package) %>%
  summarize(n = sum(downloads)) %>%
  arrange(desc(n)) %>%
  filter(n > 100000)

# Plot the monthly downloads for Hastie's top 5 packages
shortlist <- select(topdown, package) %>% slice(1:5)
toppkgs <- cran_stats(as.vector(shortlist$package))
ggplot(toppkgs, aes(end, downloads, group = package, color = package)) +
  geom_line() +
  geom_point(aes(shape = package)) +
  xlab("Monthly Downloads") +
  ggtitle("Trevor Hastie Packages")
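If you would like to try the ranking step without hitting CRAN, here is a minimal sketch that runs the same group, sum, sort and filter logic on a small made-up data frame. The package names (pkgA, pkgB, pkgC), the download counts and the toy_ object names are all hypothetical, and the column layout (end, downloads, package) is my assumption about what cran_stats() returns.

```r
library(dplyr)

# Hypothetical monthly download counts, shaped like the data frame that
# dlstats::cran_stats() is assumed to return (columns: end, downloads, package).
# The package names and numbers are invented for illustration only.
toy_stats <- data.frame(
  end = rep(as.Date(c("2019-05-31", "2019-06-30")), times = 3),
  downloads = c(60000, 70000, 30000, 20000, 90000, 80000),
  package = rep(c("pkgA", "pkgB", "pkgC"), each = 2),
  stringsAsFactors = FALSE
)

# Total downloads per package, largest first, keeping totals above 100,000
toy_top <- toy_stats %>%
  group_by(package) %>%
  summarize(n = sum(downloads)) %>%
  arrange(desc(n)) %>%
  filter(n > 100000)

# Keep the names of (at most) the top two packages for plotting
toy_shortlist <- toy_top %>% slice(1:2) %>% pull(package)
print(toy_shortlist)
```

Only the packages whose summed downloads clear the threshold survive the filter, so the shortlist can be shorter than the number of rows requested in slice().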