When I was an undergrad, a professor suggested I learn this statistical programming language called R.
I took one look at the interface, panicked, and left.
A lot has changed in the R world since then, not the least of which was the release of the RStudio integrated development environment. While the universe of R packages continues to grow, and the work can now be done from the comfort of RStudio, the fact remains: learning R means learning to code R.
Many of my students have never coded before, although this is a half-truth: they’ve probably used Excel, which requires a fair number of functions and cell references. What Excel doesn’t require, though, is naming and manipulating variables.
R is an ideal choice for first-time data coders: the familiar tabular data frame is a core structure, and operations are designed with data analysis in mind. After all, R is a statistical programming language. (In my opinion, this makes it preferable to Python, which was designed as a general-purpose scripting language, at least as far as learning to code as a data analyst goes.)
I assume no prior coding experience for this workshop. My goals are to equip students to work comfortably in the RStudio environment, ingest and explore data, and make simple graphical representations of data. In particular, students will perform the most common tabular data cleaning and exploration tasks using the dplyr package.
Above all these objectives, however, is my goal of helping students not panic over learning R, as I did when I started.
Lesson 1: Welcome to the R Project
Objective: Student can install and load an R package
Description:
- What is R and when would I use it?
- R plus RStudio
- Installing and loading packages
Exercise: Install a CRAN task view
Assets needed: None
Time: 35 minutes
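To give a flavor of this lesson, here is a minimal sketch of installing and loading a package. The install lines are commented out because they need an internet connection, and the helper name is_installed() is my own invention for illustration, not part of R. The task view line assumes the ctv package from CRAN, and "Econometrics" is just one example of a task view name.

```r
# Install a package from CRAN (once per machine; needs internet access).
# install.packages("dplyr")

# Load the package (once per R session).
# library(dplyr)

# Install every package in a CRAN task view, assuming the ctv package.
# install.packages("ctv")
# ctv::install.views("Econometrics")

# Hypothetical helper: report whether a package is already installed.
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" ships with every R installation, so this should be TRUE.
is_installed("stats")
```

The distinction to remember: install.packages() downloads a package to your machine, while library() makes it available in the current session.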
Lesson 2: Introduction to RStudio
Objective: Student can navigate the RStudio integrated development environment
Description:
- Basic arithmetic and comparison operations
- Saving, closing and loading scripts
- Opening help documentation
- Plotting graphs
- Assigning objects
Exercises: Practice assigning and removing objects
Assets needed: None
Time: 40 minutes
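The kind of code students type into the console in this lesson might look like the sketch below. The object names are made up for illustration.

```r
# Arithmetic follows the usual order of operations.
2 + 3 * 4

# Comparisons return TRUE or FALSE.
7 > 5
10 == 10.0

# Assign a value to a named object with <-.
lesson_count <- 9
capstone_minutes <- 40

# Objects can be used in later calculations.
capstone_hours <- capstone_minutes / 60

# List the objects in the workspace, then remove one.
ls()
rm(capstone_hours)

# Open the help page for a function (works in an interactive session).
# ?mean
```

After rm(), the removed object is gone from the Environment pane in RStudio, while the others remain.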
Lesson 3: Working with vectors
Objective: Student can create, inspect and modify vectors
Description:
- Creating vectors
- Vector operations
- Indexing elements of a vector
Exercises: Drills
Assets needed: None
Time: 35 minutes
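As a small illustration of the vector drills, the sketch below stores the lesson times from this outline in a vector and works with it. The object names are my own choices for the example.

```r
# Create a numeric vector: minutes allotted to each of the nine lessons.
lesson_minutes <- c(35, 40, 35, 70, 40, 50, 50, 70, 40)

# Inspect the vector.
length(lesson_minutes)
sum(lesson_minutes)

# Operations apply to every element at once.
lesson_hours <- lesson_minutes / 60

# Index by position (R counts from 1).
lesson_minutes[1]
lesson_minutes[c(4, 8)]

# Index by condition: lessons longer than 45 minutes.
long_lessons <- lesson_minutes[lesson_minutes > 45]

# Modify one element of a copy, leaving the original untouched.
adjusted_minutes <- lesson_minutes
adjusted_minutes[9] <- 45
```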
Lesson 4: Working with data frames
Objective: Student can create, inspect and modify data frames
Description:
- Creating a data frame
- Data frame operations
- Indexing data frames
- Column calculations
- Filtering and subsetting a data frame
- Conducting exploratory data analysis on a data frame
Exercises: Drills
Assets needed: Iris dataset
Time: 70 minutes
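A rough sketch of the data frame operations in this lesson, using the iris dataset that ships with R. The object names (flowers, setosa, and so on) and the derived column are made up for the example.

```r
# Create a small data frame by hand.
lessons <- data.frame(
  lesson  = c(1, 2, 3),
  minutes = c(35, 40, 35)
)

# Work on a copy of the built-in iris dataset.
flowers <- iris

# Inspect the data frame.
head(flowers)
str(flowers)
dim(flowers)

# Index a column by name, and rows and columns by position or name.
flowers$Sepal.Length[1:5]
flowers[1:3, c("Sepal.Length", "Species")]

# Column calculation: add a new column derived from two others.
flowers$Sepal.Ratio <- flowers$Sepal.Length / flowers$Sepal.Width

# Filter rows by condition.
setosa <- flowers[flowers$Species == "setosa", ]

# Quick exploratory summaries.
summary(flowers$Sepal.Length)
table(flowers$Species)
```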
Lesson 5: Reading, writing and exploring data frames
Objective: Student can read, write and analyze external tabular files
Description:
- Reading and writing csv and txt files
- Reading and writing Excel files
- Exploring a dataset
- Descriptive statistics
Exercises: Drills
Assets needed: Iris dataset
Time: 40 minutes
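Here is a minimal sketch of a read/write round trip with base R, writing to temporary files so nothing is left behind. The Excel lines are commented out because they assume the readxl and writexl packages, which are one common choice and not necessarily what the workshop uses; the file name in them is hypothetical.

```r
# Write the iris dataset to a temporary csv file, then read it back.
csv_path <- tempfile(fileext = ".csv")
write.csv(iris, csv_path, row.names = FALSE)
iris_from_csv <- read.csv(csv_path, stringsAsFactors = TRUE)

# The same round trip with a tab-delimited txt file.
txt_path <- tempfile(fileext = ".txt")
write.table(iris, txt_path, sep = "\t", row.names = FALSE)
iris_from_txt <- read.delim(txt_path)

# Excel files, assuming the readxl and writexl packages are installed.
# writexl::write_xlsx(iris, "iris.xlsx")
# iris_from_excel <- readxl::read_excel("iris.xlsx")

# Explore the dataset and compute descriptive statistics.
summary(iris_from_csv)
mean_sepal_length <- mean(iris_from_csv$Sepal.Length)
sd_sepal_length <- sd(iris_from_csv$Sepal.Length)
species_counts <- table(iris_from_csv$Species)
```

Setting row.names = FALSE keeps R from writing an extra column of row numbers into the file.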
Lesson 6: Data manipulation with dplyr
Objective: Student can perform common data manipulation tasks with dplyr
Description:
- Manipulating rows
- Manipulating columns
- Summarizing data
Exercises: Drills
Assets needed: Airport flight records
Time: 50 minutes
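To show the shape of these tasks, here is a short sketch using dplyr's core verbs. The flight records below are invented for illustration and are not the workshop's actual dataset.

```r
library(dplyr)

# Hypothetical flight records, made up for illustration only.
flights <- data.frame(
  carrier   = c("AA", "AA", "DL", "DL", "UA", "UA"),
  origin    = c("JFK", "LGA", "JFK", "LGA", "EWR", "EWR"),
  dep_delay = c(5, 20, -3, 45, 0, 12),
  distance  = c(1089, 733, 760, 1096, 719, 2565)
)

# Manipulating rows: keep some rows, reorder others.
delayed <- filter(flights, dep_delay > 10)
sorted  <- arrange(flights, desc(dep_delay))

# Manipulating columns: keep some columns, add a new one.
slim      <- select(flights, carrier, dep_delay)
with_flag <- mutate(flights, late = dep_delay > 15)

# Summarizing data: one row per carrier.
by_carrier <- summarise(
  group_by(flights, carrier),
  mean_delay = mean(dep_delay),
  n_flights  = n()
)
```

Each verb takes a data frame as its first argument and returns a new data frame, which is what makes them easy to chain together later.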
Lesson 7: Data manipulation with dplyr, continued
Objective: Student can perform more advanced data manipulation with dplyr
Description:
- Building a data pipeline
- Joining two datasets
- Reshaping a dataset
Exercises: Drills
Assets needed: Airport flight records
Time: 50 minutes
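A sketch of what a pipeline with a join could look like is below. Both tables are invented for illustration. For reshaping I am assuming the tidyr package and its pivot_longer() function, since dplyr itself does not reshape data; the workshop may well take a different route.

```r
library(dplyr)
library(tidyr)  # assumption: used here only for the reshaping step

# Hypothetical flight records and a lookup table, made up for illustration.
flights <- data.frame(
  carrier   = c("AA", "AA", "DL", "DL", "UA", "UA"),
  dep_delay = c(5, 20, -3, 45, 0, 12),
  arr_delay = c(2, 25, -10, 40, 4, 8)
)
airlines <- data.frame(
  carrier = c("AA", "DL", "UA"),
  name    = c("American Airlines", "Delta Air Lines", "United Airlines")
)

# Build a pipeline: each step passes its result to the next with %>%.
carrier_delays <- flights %>%
  filter(dep_delay >= 0) %>%
  group_by(carrier) %>%
  summarise(mean_dep_delay = mean(dep_delay)) %>%
  left_join(airlines, by = "carrier") %>%
  arrange(desc(mean_dep_delay))

# Reshape from wide (one column per delay type) to long (one row per value).
delays_long <- flights %>%
  pivot_longer(
    cols      = c(dep_delay, arr_delay),
    names_to  = "delay_type",
    values_to = "minutes"
  )
```

Reading the pipeline top to bottom mirrors how you would describe the analysis out loud: filter, then group, then summarize, then join, then sort.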
Lesson 8: R for data visualization
Objective: Student can create graphs in R using visualization best practices
Description:
- Graphics in base R
- Visualizing a variable’s distribution
- Visualizing values across categories
- Visualizing trends over time
- Graphics in ggplot2
Exercises: Drills
Assets needed: Airport flight records
Time: 70 minutes
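The three plot types in this lesson can be sketched in base R as follows, with one ggplot2 version for comparison. The data is invented for illustration, and the ggplot2 part assumes that package is installed.

```r
library(ggplot2)

# Hypothetical flight records, made up for illustration only.
flights <- data.frame(
  month     = rep(1:6, each = 2),
  carrier   = rep(c("AA", "DL"), times = 6),
  dep_delay = c(5, 12, 8, 20, 3, 15, 10, 25, 7, 18, 4, 9)
)

# Base R: distribution of a single variable.
delay_hist <- hist(flights$dep_delay,
                   main = "Departure delays", xlab = "Minutes")

# Base R: values across categories.
mean_by_carrier <- tapply(flights$dep_delay, flights$carrier, mean)
barplot(mean_by_carrier,
        main = "Mean departure delay by carrier", ylab = "Minutes")

# Base R: trend over time.
mean_by_month <- tapply(flights$dep_delay, flights$month, mean)
plot(as.integer(names(mean_by_month)), mean_by_month, type = "l",
     xlab = "Month", ylab = "Mean delay (minutes)")

# ggplot2: the same trend, split by carrier.
delay_plot <- ggplot(flights,
                     aes(x = month, y = dep_delay, colour = carrier)) +
  geom_line() +
  labs(title = "Departure delay by month",
       x = "Month", y = "Delay (minutes)")
```

In an interactive session, typing delay_plot at the console draws the ggplot2 chart in the Plots pane.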
Lesson 9: Capstone
Objective: Student can complete an end-to-end data exploration project in R
Assets needed: Baseball records
Time: 40 minutes