Introduction to R for Data Science :: Session 3
Welcome to Introduction to R for Data Science Session 3! The course is co-organized by Data Science Serbia and Startit. You will find all course material (R scripts, data sets, SlideShare presentations, readings) on these pages.
Check out the Course Overview to access the learning material presented thus far.
Data Science Serbia Course Pages [in Serbian]
Startit Course Pages [in Serbian]
Lecturers
- dipl. ing Branko Kovač, Data Analyst at CUBE, Data Science Mentor at Springboard, Data Science Serbia
- Goran S. Milovanović, PhD, DataScientist@DiploFoundation, Data Science Serbia
Summary of Session 3, 12 May 2016 :: Introduction to R: Lists and Functions in R
Introduction to lists and functions in R. R is a high-level programming language that makes heavy use of the list data type, and we introduce this dynamic data type during this session. R is also a functional programming language: everything that happens in R is a call to, and an execution of, some function. Even “operators” in R, as simple as “+” or “-”, are functions. In this session we learn how to write R functions. We then combine functions and lists to learn about the rather handy lapply() function, and proceed to demonstrate its cousin apply(), which applies a function to a matrix across its rows or columns.
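To make the summary above concrete, here is a minimal sketch in R (not part of the course script) that touches each idea once: a list holding mixed types, an operator called as an ordinary function, a small user-defined function, and lapply() over a list. The names scores, range_width and marks_by_term are hypothetical, chosen only for illustration.

```r
# Hypothetical example data, for illustration only.
# A list can hold elements of different types and lengths:
scores <- list(name = "Ana", marks = c(7, 9, 10), passed = TRUE)
str(scores)

# "Operators" are ordinary functions and can be called by name (note the backticks):
stopifnot(identical(`+`(2, 3), 2 + 3))

# A hypothetical user-defined function: the spread of a numeric vector
range_width <- function(x) max(x) - min(x)
range_width(scores$marks)

# lapply() applies a function to every element of a list and returns a list
marks_by_term <- list(term1 = c(7, 9, 10), term2 = c(6, 8))
widths <- lapply(marks_by_term, range_width)
widths

# unlist() flattens the resulting list into a named numeric vector
unlist(widths)
```

Because lapply() always returns a list, the final unlist() call is what turns the per-element results into a plain vector, exactly as in the session script.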
Intro to R for Data Science SlideShare :: Session 3
R script :: Session 3
########################################################
# Introduction to R for Data Science
# SESSION 3 :: 12 May, 2016
# Data Science Community Serbia + Startit
# :: Goran S. Milovanović and Branko Kovač ::
########################################################

# clear all
rm(list=ls());

# It's time to speak about lists
num_vct <- c(2:5) # just another num vector
chr_vct <- c("data", "science") # char vector
data_frame <- data.frame(x = c("a", "b", "c", "d"), y = c(1:4)) # simple df
lista <- list(data_frame, num_vct, chr_vct) # and this is a list
lista # this is our list
str(lista) # about a list
length(lista)
as.list(chr_vct) # another way to create a list

# Lists manipulation
names(lista) <- c("data", "numbers", "words")
lista[3] # 3rd element?
lista[[3]] # 3rd element?
is.list(lista[3]) # is this a list?
is.list(lista[[3]]) # and this?
class(lista[[3]]) # also a list? don't be so sure!
lista$words # we can also extract an element this way
lista[["words"]] # or even like this
length(lista$words) # 2 as expected
lista[["words"]][1] # digging even deeper
lista$new_elem <- c(TRUE, FALSE, FALSE, TRUE) # add new element
length(lista) # now list has 4 elements
lista$new_elem <- NULL # but we can remove it easily
new_vect <- unlist(lista) # creating a vector from list (mixed types are coerced to character)

# Introduction to Functions in R
# (w. less formalism but tips & tricks added)

# elementary: a definition of a function in R
fun <- function(x) x+10;
fun(5)

# taking two arguments
fun2 <- function(x,y) x+y;
fun2(3,4)

# using "{" and "}" to enclose multiple R expressions in the function body
fun <- function(x,y) {
  a <- sum(x);
  b <- sum(y);
  a-b
}
r <- c(5,4,3); q <- c(1,1,1);
fun(r,q)
fun(c(5,4,3),c(1,1,1))
# NOTE: "{" and "}" are generally used in R to mark the beginning and the end of a block

# a function is a function:
is.function(fun);
is.function(log); # log is built-in

# print a function to access its source code:
fun
log
# try: is.primitive(log); this one is written in C, belongs to the base package - it's "under the hood"

# Built in functions + functional programming ("Everything is a function...")
"^"(2,2)
"^"(2,3) # magic! - how do you do that?
2^2
2^3

# the difference between "operators" and "functions" in R: none. Everything is a function:
"+"(2,2) # Four?
2+2 # yeah, right

# Oh but I love this
"-"("+"(3,5),2);
"&"(">"(2,2),T);
"&"(">"(3,2),T);
# punishment: write all your lab code for this week in this fashion...

# built in functions:
x <- 16;
sqrt(x);
x <- c(1,2,3,4,5,6,7,8,9);
mean(x);

# watch out for NAs in statistics (!)
x <- c(1,2,3,4,5,6,7,8,NA);
mean(x);
mean(x, na.rm = T); # right!
median(x);
sd(x);
sum(x);
sum(x, na.rm = T); # a-ha!

# Lexical scoping in R + nested functions
# example taken from: http://adv-r.had.co.nz/Functions.html
# "Advanced R" by Hadley Wickham
# ";"s added by GSM
x <- 1;
h <- function() {
  y <- 2;
  i <- function() {
    z <- 3
    c(x, y, z)
  }
  i();
}
h();

# Messing up argument names (never do this in nested functions unless you have to)
rm(x, h);
x <- 1;
h <- function(x) {
  y <- x+1
  i <- function(x) {
    z <- x+2;
    z
  }
  z <- i(x);
  c(x,y,z)
}
h(x)

# Two things that come in handy: lapply and apply
# Step 1: here's a list:
aList <- list(c(1,2,3), c(4,5,6), c(7,8,9), c(10,11,12));
# Step 2: I want to apply the following function:
myFun <- function(x) {
  x[1]+x[2]-x[3]
}
# to all elements of the aList list, and get the result as a list again. Here it is
# (lapply(aList, myFun) would do exactly the same):
res <- lapply(aList, function(x) {
  x[1]+x[2]-x[3]
});
unlist(res) # to get a vector
rm(myFun);

# Now say I've got a matrix
myMat <- matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, ncol=3);
# btw
is.function(matrix);
# reminder
class(myMat);
typeof(myMat);
# now, I want the sums of all rows:
rsMyMat <- apply(myMat, 1, function(x) {
  sum(x)
});
rsMyMat;
is.list(rsMyMat) # FALSE: a plain numeric vector, just beautiful
# for columns:
csMyMat <- apply(myMat, 2, function(x) {
  sum(x)
});
csMyMat;
# with existing functions such as sum(), this will do:
rsMyMat1 <- apply(myMat, 1, sum);
rsMyMat1
csMyMat1 <- apply(myMat, 2, sum);
csMyMat1
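As a follow-up to the lapply() and apply() examples in the session script, here is a short sketch that reuses the same matrix and list. It checks apply() against the base functions rowSums() and colSums(), shows that apply() returns a matrix when the applied function returns more than one value, and uses sapply(), a variant of lapply() that simplifies its result to a vector where it can. The variable names row_sums and col_sums are our own, not from the course material.

```r
# Same 3 x 3 matrix as in the session script (filled column by column)
myMat <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)

# apply() over rows (MARGIN = 1) and over columns (MARGIN = 2)
row_sums <- apply(myMat, 1, sum)
col_sums <- apply(myMat, 2, sum)

# Base R also provides dedicated functions for these two common cases
stopifnot(isTRUE(all.equal(row_sums, rowSums(myMat))))
stopifnot(isTRUE(all.equal(col_sums, colSums(myMat))))

# When the applied function returns a vector (range() returns min and max),
# apply() binds the results column-wise: each row's result becomes a column
row_ranges <- apply(myMat, 1, range)
row_ranges

# sapply() is lapply() plus simplification: here it returns a numeric vector
aList <- list(c(1, 2, 3), c(4, 5, 6), c(7, 8, 9), c(10, 11, 12))
sapply(aList, function(x) x[1] + x[2] - x[3])
```

For plain row and column sums, rowSums() and colSums() are usually the better choice since they are vectorized and faster; apply() earns its keep when the function you need has no built-in row or column version.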
Readings :: Session 4 [19 May 2016, @Startit.rs, 19h CET]
Chapters 1 - 10, The Art of R Programming, Norman Matloff