Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
This is a continuation of the R workshop I’m teaching at the Baruch MFE program. This section discusses the programming model of R in a slightly biased way. The full contents are below.
Contents
PART I: PRELIMINARIES
PART II: STATISTICS
- A. Distributions, Sampling, and Regression
- B. Optimization and Linear Programming
PART III: STRUCTURING CODE
- A. Dispatching Systems
- B. Real World Development
Function Definition and Evaluation
A function is defined via assignment, just like any other variable.
f <- function(x) 3 * x + 2
This function can be executed as f(5). Since R is vector-based, a vector can be passed into the function as is, giving a result for each element of the vector.
> f(-5:5)
[1] -13 -10 -7 -4 -1 2 5 8 11 14 17
R is a dynamically typed language, so a function accepts any argument for which its body can be evaluated; incompatible arguments only fail when the call is made. We'll see in Part III how this is combined with the dispatching systems to implement polymorphism (as well as class systems).
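A small illustration of what "compatible" means in practice, reusing the f defined above. The particular arguments are chosen only for demonstration: a logical is coerced to a number, a matrix keeps its dimensions, and a character string is rejected by `*` at call time rather than at definition time.

```r
f <- function(x) 3 * x + 2

v1 <- f(5)                      # 17
v2 <- f(TRUE)                   # 5: TRUE is coerced to 1
m  <- f(matrix(1:4, nrow = 2))  # still a 2 x 2 matrix; arithmetic is element-wise

# "a" is not compatible with `*`, but R only discovers this when f is called
res <- tryCatch(f("a"), error = function(e) conditionMessage(e))
res  # the error message, captured as a character string
```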
Named Arguments
In the example above, the argument was passed into the function positionally. Named arguments are also supported, which allows you to specify arguments in any order you choose, provided the names match the function's formal arguments.
f <- function(x,y) (x-5)^2 + (y+2)^2
f(y=3,x=4)
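The sketch below contrasts positional, named and mixed calls to the same f, all of which evaluate to 26. It also shows that R matches unambiguous prefixes of argument names (partial matching); the function g exists only for this illustration.

```r
f <- function(x, y) (x - 5)^2 + (y + 2)^2

r_pos   <- f(4, 3)          # positional: x = 4, y = 3
r_named <- f(y = 3, x = 4)  # named, in any order
r_mixed <- f(y = 3, 4)      # y matched by name, the remaining 4 goes to x by position

# Hypothetical example: "al" and "be" are unique prefixes of alpha and beta
g <- function(alpha, beta) alpha - beta
r_partial <- g(be = 1, al = 3)  # alpha = 3, beta = 1, so 2
```

Partial matching is convenient at the console, but spelling names out in full is safer in code that others will read.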
Optional Arguments
Not all arguments need to be specified in a function call. When defining a function, any argument can be given a default value. When calling the function, any argument that is not explicitly passed in is then populated with its default value.
f <- function(x,y=3) (x-5)^2 + (y+2)^2
f(4)
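A short sketch of default values in action. Besides the f above, it uses two hypothetical helpers: g shows that a default may refer to another argument (defaults are evaluated lazily, inside the function), and k uses base R's missing() to detect whether the caller supplied an argument that has no default.

```r
f <- function(x, y = 3) (x - 5)^2 + (y + 2)^2
d1 <- f(4)         # y takes its default of 3: 1 + 25 = 26
d2 <- f(4, y = 0)  # an explicit value overrides the default: 1 + 4 = 5

# Hypothetical example: the default for y is computed from x when y is first used
g <- function(x, y = 2 * x) x + y
d3 <- g(3)         # y defaults to 6, giving 9

# Hypothetical example: missing() reports whether y was supplied
k <- function(x, y) if (missing(y)) x else x + y
d4 <- k(1)         # 1
d5 <- k(1, 2)      # 3
```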
The Ellipsis Argument
Sometimes a collection of arguments needs to be passed on to another function. This happens frequently with functions that wrap plotting functions, though it is useful in numerous other situations. Essentially, any arguments that are not matched to a named parameter populate the ellipsis, which can then be passed along to another function as is.
f <- function(x, ...) plot(x, ...)
f(rnorm(100), main="100 Random Values")
The ellipsis argument can be manipulated in other ways, but that is out of scope for this discussion.
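The same forwarding pattern works for any function, not just plots. In this sketch, centre is a hypothetical helper that subtracts the mean; whatever extra arguments the caller supplies (for example na.rm) travel through ... to mean() untouched.

```r
# Hypothetical helper: centre a vector by subtracting its mean.
# Extra arguments are collected in ... and handed to mean() unchanged.
centre <- function(x, ...) x - mean(x, ...)

c1 <- centre(c(1, 2, 3))                 # -1 0 1
c2 <- centre(c(1, NA, 3), na.rm = TRUE)  # -1 NA 1: the NA is skipped when computing the mean
c3 <- centre(c(1, NA, 3))                # all NA: without na.rm, mean() returns NA
```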
First Class Functions
All functions are first class in R, which means you can pass them around like any other variable. This property is used throughout R, where numerous higher order functions are used to operate on data.
As in many languages, a function body consisting of a single statement does not require braces, although multi-statement definitions must be enclosed in braces. Unlike many C variants, R does not require an explicit return statement at the end of the function definition: the value of the last evaluated expression is returned by the function (return() remains available for exiting early).
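The following sketch puts these points together: functions stored in a list and passed as arguments, a braced multi-statement body whose last expression is the result, and return() used for an early exit. The names ops, apply_twice, describe and sign_label are hypothetical, chosen for illustration.

```r
# Functions can be stored in data structures like any other value
ops <- list(square = function(x) x^2, cube = function(x) x^3)

# ...and passed as arguments to other functions
apply_twice <- function(fn, x) fn(fn(x))
r1 <- apply_twice(ops$square, 3)  # (3^2)^2 = 81

# Several statements, so braces are required; the last expression is returned
describe <- function(x) {
  m <- mean(x)
  s <- sd(x)
  c(mean = m, sd = s)
}
r2 <- describe(c(2, 4, 6))  # mean = 4, sd = 2

# return() is still available when an early exit is clearer
sign_label <- function(x) {
  if (x < 0) return("negative")
  "non-negative"
}
r3 <- sign_label(-1)
```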
Higher Order Functions
When working with data structures it is useful to perform an operation on each column (or row) of the data. In other languages this would be accomplished using a loop, whereas in R a higher order function is employed. The most basic of these is apply, which is similar to the common higher order function map but operates over the rows or columns (margins) of an array or matrix.
> h <- getPortfolioReturns(c('AAPL','XOM','KO','F','GS'), 100)
> apply(h, 2, sd)
      AAPL        XOM         KO          F         GS
0.01719355 0.01528400 0.01050431 0.02480413 0.03114184
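Since getPortfolioReturns is not shown here, the sketch below substitutes a tiny hand-made matrix (made-up numbers, not market data) to show how the MARGIN argument selects the direction: 2 applies the function to each column, 1 to each row.

```r
# Hypothetical stand-in for h: 4 days of made-up returns for two assets
rets <- matrix(c( 0.010, -0.020,  0.030, 0.000,
                  0.020,  0.010, -0.010, 0.020),
               ncol = 2, dimnames = list(NULL, c("A", "B")))

col_sd   <- apply(rets, 2, sd)    # MARGIN = 2: one value per column (asset), named A and B
row_mean <- apply(rets, 1, mean)  # MARGIN = 1: one value per row (day)
```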
When working with lists, lapply is typically the variant to use. Other variants include sapply (simplified result), mapply (multivariate sapply), and tapply (applies a function to groups of values defined by factor levels). In certain cases, do.call can be used to call a function with its arguments supplied as a list.
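One line per variant, on toy data chosen for illustration, makes the differences in their return types easier to see:

```r
xs <- list(a = 1:3, b = 4:6)

l  <- lapply(xs, sum)                           # a list: a = 6, b = 15
s  <- sapply(xs, sum)                           # simplified to a named vector c(a = 6, b = 15)
mp <- mapply(function(x, y) x + y, 1:3, 4:6)    # element-wise over two vectors: 5 7 9
tp <- tapply(c(10, 20, 30, 40), c("x", "y", "x", "y"), sum)  # sums per group: x = 40, y = 60
dc <- do.call(rbind, xs)                        # same as rbind(a = 1:3, b = 4:6): a 2 x 3 matrix
```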
Common higher order functions like fold (reduce) and filter are also built into base R as Reduce and Filter, alongside Map, Find and Position (documented together under ?Reduce); additional packages provide further variants.
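A brief sketch of the functional-programming helpers that ship with base R. The compounding example uses made-up returns and shows a fold with an explicit initial value.

```r
total   <- Reduce(`+`, 1:5)                       # left fold: ((((1 + 2) + 3) + 4) + 5) = 15
running <- Reduce(`+`, 1:5, accumulate = TRUE)    # keep intermediate results: 1 3 6 10 15
evens   <- Filter(function(x) x %% 2 == 0, 1:10)  # 2 4 6 8 10
prods   <- Map(function(x, y) x * y, 1:3, 4:6)    # a list holding 4, 10, 18

# Fold with an initial value of 1: compound two (made-up) returns, 1 * 1.10 * 0.95
growth  <- Reduce(function(acc, r) acc * (1 + r), c(0.10, -0.05), 1)  # 1.045
```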
Exercise: Use a higher order function to separate up days from down days for each asset in h above.
Lambda Expressions
In the example above, sd was used as a function reference. If the default behavior of sd is not desired, we can construct an anonymous function that defines custom behavior.
> apply(h, 2, function(x) sd(x, na.rm=TRUE))
Anonymous functions like these are used throughout R. Note that lambda expressions work best when they are short; using this approach with longer definitions can make code hard to read, so those are usually better given a name.
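A self-contained version of the na.rm example, with a hypothetical matrix containing one missing value in place of h. It also shows that, for a simple case like this, apply can forward extra arguments to the function through its own ... argument, so no lambda is needed.

```r
# Hypothetical data: column A has a missing value
rets <- matrix(c(0.01, NA, 0.03, -0.02, 0.02, 0.01), ncol = 2,
               dimnames = list(NULL, c("A", "B")))

s0 <- apply(rets, 2, sd)                            # default sd: column A becomes NA
s1 <- apply(rets, 2, function(x) sd(x, na.rm = TRUE))  # anonymous function with custom behavior
s2 <- apply(rets, 2, sd, na.rm = TRUE)              # same result: apply passes na.rm on to sd
```

R 4.1.0 and later also accept `\(x)` as shorthand for `function(x)`, which keeps short lambdas even shorter.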
Closures
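A closure is a function bundled with the environment in which it was created, and closures are easy to construct in R. An important consideration is that variables from the enclosing environment are effectively read-only: an ordinary assignment with <- inside the function creates a new local variable instead of modifying them. To change their value, the special assignment operator <<- must be used, which searches through parent environments until a matching variable is found (assigning in the global environment if none is).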
counter <- function(start=0) {
  x <- start
  function() {
    x <<- x + 1
    x
  }
}
The above function can be evaluated as,
> f <- counter(5)
> f()
[1] 6
> f()
[1] 7
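To see why <<- matters, the sketch below compares counter with a hypothetical broken_counter that uses a plain <-. It also shows that each call to counter creates a fresh environment, so two counters do not share state.

```r
counter <- function(start = 0) {
  x <- start
  function() {
    x <<- x + 1  # updates the x in the enclosing environment
    x
  }
}

# Hypothetical broken variant: <- creates a local x on every call
broken_counter <- function(start = 0) {
  x <- start
  function() {
    x <- x + 1   # local copy; the enclosing x is never modified
    x
  }
}

a <- counter(5)
b <- counter(100)
first  <- a()  # 6
second <- a()  # 7
other  <- b()  # 101: b has its own enclosing environment

g  <- broken_counter(5)
g1 <- g()      # 6
g2 <- g()      # still 6: no state is kept between calls
```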
When are closures useful? Any time a function reference is passed as an argument to another function, a function signature is implicitly defined: the receiving function expects to call it with particular arguments. If your function does not have that signature, it is necessary to wrap it in another function that matches said signature, and a closure lets that wrapper carry any extra parameters with it. While a direct call can be made, sometimes it is better to delay evaluation, in which case returning a function reference is the best choice.
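As an example of adapting a signature, base R's optimize() expects an objective of a single argument. In this sketch, loss and make_objective are hypothetical: the two-argument loss is wrapped in a closure that captures target, and the returned function is only evaluated later, when optimize calls it.

```r
# A two-parameter loss function (hypothetical)
loss <- function(x, target) (x - target)^2

# Returns a one-argument function; target lives in the closure's environment
make_objective <- function(target) {
  force(target)  # evaluate target now rather than lazily on first use
  function(x) loss(x, target)
}

obj  <- make_objective(3)
best <- optimize(obj, interval = c(0, 10))$minimum  # close to 3
```

optimize() happens to forward extra arguments through its ..., so `optimize(loss, c(0, 10), target = 3)` would also work. The closure is the more general tool, since it also works when the receiving function offers no such mechanism.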
Exercise: Use a closure to generate a parameterless function that caps returns to some threshold x.