Apply family functions – Part 1
The apply family of functions is part of base R. These functions are especially useful for applying an operation to pieces of matrices, arrays, lists and data frames, so you can traverse data in several ways without writing explicit for loops. The main benefit is usually shorter, more readable code. Note that apply() itself loops internally, so it is not automatically faster than a well-written for loop.
The first function in this series is apply(). In its simplest form, it applies a function to the margins of a matrix or array: MARGIN = 1 selects the rows and MARGIN = 2 selects the columns.
As a first example, we start from a matrix with three rows and three columns.
```r
mat <- matrix(c(2, 4, 6, 7, 8, 9, 1, 12, 21), nrow = 3, ncol = 3)
mat
##      [,1] [,2] [,3]
## [1,]    2    7    1
## [2,]    4    8   12
## [3,]    6    9   21
```
For example, to obtain the sum of each column, you can call apply() with MARGIN = 2:
```r
apply(mat, 2, sum)
## [1] 12 24 34
```
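FUN does not have to be a named function, and it does not have to return a single value. The sketch below (the variable names col_spread and col_ranges are illustrative only) uses an anonymous function to compute the spread of each column, and then the base function range(), which returns two values per column; in that case apply() stacks the results as the columns of a matrix.

```r
mat <- matrix(c(2, 4, 6, 7, 8, 9, 1, 12, 21), nrow = 3, ncol = 3)

# Anonymous function: spread (max - min) of each column
col_spread <- apply(mat, 2, function(x) max(x) - min(x))
col_spread
## [1]  4  2 20

# range() returns c(min, max), so each column of mat becomes a column of the result
col_ranges <- apply(mat, 2, range)
col_ranges
##      [,1] [,2] [,3]
## [1,]    2    7    1
## [2,]    6    9   21
```

The same stacking happens with MARGIN = 1, so per-row results also end up as columns; wrap the call in t() if you prefer one row of output per input row.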
We can also calculate the average of each row.
```r
apply(mat, 1, mean)
## [1]  3.333333  8.000000 12.000000
```
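Any additional arguments given to apply() after FUN are passed on to that function. As a minimal sketch, assuming a copy of mat with one value replaced by NA (a hypothetical modification, not part of the original example), na.rm = TRUE can be forwarded to mean():

```r
mat <- matrix(c(2, 4, 6, 7, 8, 9, 1, 12, 21), nrow = 3, ncol = 3)
mat_na <- mat
mat_na[2, 2] <- NA  # hypothetical missing value in row 2

row_means_na <- apply(mat_na, 1, mean)                # row 2 comes out as NA
row_means    <- apply(mat_na, 1, mean, na.rm = TRUE)  # na.rm is forwarded to mean()
row_means
## [1]  3.333333  8.000000 12.000000
```

Row 2 still averages to 8 here only because the remaining values (4 and 12) happen to have the same mean as the full row.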
Base R also provides functions that compute column sums and row means directly. For example, colSums() returns the sum of each column, and rowMeans() returns the arithmetic mean of each row.
```r
colSums(mat)
## [1] 12 24 34
rowMeans(mat)
## [1]  3.333333  8.000000 12.000000
```
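A quick way to confirm that the specialised functions agree with the equivalent apply() calls is all.equal(). The sketch below also covers rowSums() and colMeans(), the other two members of this group. These four functions are implemented in compiled code, so on large matrices they are usually faster than the corresponding apply() call.

```r
mat <- matrix(c(2, 4, 6, 7, 8, 9, 1, 12, 21), nrow = 3, ncol = 3)

# Each pair should produce the same numbers
same_col_sums  <- isTRUE(all.equal(apply(mat, 2, sum),  colSums(mat)))
same_row_means <- isTRUE(all.equal(apply(mat, 1, mean), rowMeans(mat)))
same_row_sums  <- isTRUE(all.equal(apply(mat, 1, sum),  rowSums(mat)))
same_col_means <- isTRUE(all.equal(apply(mat, 2, mean), colMeans(mat)))

rowSums(mat)
## [1] 10 24 36
```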
The column sums and row means computed above illustrate basic uses of apply(). However, the function is more general and can also work on objects with more than two dimensions. To build up to that, consider a two-dimensional object (rows and columns) similar to the one created earlier:
```r
mat2 <- matrix(1:9, nrow = 3, ncol = 3)
mat2
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
```
The mat2 object is a particular case of an array, and the same object can be created with the array() function:
```r
array(data = 1:9, dim = c(3, 3))
##      [,1] [,2] [,3]
## [1,]    1    4    7
## [2,]    2    5    8
## [3,]    3    6    9
```
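To check that a matrix really is a two-dimensional array, you can compare mat2 with the object built by array(). A short sketch (arr2 is an illustrative name):

```r
mat2 <- matrix(1:9, nrow = 3, ncol = 3)
arr2 <- array(data = 1:9, dim = c(3, 3))

is.matrix(mat2)        # TRUE
is.array(mat2)         # TRUE: a matrix is an array with exactly two dimensions
dim(arr2)              # 3 3
all.equal(mat2, arr2)  # TRUE: same values and same dimensions
```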
The array() function also lets you label the rows and columns through its dimnames argument, which takes a list with the row names first and the column names second.
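```r
nombres.columnas <- c("COL1", "COL2", "COL3")  # column names
nombres.filas <- c("FILA1", "FILA2", "FILA3")  # row names ("fila" is Spanish for row)
# dimnames takes the row names first, then the column names
arreglo <- array(data = 1:9, dim = c(3, 3),
                 dimnames = list(nombres.filas, nombres.columnas))
arreglo
##       COL1 COL2 COL3
## FILA1    1    4    7
## FILA2    2    5    8
## FILA3    3    6    9
```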
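Dimension names carry through apply(): the result is labelled with the names of the margin it runs over, and elements can be selected by name. A minimal sketch using a small named array defined here (arr and row_totals are illustrative names):

```r
nombres.filas    <- c("FILA1", "FILA2", "FILA3")  # row names
nombres.columnas <- c("COL1", "COL2", "COL3")     # column names
arr <- array(data = 1:9, dim = c(3, 3),
             dimnames = list(nombres.filas, nombres.columnas))

# Sum of each row; the result is named after the rows
row_totals <- apply(arr, 1, sum)
row_totals
## FILA1 FILA2 FILA3
##    12    15    18

# Select a single element by its row and column names
arr["FILA2", "COL3"]
## [1] 8
```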
An array can also have more than two dimensions. Suppose, for example, that you want a 3 × 3 array with four layers (a third dimension of length 4), where each layer contains:
- DIM1: Numbers from 1 to 9.
- DIM2: Numbers from 1 to 9 multiplied by 10.
- DIM3: Numbers from 1 to 9 multiplied by 100.
- DIM4: Numbers from 1 to 9 multiplied by 1000.
One way to build this array is with the following code:
```r
nombres.dimensiones <- c("DIM1", "DIM2", "DIM3", "DIM4")
arreglo <- array(data = c(seq(from = 1, to = 9, by = 1),            # 1 to 9
                          seq(from = 10, to = 90, by = 10),         # 10 to 90
                          seq(from = 100, to = 900, by = 100),      # 100 to 900
                          seq(from = 1000, to = 9000, by = 1000)),  # 1000 to 9000
                 dim = c(3, 3, 4),  # 3 rows, 3 columns and 4 layers
                 dimnames = list(nombres.filas, nombres.columnas, nombres.dimensiones))
arreglo
## , , DIM1
##
##       COL1 COL2 COL3
## FILA1    1    4    7
## FILA2    2    5    8
## FILA3    3    6    9
##
## , , DIM2
##
##       COL1 COL2 COL3
## FILA1   10   40   70
## FILA2   20   50   80
## FILA3   30   60   90
##
## , , DIM3
##
##       COL1 COL2 COL3
## FILA1  100  400  700
## FILA2  200  500  800
## FILA3  300  600  900
##
## , , DIM4
##
##       COL1 COL2 COL3
## FILA1 1000 4000 7000
## FILA2 2000 5000 8000
## FILA3 3000 6000 9000
```
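The same values can be produced more compactly by scaling the vector 1:9, and a single layer can then be extracted by name. This is a sketch of an equivalent construction, not the author's code; arreglo2 is an illustrative name.

```r
nombres.filas       <- c("FILA1", "FILA2", "FILA3")
nombres.columnas    <- c("COL1", "COL2", "COL3")
nombres.dimensiones <- c("DIM1", "DIM2", "DIM3", "DIM4")

# 1:9 scaled by 1, 10, 100 and 1000: one block of nine values per layer
arreglo2 <- array(data = c(1:9, 1:9 * 10, 1:9 * 100, 1:9 * 1000),
                  dim = c(3, 3, 4),
                  dimnames = list(nombres.filas, nombres.columnas, nombres.dimensiones))

dim(arreglo2)
## [1] 3 3 4

# Leaving the first two indices empty returns the whole DIM2 layer as a 3 x 3 matrix
arreglo2[, , "DIM2"]
##       COL1 COL2 COL3
## FILA1   10   40   70
## FILA2   20   50   80
## FILA3   30   60   90
```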
Starting from arreglo, suppose you want the maximum value of each row within each layer:
```r
apply(arreglo, c(3, 1), max)
##      FILA1 FILA2 FILA3
## DIM1     7     8     9
## DIM2    70    80    90
## DIM3   700   800   900
## DIM4  7000  8000  9000
```
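To make the role of MARGIN concrete, the call apply(arreglo, c(3, 1), max) can be spelled out as an explicit double loop over layers and rows. This sketch only illustrates what the result contains; it rebuilds the array compactly so it runs on its own, and res_loop and res_apply are illustrative names.

```r
arreglo <- array(data = c(1:9, 1:9 * 10, 1:9 * 100, 1:9 * 1000),
                 dim = c(3, 3, 4),
                 dimnames = list(c("FILA1", "FILA2", "FILA3"),
                                 c("COL1", "COL2", "COL3"),
                                 c("DIM1", "DIM2", "DIM3", "DIM4")))

# One output cell per (layer, row) pair, in the order given by MARGIN = c(3, 1)
res_loop <- matrix(NA_real_, nrow = dim(arreglo)[3], ncol = dim(arreglo)[1],
                   dimnames = list(dimnames(arreglo)[[3]], dimnames(arreglo)[[1]]))
for (k in seq_len(dim(arreglo)[3])) {       # layers: DIM1 to DIM4
  for (i in seq_len(dim(arreglo)[1])) {     # rows: FILA1 to FILA3
    res_loop[k, i] <- max(arreglo[i, , k])  # maximum across the columns
  }
}

res_apply <- apply(arreglo, c(3, 1), max)
all.equal(res_loop, res_apply)
## [1] TRUE
```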
Or you may want the maximum value of each column within each layer:
```r
apply(arreglo, c(3, 2), max)
##      COL1 COL2 COL3
## DIM1    3    6    9
## DIM2   30   60   90
## DIM3  300  600  900
## DIM4 3000 6000 9000
```
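A single margin also works on a three-dimensional array. As a sketch, with MARGIN = 3 the function receives a whole 3 × 3 layer at a time, so sum() returns one total per layer (layer_totals is an illustrative name, and the array is rebuilt compactly so the block runs on its own):

```r
arreglo <- array(data = c(1:9, 1:9 * 10, 1:9 * 100, 1:9 * 1000),
                 dim = c(3, 3, 4),
                 dimnames = list(c("FILA1", "FILA2", "FILA3"),
                                 c("COL1", "COL2", "COL3"),
                                 c("DIM1", "DIM2", "DIM3", "DIM4")))

# Each call to sum() sees all nine values of one layer
layer_totals <- apply(arreglo, 3, sum)
layer_totals
##  DIM1  DIM2  DIM3  DIM4
##    45   450  4500 45000
```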
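The following call computes the minimum of each column within each layer. Because the margins are given as c(2, 3) rather than c(3, 2), the columns now appear as the rows of the result.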
```r
apply(arreglo, c(2, 3), min)
##      DIM1 DIM2 DIM3 DIM4
## COL1    1   10  100 1000
## COL2    4   40  400 4000
## COL3    7   70  700 7000
```
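A quick check, using the same compact reconstruction of arreglo, that swapping the order of the margins only transposes the result (min_2_3 and min_3_2 are illustrative names):

```r
arreglo <- array(data = c(1:9, 1:9 * 10, 1:9 * 100, 1:9 * 1000),
                 dim = c(3, 3, 4),
                 dimnames = list(c("FILA1", "FILA2", "FILA3"),
                                 c("COL1", "COL2", "COL3"),
                                 c("DIM1", "DIM2", "DIM3", "DIM4")))

min_2_3 <- apply(arreglo, c(2, 3), min)  # columns x layers
min_3_2 <- apply(arreglo, c(3, 2), min)  # layers x columns

# t() swaps both the values and the dimnames
all.equal(min_2_3, t(min_3_2))
## [1] TRUE
```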
The same approach extends to arrays with more dimensions: build an array of the appropriate shape and pass the relevant margins to apply().
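As a sketch of that idea, assuming a made-up 2 × 2 × 2 × 2 array filled with 1:16 (arr4 and sums_34 are illustrative names), margins c(3, 4) produce one sum for each combination of the third and fourth dimensions, while margin 4 alone sums everything within each slice of the fourth dimension:

```r
arr4 <- array(1:16, dim = c(2, 2, 2, 2))  # hypothetical 4-D array

# One sum per (third, fourth) index pair; each sum covers a 2 x 2 block
sums_34 <- apply(arr4, c(3, 4), sum)
sums_34
##      [,1] [,2]
## [1,]   10   42
## [2,]   26   58

# One sum per slice of the fourth dimension (values 1:8 and 9:16)
apply(arr4, 4, sum)
## [1]  36 100
```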