How to apply a transformation to multiple columns in R?

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers.]

The post How to apply a transformation to multiple columns in R? appeared first on Data Science Tutorials

To apply a transformation to many columns at once, use the across() function from R's dplyr package.

This function has many applications; the following examples highlight some typical ones:

First Approach: Apply Function to Several Columns

Multiply values in col1 and col2 by 2

df %>%  mutate(across(c(col1, col2), function(x) x*2))

Second Approach: Calculate One Summary Statistic for Multiple Columns

Calculate the mean of col1 and col2

df %>%  summarise(across(c(col1, col2), function(x) mean(x, na.rm=TRUE)))

Third Approach: Calculate Multiple Summary Statistics for Multiple Columns

Calculate the mean and standard deviation for col1 and col2

df %>%  summarise(across(c(col1, col2), list(mean=function(x) mean(x, na.rm=TRUE), sd=function(x) sd(x, na.rm=TRUE))))

The examples below demonstrate each technique using the following data frame.

Let’s create a data frame

df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                 points=c(26, 22, 28, 15, 32, 28),
                 rebounds=c(16, 15, 16, 12, 13, 10))

Now we can view the data frame

df
   team points rebounds
1   P1     26       16
2   P1     22       15
3   P1     28       16
4   P2     15       12
5   P2     32       13
6   P2     28       10

Example 1: Apply Function to Multiple Columns

The following code uses the across() function to multiply the values in the points and rebounds columns by 2.

library(dplyr)

Multiply the values in the points and rebounds columns by two.

df %>%  mutate(across(c(points, rebounds), function(x) x*2))
  team points rebounds
1   P1     52       32
2   P1     44       30
3   P1     56       32
4   P2     30       24
5   P2     64       26
6   P2     56       20
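
Note that mutate() with across() overwrites the original columns by default. If the original values should be kept alongside the transformed ones, the .names argument of across() can give the results new names. The following is a minimal sketch rather than part of the original tutorial; the "_x2" suffix and the object name doubled are arbitrary choices made for illustration.

```r
library(dplyr)

df <- data.frame(team = c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                 points = c(26, 22, 28, 15, 32, 28),
                 rebounds = c(16, 15, 16, 12, 13, 10))

# .names builds each new column name from the source column name ({.col});
# the "_x2" suffix is an arbitrary choice for this sketch
doubled <- df %>%
  mutate(across(c(points, rebounds), function(x) x * 2, .names = "{.col}_x2"))

# doubled keeps team, points and rebounds, and gains points_x2 and rebounds_x2
doubled
```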

Example 2: Calculate One Summary Statistic for Multiple Columns

The following code uses the across() function to calculate the mean of both the points and rebounds columns.

Calculate the mean of the points and rebounds columns.

df %>%  summarise(across(c(points, rebounds), function(x) mean(x, na.rm=TRUE)))
    points rebounds
1 25.16667 13.66667
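
The na.rm=TRUE argument only changes the result when a column contains missing values, which the example data frame does not. The sketch below illustrates the difference under the assumption of a small hypothetical data frame, df_na, with one NA in the points column; the data and the object names are invented for illustration.

```r
library(dplyr)

# Hypothetical data with one missing value in points
df_na <- data.frame(points = c(26, NA, 28),
                    rebounds = c(16, 15, 16))

# Without na.rm, a single NA makes the mean of that column NA
with_na <- df_na %>%
  summarise(across(c(points, rebounds), function(x) mean(x)))

# With na.rm = TRUE, the NA is dropped before averaging
without_na <- df_na %>%
  summarise(across(c(points, rebounds), function(x) mean(x, na.rm = TRUE)))

with_na
without_na
```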

Note that we can also pass where(is.numeric) to across() to select every numeric column in the data frame automatically, instead of naming each column.

Calculate the mean of every numeric column in the data frame.

df %>%  summarise(across(where(is.numeric), function(x) mean(x, na.rm=TRUE)))
    points rebounds
1 25.16667 13.66667
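
Because the example data frame also has a team column, the same where(is.numeric) selection can be combined with group_by() to get one row of means per team. This goes a step beyond the original example and is only a sketch; the object name by_team is an assumption made for illustration.

```r
library(dplyr)

df <- data.frame(team = c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                 points = c(26, 22, 28, 15, 32, 28),
                 rebounds = c(16, 15, 16, 12, 13, 10))

# Group by team, then average every numeric column within each group;
# the grouping column itself is not passed to the function
by_team <- df %>%
  group_by(team) %>%
  summarise(across(where(is.numeric), function(x) mean(x, na.rm = TRUE)))

by_team
```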

Example 3: Calculate Multiple Summary Statistics for Multiple Columns

The following code uses the across() function to calculate the mean and standard deviation of the points and rebounds columns. Each result column is named after the source column and the function, for example points_mean.

Compute the mean and standard deviation for the columns of points and rebounds.

df %>%  summarise(across(c(points, rebounds), list(mean=function(x) mean(x, na.rm=TRUE), sd=function(x) sd(x, na.rm=TRUE))))
    points_mean points_sd rebounds_mean rebounds_sd 
1    25.16667  5.946988      13.66667     2.42212 
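
As a cross-check, the same four statistics can be reproduced in base R without dplyr. This is a sketch for verification only, not the approach the tutorial recommends; the object name stats is an assumption made for illustration.

```r
df <- data.frame(team = c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
                 points = c(26, 22, 28, 15, 32, 28),
                 rebounds = c(16, 15, 16, 12, 13, 10))

# sapply() applies the function to each selected column and binds the
# named results into a matrix: rows are statistics, columns are variables
stats <- sapply(df[c("points", "rebounds")],
                function(x) c(mean = mean(x, na.rm = TRUE),
                              sd = sd(x, na.rm = TRUE)))

stats
```

The resulting matrix has the rows mean and sd and the columns points and rebounds, so its values can be compared directly with the summarise() output above.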

We have now covered most of the dplyr package techniques. We will discuss the transmute() function in an upcoming post.
