Introduction
In data analysis with R, subsetting data frames based on multiple conditions is a common task. It allows us to extract the rows that meet certain criteria. In this blog post, we will explore how to subset a data frame using three different methods: base R’s subset() function, dplyr’s filter() function, and the data.table package.
Examples
Using Base R’s subset() Function
Base R provides a handy function called subset() that allows us to subset data frames based on one or more conditions.
# Load the mtcars dataset
data(mtcars)

# Subset data frame using subset() function
subset_mtcars <- subset(mtcars, mpg > 20 & cyl == 4)

# View the resulting subset
print(subset_mtcars)
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
In the above code, we first load the mtcars dataset. Then, we use the subset() function to create a subset of the data frame where the miles per gallon (mpg) is greater than 20 and the number of cylinders (cyl) is equal to 4. Finally, we print the resulting subset.
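The same filter can also be written with plain logical indexing, which is worth knowing because the two forms treat missing values differently. The sketch below is an illustration only: the names by_subset, by_bracket, and the toy data frame df are made up for this example and are not part of the original post.

```r
# Same filter as above, written two ways
data(mtcars)

by_subset  <- subset(mtcars, mpg > 20 & cyl == 4)
by_bracket <- mtcars[mtcars$mpg > 20 & mtcars$cyl == 4, ]

# mtcars has no missing values, so both forms select the same rows
same_rows <- isTRUE(all.equal(by_subset, by_bracket))
print(same_rows)

# Hypothetical toy data with one missing value in x
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"))

# subset() drops rows where the condition evaluates to NA ...
n_subset <- nrow(subset(df, x > 0))

# ... while bracket indexing keeps an all-NA row for each NA condition
n_bracket <- nrow(df[df$x > 0, ])

print(c(subset = n_subset, bracket = n_bracket))
```

When a column may contain NA values, subset() is usually the safer choice for interactive work, since it never returns the placeholder NA rows that bracket indexing produces.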
Using dplyr’s filter() Function
dplyr is a powerful package for data manipulation, and it provides the filter() function for subsetting data frames based on conditions.
# Load the dplyr package
library(dplyr)

# Subset data frame using filter() function
filter_mtcars <- mtcars %>%
  filter(mpg > 20, cyl == 4)

# View the resulting subset
print(filter_mtcars)
                mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Datsun 710     22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Merc 240D      24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230       22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Volvo 142E     21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
In this code snippet, we load the dplyr package and use the %>% operator, also known as the pipe operator, to pipe the mtcars dataset into the filter() function. We pass the conditions to filter() as separate, comma-separated arguments, which are combined with a logical AND, and then print the resulting subset.
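Conditions in filter() do not have to be joined with AND. As a rough sketch, the example below shows the comma form next to the explicit & form, plus an OR condition and a %in% match. The variable names and the thresholds (mpg > 30, cyl == 8) are chosen only for illustration.

```r
library(dplyr)

# Comma-separated conditions and an explicit & select the same rows
and_comma <- mtcars %>% filter(mpg > 20, cyl == 4)
and_amp   <- mtcars %>% filter(mpg > 20 & cyl == 4)
print(isTRUE(all.equal(and_comma, and_amp)))

# OR has to be written explicitly with |
or_rows <- mtcars %>% filter(mpg > 30 | cyl == 8)

# %in% matches several values of one column; here it is combined
# with a second condition on the transmission type (am == 1)
four_or_six_manual <- mtcars %>% filter(cyl %in% c(4, 6), am == 1)

print(nrow(or_rows))
print(nrow(four_or_six_manual))
```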
Using the data.table Package
The data.table package is known for its speed and efficiency in handling large datasets. We can use data.table’s syntax to subset data frames as well.
# Load the data.table package
library(data.table)

# Convert mtcars to data.table
dt_mtcars <- as.data.table(mtcars)

# Subset data frame using data.table syntax
dt_subset_mtcars <- dt_mtcars[mpg > 20 & cyl == 4]

# Convert back to data frame (optional)
subset_mtcars_dt <- as.data.frame(dt_subset_mtcars)

# View the resulting subset
print(subset_mtcars_dt)
    mpg cyl  disp  hp drat    wt  qsec vs am gear carb
1  22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
2  24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
3  22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
4  32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
5  30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
6  33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
7  21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
8  27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
9  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
10 30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
11 21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
In this code block, we first load the data.table package and convert the mtcars data frame into a data.table using the as.data.table() function. Then, we subset the data using data.table’s syntax, specifying the conditions within square brackets. Optionally, we can convert the resulting subset back to a data frame using the as.data.frame() function before printing it.
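Note that the output above is numbered 1 to 11 instead of showing car names: as.data.table() drops row names by default. If the names matter, one option is the keep.rownames argument, sketched below. The column name "model" and the variable names are our own choices for illustration.

```r
library(data.table)

# keep.rownames = "model" stores the row names in a new column
# called "model" (the name is an arbitrary choice for this example)
dt_named <- as.data.table(mtcars, keep.rownames = "model")

# Filter rows in i and select a few columns in j
dt_named_subset <- dt_named[mpg > 20 & cyl == 4, .(model, mpg, cyl)]

print(dt_named_subset)
```

Unlike filter(), the conditions here must be joined with &, because a comma inside the square brackets separates the row filter from the column expression.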
Conclusion
In this blog post, we learned three different methods for subsetting data frames in R by multiple conditions. Whether you prefer base R’s subset() function, dplyr’s filter() function, or data.table’s syntax, there are multiple ways to achieve the same result. I encourage you to try out these methods on your own datasets and explore the flexibility and efficiency they offer in data manipulation tasks. Happy coding!