How to Filter Rows In R?

This article was first published on Data Science Tutorials and kindly contributed to R-bloggers.

It’s common to want to subset a data frame based on particular conditions. Fortunately, the filter() function from the dplyr package makes this simple.

library(dplyr)

This tutorial uses the starwars dataset that ships with dplyr to show several examples of how to use this function in practice.

First, look at the first six rows of the starwars dataset:

head(starwars)
# A tibble: 6 x 13
  name  height  mass hair_color skin_color eye_color birth_year gender homeworld
1 Luke~    172    77 blond      fair       blue            19   male   Tatooine
2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>   Tatooine
3 R2-D2     96    32 <NA>       white, bl~ red             33   <NA>   Naboo   
4 Dart~    202   136 none       white      yellow          41.9 male   Tatooine
5 Leia~    150    49 brown      light      brown           19   female Alderaan
6 Owen~    178   120 brown, gr~ light      blue            52   male   Tatooine
# ... with 4 more variables: species <chr>, films <list>, vehicles <list>,
#   starships <list>

Example 1: Filter Rows Equal to Some Value

The code below shows how to find the rows in the dataset where the variable species equals 'Droid'.

starwars %>% filter(species == 'Droid')
# A tibble: 5 x 13
  name  height  mass hair_color skin_color eye_color birth_year gender homeworld
1 C-3PO    167    75 <NA>       gold       yellow           112 <NA>   Tatooine
2 R2-D2     96    32 <NA>       white, bl~ red               33 <NA>   Naboo   
3 R5-D4     97    32 <NA>       white, red red               NA <NA>   Tatooine
4 IG-88    200   140 none       metal      red               15 none   <NA>    
5 BB8       NA    NA none       none       black             NA none   <NA>    

Five rows in the dataset meet this condition, as indicated by the header # A tibble: 5 x 13.
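One detail worth keeping in mind with equality filters is how missing values behave: filter() keeps only rows where the condition evaluates to TRUE, so rows where it evaluates to NA are dropped. The sketch below illustrates this on a small made-up data frame (toy and its values are hypothetical, not taken from starwars).

```r
library(dplyr)

# Hypothetical toy data for illustration; not part of the starwars dataset
toy <- data.frame(
  name    = c("a", "b", "c"),
  species = c("Droid", NA, "Human"),
  stringsAsFactors = FALSE
)

# NA == "Droid" evaluates to NA, so the row with a missing species is dropped
droids <- toy %>% filter(species == "Droid")

# To keep rows with a missing species as well, test for NA explicitly
droids_or_missing <- toy %>% filter(species == "Droid" | is.na(species))

print(droids)
print(droids_or_missing)
```

If rows with missing values matter for your analysis, adding an explicit is.na() test like this is one way to retain them.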

Example 2: Using ‘And’ to Filter Rows

We can also look for rows where the species is Droid and the eye color is red.

starwars %>% filter(species == 'Droid' & eye_color == 'red')
# A tibble: 3 x 13
  name  height  mass hair_color skin_color eye_color birth_year gender homeworld
1 R2-D2     96    32 <NA>       white, bl~ red               33 <NA>  Naboo   
2 R5-D4     97    32 <NA>       white, red red               NA <NA>  Tatooine
3 IG-88    200   140 none       metal      red               15 none  <NA>     

Three rows in the dataset meet both conditions.
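As an alternative to the & operator, filter() also accepts several conditions separated by commas and combines them with a logical "and". The sketch below compares the two spellings on a small made-up data frame (toy is hypothetical).

```r
library(dplyr)

# Hypothetical toy data for illustration
toy <- data.frame(
  species   = c("Droid", "Droid", "Human"),
  eye_color = c("red", "yellow", "red"),
  stringsAsFactors = FALSE
)

# Conditions joined with &
with_and <- toy %>% filter(species == "Droid" & eye_color == "red")

# The same conditions passed as separate arguments
with_comma <- toy %>% filter(species == "Droid", eye_color == "red")

print(with_and)
print(with_comma)
```

Both calls should return the same single row; which spelling to use is largely a matter of readability.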

Example 3: Using ‘Or’ to Filter Rows

We can also look for rows where the species is Droid or the eye color is red:

starwars %>% filter(species == 'Droid' | eye_color == 'red')
# A tibble: 7 x 13
  name  height  mass hair_color skin_color eye_color birth_year gender homeworld
1 C-3PO    167    75 <NA>       gold       yellow           112 <NA>   Tatooine
2 R2-D2     96    32 <NA>       white, bl~ red               33 <NA>   Naboo   
3 R5-D4     97    32 <NA>       white, red red               NA <NA>   Tatooine
4 IG-88    200   140 none       metal      red               15 none   <NA>    
5 Bossk    190   113 none       green      red               53 male   Trandosha
6 Nute~    191    90 none       mottled g~ red               NA male   Cato Nei~
7 BB8       NA    NA none       none       black             NA none   <NA>    

Seven rows in the dataset meet at least one of these conditions.
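When "or" and "and" appear in the same condition, note that R evaluates & before |, so parentheses may be needed to get the intended grouping. The sketch below shows the difference on a small made-up data frame (toy and its height threshold of 160 are hypothetical).

```r
library(dplyr)

# Hypothetical toy data for illustration
toy <- data.frame(
  species   = c("Droid", "Droid", "Human", "Human"),
  eye_color = c("yellow", "red", "red", "red"),
  height    = c(167, 96, 150, 190),
  stringsAsFactors = FALSE
)

# Parsed as: species == "Droid" | (eye_color == "red" & height > 160)
no_parens <- toy %>%
  filter(species == "Droid" | eye_color == "red" & height > 160)

# Parentheses force the "or" to be evaluated first
with_parens <- toy %>%
  filter((species == "Droid" | eye_color == "red") & height > 160)

print(no_parens)    # rows 1, 2 and 4 of toy
print(with_parens)  # rows 1 and 4 of toy
```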

Example 4: Filter Rows with Values in a List

We can also look for rows where the eye color matches any value in a list of colors.

starwars %>% filter(eye_color %in% c('blue', 'yellow', 'red'))
# A tibble: 35 x 13
   name  height  mass hair_color skin_color eye_color birth_year gender
 1 Luke~    172    77 blond      fair       blue            19   male 
 2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>
 3 R2-D2     96    32 <NA>       white, bl~ red             33   <NA> 
 4 Dart~    202   136 none       white      yellow          41.9 male 
 5 Owen~    178   120 brown, gr~ light      blue            52   male 
 6 Beru~    165    75 brown      light      blue            47   female
 7 R5-D4     97    32 <NA>       white, red red             NA   <NA>
 8 Anak~    188    84 blond      fair       blue            41.9 male 
 9 Wilh~    180    NA auburn, g~ fair       blue            64   male 
10 Chew~    228   112 brown      unknown    blue           200   male 

We can see that 35 rows in the dataset have blue, yellow, or red eyes; only the first ten are printed.
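To select the rows whose value is not in the list, the %in% test can be negated with !. Unlike ==, %in% returns FALSE rather than NA for a missing value, so rows with a missing eye color end up in the negated result. The sketch below uses a small made-up data frame (toy is hypothetical).

```r
library(dplyr)

# Hypothetical toy data for illustration
toy <- data.frame(
  name      = c("a", "b", "c", "d"),
  eye_color = c("blue", "brown", NA, "red"),
  stringsAsFactors = FALSE
)

colors <- c("blue", "yellow", "red")

# Rows whose eye color is one of the listed values
in_list <- toy %>% filter(eye_color %in% colors)

# Rows whose eye color is not in the list; NA %in% colors is FALSE,
# so the row with a missing eye color is kept here
not_in_list <- toy %>% filter(!eye_color %in% colors)

print(in_list)      # names a and d
print(not_in_list)  # names b and c
```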

Example 5: Filter Rows Using Less Than or Greater Than

We can also use less than and greater than operations on numeric variables to filter rows.

Find rows with a height greater than 250:

starwars %>% filter(height > 250)
# A tibble: 1 x 13
  name  height  mass hair_color skin_color eye_color birth_year gender homeworld
1 Yara~    264    NA none       white      yellow            NA male   Quermia 

Find rows with a height strictly between 200 and 230 (both endpoints are excluded):

starwars %>% filter(height > 200 & height < 230)
# A tibble: 5 x 13
  name  height  mass hair_color skin_color eye_color birth_year gender homeworld
1 Dart~    202   136 none       white      yellow          41.9 male   Tatooine
2 Rugo~    206    NA none       green      orange          NA   male   Naboo   
3 Taun~    213    NA none       grey       black           NA   female Kamino  
4 Grie~    216   159 none       brown, wh~ green, y~       NA   male   Kalee   
5 Tion~    206    80 none       grey       black           NA   male   Utapau  
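The range filter above uses strict inequalities, so heights of exactly 200 or 230 would not match. If the endpoints should be included, dplyr's between() helper is one option, since it tests x >= left & x <= right. The sketch below contrasts the two on a small made-up data frame (toy is hypothetical).

```r
library(dplyr)

# Hypothetical toy data for illustration
toy <- data.frame(
  name   = c("a", "b", "c", "d"),
  height = c(200, 202, 230, 231),
  stringsAsFactors = FALSE
)

# Strict inequalities: 200 and 230 themselves are excluded
strict <- toy %>% filter(height > 200 & height < 230)

# between() includes both endpoints
inclusive <- toy %>% filter(between(height, 200, 230))

print(strict)     # name b only
print(inclusive)  # names a, b and c
```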

Find rows with a height greater than the average height:

starwars %>% filter(height > mean(height, na.rm = TRUE))
   name     height  mass hair_color  skin_color eye_color birth_year sex    gender
   <chr>     <int> <dbl> <chr>       <chr>      <chr>          <dbl> <chr>  <chr>
 1 Darth V~    202   136 none        white      yellow          41.9 male   mascu~
 2 Owen La~    178   120 brown, grey light      blue            52   male   mascu~
 3 Biggs D~    183    84 black       light      brown           24   male   mascu~
 4 Obi-Wan~    182    77 auburn, wh~ fair       blue-gray       57   male   mascu~
 5 Anakin ~    188    84 blond       fair       blue            41.9 male   mascu~
 6 Wilhuff~    180    NA auburn, gr~ fair       blue            64   male   mascu~
 7 Chewbac~    228   112 brown       unknown    blue           200   male   mascu~
 8 Han Solo    180    80 brown       fair       brown           29   male   mascu~
 9 Jabba D~    175  1358 NA          green-tan~ orange         600   herma~ mascu~
10 Jek Ton~    180   110 brown       fair       blue            NA   male   mascu~
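The na.rm = TRUE argument matters here because the height column contains missing values: without it, mean() returns NA, every comparison evaluates to NA, and filter() returns no rows. The sketch below demonstrates this on a small made-up data frame (toy is hypothetical).

```r
library(dplyr)

# Hypothetical toy data for illustration; one height is missing
toy <- data.frame(
  name   = c("a", "b", "c", "d"),
  height = c(150, 180, NA, 210),
  stringsAsFactors = FALSE
)

# mean(height) is NA, so height > NA is NA for every row and nothing is kept
without_na_rm <- toy %>% filter(height > mean(height))

# With na.rm = TRUE the mean is 180, so only the row with height 210 is kept
with_na_rm <- toy %>% filter(height > mean(height, na.rm = TRUE))

print(nrow(without_na_rm))  # 0
print(with_na_rm)           # name d
```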
