
Filtering Data in R 10 Tips -tidyverse package

[This article was first published on Methods – finnstats, and kindly contributed to R-bloggers.]

This tutorial describes how to filter, or extract, data frame rows that meet certain criteria.

You will learn how to use filter(), which comes from dplyr, one of the core tidyverse packages.

The main idea is to showcase different ways of filtering a data set.

Filtering is one of the most common tasks in data analysis. When you want to keep or drop part of the data, use the filter() function: it keeps the rows for which the condition is TRUE.

Load Library

library(tidyverse)
head(msleep)
   name                       genus      vore  order        conservation sleep_total sleep_rem sleep_cycle awake  brainwt  bodywt
 1 Cheetah                    Acinonyx   carni Carnivora    lc                  12.1      NA        NA      11.9 NA        50    
 2 Owl monkey                 Aotus      omni  Primates     NA                  17         1.8      NA       7    0.0155    0.48 
 3 Mountain beaver            Aplodontia herbi Rodentia     nt                  14.4       2.4      NA       9.6 NA         1.35 
 4 Greater short-tailed shrew Blarina    omni  Soricomorpha lc                  14.9       2.3       0.133   9.1  0.00029   0.019
 5 Cow                        Bos        herbi Artiodactyla domesticated         4         0.7       0.667  20    0.423   600    
 6 Three-toed sloth           Bradypus   herbi Pilosa       NA                  14.4       2.2       0.767   9.6 NA         3.85 

We’ll use the msleep data set for the different types of filtering below. It is not part of base R: it ships with ggplot2, which library(tidyverse) loads.
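
Before filtering, it can help to check how large the data set is and which columns contain missing values, since several of the examples depend on them. A minimal sketch, assuming the dplyr and ggplot2 packages are installed (the variable name n_missing is only illustrative):

```r
library(dplyr)
library(ggplot2)  # msleep ships with ggplot2

# Number of rows and columns
dim(msleep)

# Count of missing values in each column
n_missing <- colSums(is.na(msleep))
n_missing
```

Columns with many missing values, such as conservation, are the ones where is.na() filters matter most.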

Example 1

# Keep animals that sleep more than 15 hours a day
data1 <- msleep %>%
  select(name, sleep_total) %>%
  filter(sleep_total > 15)
data1

Output:-

    name                           sleep_total
  1 Owl monkey                            17  
  2 Long-nosed armadillo                  17.4
  3 North American Opossum                18  
  4 Big brown bat                         19.7
  5 Thick-tailed opposum                  19.4
  6 Little brown bat                      19.9
  7 Tiger                                 15.8
  8 Giant armadillo                       18.1
  9 Arctic ground squirrel                16.6
 10 Golden-mantled ground squirrel        15.9
 11 Eastern american chipmunk             15.8
 12 Tenrec                                15.6

Example 2

# Negate the condition: keep animals that sleep 15 hours or less
data2 <- msleep %>%
  select(name, sleep_total) %>%
  filter(!sleep_total > 15)
data2

Output:-

    name                       sleep_total
  1 Cheetah                           12.1
  2 Mountain beaver                   14.4
  3 Greater short-tailed shrew        14.9
  4 Cow                                4  
  5 Three-toed sloth                  14.4
  6 Northern fur seal                  8.7
  7 Vesper mouse                       7  
  8 Dog                               10.1
  9 Roe deer                           3  
 10 Goat                               5.3
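
In R, the comparison is evaluated before the leading `!`, so `!sleep_total > 15` means "not (sleep_total > 15)". The sketch below checks that this matches `sleep_total <= 15` on a small made-up tibble (the data and the names sleep, negated and direct are hypothetical, not taken from msleep). It also shows that filter() drops rows where the condition is NA.

```r
library(dplyr)

# Hypothetical data, including one missing value
sleep <- tibble(
  name = c("a", "b", "c", "d"),
  sleep_total = c(17, 12.1, 15, NA)
)

# Negated form, as in Example 2
negated <- sleep %>% filter(!sleep_total > 15)

# Direct form of the same condition
direct <- sleep %>% filter(sleep_total <= 15)

identical(negated, direct)  # both forms keep the same rows
negated$name                # rows "b" and "c"; the NA row "d" is dropped
```

The direct form is usually easier to read; the negated form is handy when the condition being reversed is more complex.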

Example 3

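# Conditions separated by a comma are combined with AND:
# primates that weigh more than 15 kg
data3 <- msleep %>%
  select(name, order, bodywt, sleep_total) %>%
  filter(order == "Primates", bodywt > 15)
data3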

Output:-

   name       order    bodywt sleep_total
 1 Human      Primates   62           8  
 2 Chimpanzee Primates   52.2         9.7
 3 Baboon     Primates   25.2         9.4
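
Separating conditions with a comma inside filter() has the same effect as joining them with `&`. A minimal sketch on a three-row tibble whose values are copied from the outputs in this post (the names animals, with_comma and with_and are illustrative):

```r
library(dplyr)

animals <- tibble(
  name   = c("Human", "Owl monkey", "Cow"),
  order  = c("Primates", "Primates", "Artiodactyla"),
  bodywt = c(62, 0.48, 600)
)

# Comma-separated conditions: every condition must hold
with_comma <- animals %>% filter(order == "Primates", bodywt > 15)

# The same filter written with an explicit AND
with_and <- animals %>% filter(order == "Primates" & bodywt > 15)

identical(with_comma, with_and)
with_comma$name  # only "Human" satisfies both conditions
```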

Example 4

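# Conditions joined with | are combined with OR:
# primates, or any animal that weighs more than 15 kg
data4 <- msleep %>%
  select(name, order, bodywt, sleep_total) %>%
  filter(order == "Primates" | bodywt > 15)
data4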

Output:-

    name              order           bodywt sleep_total                                    
  1 Cheetah           Carnivora        50           12.1
  2 Owl monkey        Primates          0.48        17  
  3 Cow               Artiodactyla    600            4  
  4 Northern fur seal Carnivora        20.5          8.7
  5 Goat              Artiodactyla     33.5          5.3
  6 Grivet            Primates          4.75        10  
  7 Asian elephant    Proboscidea    2547            3.9
  8 Horse             Perissodactyla  521            2.9
  9 Donkey            Perissodactyla  187            3.1
 10 Patas monkey      Primates         10           10.9

Example 5

# Match several names with chained OR conditions
data5 <- msleep %>%
  select(name, sleep_total) %>%
  filter(name == "Cow" |
           name == "Dog" |
           name == "Goat")
data5

Output:-

   name  sleep_total          
 1 Cow           4  
 2 Dog          10.1
 3 Goat          5.3

Example 6

# Same result as Example 5, written more compactly with %in%
data6 <- msleep %>%
  select(name, sleep_total) %>%
  filter(name %in% c("Cow", "Dog", "Goat"))
data6

Output:-

   name  sleep_total          
 1 Cow           4  
 2 Dog          10.1
 3 Goat          5.3
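
Examples 5 and 6 return the same rows, and `%in%` can also be negated to keep everything except the listed names. A short sketch on a small tibble with values copied from the outputs in this post (the names sleep, targets, with_or, with_in and others are illustrative):

```r
library(dplyr)

sleep <- tibble(
  name = c("Cow", "Dog", "Goat", "Human"),
  sleep_total = c(4, 10.1, 5.3, 8)
)
targets <- c("Cow", "Dog", "Goat")

# Chained OR, as in Example 5
with_or <- sleep %>%
  filter(name == "Cow" | name == "Dog" | name == "Goat")

# Set membership, as in Example 6
with_in <- sleep %>% filter(name %in% targets)

# Negated membership: every row whose name is not in targets
others <- sleep %>% filter(!name %in% targets)

identical(with_or, with_in)
others$name
```

With more than two or three values, `%in%` scales better than a chain of `|` because the list of values can be stored in a vector and reused.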

Example 7

# between() keeps values in a closed range: 16 <= sleep_total <= 18
data7 <- msleep %>%
  select(name, sleep_total) %>%
  filter(between(sleep_total, 16, 18))
data7

Output:-

   name                   sleep_total                           
 1 Owl monkey                    17  
 2 Long-nosed armadillo          17.4
 3 North American Opossum        18  
 4 Arctic ground squirrel        16.6
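
Note that between() includes both endpoints, which is why the North American Opossum (exactly 18) appears above. The sketch below compares between() with the explicit `>=` and `<=` form on four values copied from Example 1 (the names sleep, in_range and explicit are illustrative):

```r
library(dplyr)

sleep <- tibble(
  name = c("Owl monkey", "North American Opossum", "Giant armadillo", "Tenrec"),
  sleep_total = c(17, 18, 18.1, 15.6)
)

# Closed interval using between()
in_range <- sleep %>% filter(between(sleep_total, 16, 18))

# The same interval written out explicitly
explicit <- sleep %>% filter(sleep_total >= 16 & sleep_total <= 18)

identical(in_range, explicit)
in_range$name  # 18 is kept; 18.1 and 15.6 are not
```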

Example 8

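# near() keeps values within a tolerance of a target:
# here, less than 0.5 hours away from 17
data8 <- msleep %>%
  select(name, sleep_total) %>%
  filter(near(sleep_total, 17, tol = 0.5))
data8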

Output:-

   name                   sleep_total                           
 1 Owl monkey                    17  
 2 Long-nosed armadillo          17.4
 3 Arctic ground squirrel        16.6
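
near() is also the safer way to test floating-point numbers for equality, because `==` can fail on values that differ only by rounding error. A minimal sketch (the variable x is illustrative):

```r
library(dplyr)

x <- sqrt(2)^2

# Exact comparison fails because of floating-point rounding
x == 2

# near() compares within a small default tolerance
near(x, 2)

# With an explicit tolerance, the distance must be strictly less than tol,
# so a value exactly 0.5 away from the target does not match
near(16.5, 17, tol = 0.5)
near(16.6, 17, tol = 0.5)
```

The first comparison returns FALSE and the second TRUE. The last two calls show that the tolerance boundary itself is excluded.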

Example 9

# Keep rows where conservation is missing
data9 <- msleep %>%
  select(name, conservation, sleep_total) %>%
  filter(is.na(conservation))
data9

Output:-

    name                        conservation sleep_total
  1 "Owl monkey"                NA                  17  
  2 "Three-toed sloth"          NA                  14.4
  3 "Vesper mouse"              NA                   7  
  4 "African giant pouched rat" NA                   8.3
  5 "Western american chipmunk" NA                  14.9
  6 "Galago"                    NA                   9.8
  7 "Human"                     NA                   8  
  8 "Macaque"                   NA                  10.1
  9 "Vole "                     NA                  12.8
 10 "Little brown bat"          NA                  19.9

Example 10

# Keep rows where conservation is not missing
data10 <- msleep %>%
  select(name, conservation, sleep_total) %>%
  filter(!is.na(conservation))
data10

Output:-

    name                       conservation sleep_total                                        
  1 Cheetah                    lc                  12.1
  2 Mountain beaver            nt                  14.4
  3 Greater short-tailed shrew lc                  14.9
  4 Cow                        domesticated         4  
  5 Northern fur seal          vu                   8.7
  6 Dog                        domesticated        10.1
  7 Roe deer                   lc                   3  
  8 Goat                       lc                   5.3
  9 Guinea pig                 domesticated         9.4
 10 Grivet                     lc                  10 
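
Missing values also affect ordinary comparisons: filter() keeps only rows where the condition is TRUE, so a row whose comparison evaluates to NA is dropped silently. The sketch below shows this on three rows copied from the outputs above, and one way to keep the NA rows when that is what you want (the names status, not_lc and not_lc_or_na are illustrative):

```r
library(dplyr)

status <- tibble(
  name = c("Cheetah", "Cow", "Owl monkey"),
  conservation = c("lc", "domesticated", NA)
)

# NA != "lc" evaluates to NA, so the Owl monkey row is dropped
not_lc <- status %>% filter(conservation != "lc")
not_lc$name

# Add an explicit is.na() test to keep rows with a missing value
not_lc_or_na <- status %>%
  filter(conservation != "lc" | is.na(conservation))
not_lc_or_na$name
```

The first filter returns only Cow, while the second returns Cow and Owl monkey.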

The post Filtering Data in R 10 Tips -tidyverse package appeared first on finnstats.
