Select first or last rows of a data frame
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
We often do not need to look at all the contents of a data frame in the console. Instead, a part of it is usually sufficient, such as the top or bottom rows retrieved through the head() and tail() functions.
- Select the top of a data frame
- Select the bottom of a data frame
- Specify the number of rows to select through the parameter n
head(___, n = ___)
tail(___, n = ___)
Selecting the top of a data frame
Data frames can span a large number of rows and columns. Based on the printed output in the console it can be hard to get an initial impression of the data inside the data frame. This is less of a problem for tibbles, which by default print only the first ten rows and the columns that fit on screen. Additionally, it can be helpful to easily retrieve the first rows in one command without any indexing or additional packages.
The TitanicSurvival dataset contains data of 1309 passengers represented as rows. A simple print of the dataset would print all passengers, filling up the entire console. Instead, the head() function shows only the first six rows of a data frame by default, including its column names:
head(TitanicSurvival)
                                survived    sex     age
Allen, Miss. Elisabeth Walton        yes female 29.0000
Allison, Master. Hudson Trevor       yes   male  0.9167
Allison, Miss. Helen Loraine          no female  2.0000
Allison, Mr. Hudson Joshua Crei       no   male 30.0000
Allison, Mrs. Hudson J C (Bessi       no female 25.0000
Anderson, Mr. Harry                  yes   male 48.0000
                                passengerClass
Allen, Miss. Elisabeth Walton              1st
Allison, Master. Hudson Trevor             1st
Allison, Miss. Helen Loraine               1st
Allison, Mr. Hudson Joshua Crei            1st
Allison, Mrs. Hudson J C (Bessi            1st
Anderson, Mr. Harry                        1st
The number of rows can be tuned using the parameter n. To extract only the first three rows from the data set you can write:
head(TitanicSurvival, n = 3)
                               survived    sex     age
Allen, Miss. Elisabeth Walton       yes female 29.0000
Allison, Master. Hudson Trevor      yes   male  0.9167
Allison, Miss. Helen Loraine         no female  2.0000
                               passengerClass
Allen, Miss. Elisabeth Walton             1st
Allison, Master. Hudson Trevor            1st
Allison, Miss. Helen Loraine              1st
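As a minimal sketch that runs without any additional packages, the same two calls can be tried on mtcars, a data set that ships with base R. It is used here only as a stand-in and is not part of the original example; the variable names top_default and top_three are chosen for illustration.

```r
# mtcars ships with base R (datasets package); it stands in for
# TitanicSurvival here so the example runs without extra packages.
top_default <- head(mtcars)         # n is left at its default of 6
top_three   <- head(mtcars, n = 3)  # only the first three rows

nrow(top_default)    # number of rows returned by default
nrow(top_three)      # number of rows returned with n = 3
rownames(top_three)  # row names of the selected rows
ncol(top_three)      # all columns are kept; n only limits the rows
```

Note that n only limits the rows. The result always keeps every column of the original data frame.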
Exercise: Select the top of a data frame
The salaries_sort dataset contains the 2008-09 nine-month academic salary for professors from a college in the US. The dataset is sorted by salary in ascending order.
Inspect the 10 lowest paid professors by selecting the first 10 rows using the head() function.
Selecting the bottom of a data frame
The tail() function can be used to select the bottom rows of a data frame. Similar to the head() function, it also accepts a parameter n to specify the number of rows to be returned.
For example, to select the last five rows from the TitanicSurvival dataset you can write:
tail(TitanicSurvival, n = 5)
                          survived    sex  age passengerClass
Zabour, Miss. Hileni            no female 14.5            3rd
Zabour, Miss. Thamine           no female   NA            3rd
Zakarian, Mr. Mapriededer       no   male 26.5            3rd
Zakarian, Mr. Ortin             no   male 27.0            3rd
Zimmerman, Mr. Leo              no   male 29.0            3rd
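The following sketch repeats the idea on mtcars, a data set that ships with base R and serves here only as a stand-in for TitanicSurvival. The variable names are illustrative. It shows that tail() returns the last rows in their original order and keeps their row names.

```r
# mtcars (32 rows) ships with base R; it is a stand-in for TitanicSurvival.
bottom_default <- tail(mtcars)         # n is left at its default of 6
bottom_five    <- tail(mtcars, n = 5)  # only the last five rows

nrow(bottom_default)
nrow(bottom_five)

# The rows come back in their original order, not reversed:
# they match rows 28 to 32 of the full data frame.
identical(rownames(bottom_five), rownames(mtcars)[28:32])
```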
The head() and tail() functions can also be combined to select a fragment from the middle of the data set. To select the first five rows from the bottom 500 rows (rows 810 to 814 of the 1309) you can write:
head(tail(TitanicSurvival, n = 500), n = 5)
                                survived    sex age passengerClass
Ford, Mr. Edward Watson               no   male  18            3rd
Ford, Mr. William Neal                no   male  16            3rd
Ford, Mrs. Edward (Margaret Ann       no female  48            3rd
Fox, Mr. Patrick                      no   male  NA            3rd
Franklin, Mr. Charles (Charles        no   male  NA            3rd
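To make the row arithmetic behind this combination explicit, here is a small sketch on mtcars, a base R data set used only as a stand-in. The variable names are illustrative. It compares the nested head(tail()) call with plain row indexing, which selects the same rows.

```r
# mtcars (32 rows) is a stand-in data set that ships with base R.
n_total <- nrow(mtcars)

# First 3 rows of the bottom 10 rows.
middle <- head(tail(mtcars, n = 10), n = 3)

# The bottom 10 rows start at row n_total - 10 + 1,
# so the same fragment can be reached by indexing.
start    <- n_total - 10 + 1
by_index <- mtcars[start:(start + 3 - 1), ]

identical(rownames(middle), rownames(by_index))
```

The nested call avoids the off-by-one bookkeeping that the indexing version needs.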
Exercise: Select the bottom of a data frame
The salaries_sort dataset contains the 2008-09 nine-month academic salary for professors from a college in the US. The dataset is sorted by salary in ascending order.
Inspect the 20 highest paid professors by selecting the last 20 rows using the tail() function.
Exercise: Select the top from the bottom of a data frame
The salaries_sort dataset contains the 2008-09 nine-month academic salary for 397 professors from a college in the US. The dataset is sorted by salary in ascending order.
Inspect the 10 professors around the median salary by
- Selecting the bottom 200 professors using the tail() function
- Selecting the top 10 professors out of the bottom 200
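To see why these two steps land near the median, here is a minimal sketch on made-up data. The data frame demo and its salary values are hypothetical: they only mimic the shape of salaries_sort (397 rows sorted in ascending order) and are not the course data.

```r
# Hypothetical stand-in for salaries_sort: 397 rows sorted by salary.
# The salary values are invented for illustration only.
demo <- data.frame(salary = seq(50000, by = 250, length.out = 397))

# With 397 sorted rows, the median sits in row (397 + 1) / 2 = 199.
median_row <- (nrow(demo) + 1) / 2

# The bottom 200 rows are rows 198 to 397;
# the first 10 of those are rows 198 to 207.
around_median <- head(tail(demo, n = 200), n = 10)

rownames(around_median)
median(demo$salary) %in% around_median$salary
```

In this sketch the selection covers one row below the median and eight rows above it, so it sits around the median without being centred on it.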
Select first or last rows of a data frame is an excerpt from the course Introduction to R, which is available for free at quantargo.com