Select first or last rows of a data frame

[This article was first published on Quantargo Blog, and kindly contributed to R-bloggers.]

We often do not need to see the entire contents of a data frame in the console. A part of it is usually enough, such as the top or bottom rows, which the head() and tail() functions retrieve.

head(___, n = ___)
tail(___, n = ___)

Selecting the top of a data frame

Data frames can span a large number of rows and columns, so the full printed output in the console makes it hard to get an initial impression of the data. This is less of a problem for tibbles, which print more compactly. Even so, it is helpful to retrieve the first rows with a single command, without any indexing or additional packages.

The TitanicSurvival dataset contains data on 1309 passengers, each represented as a row. A simple print of the dataset would print all passengers, filling up the entire console. Instead, the head() function by default shows only the first six rows of a data frame, including its column names:

head(TitanicSurvival)
                                survived    sex     age
Allen, Miss. Elisabeth Walton        yes female 29.0000
Allison, Master. Hudson Trevor       yes   male  0.9167
Allison, Miss. Helen Loraine          no female  2.0000
Allison, Mr. Hudson Joshua Crei       no   male 30.0000
Allison, Mrs. Hudson J C (Bessi       no female 25.0000
Anderson, Mr. Harry                  yes   male 48.0000
                                passengerClass
Allen, Miss. Elisabeth Walton              1st
Allison, Master. Hudson Trevor             1st
Allison, Miss. Helen Loraine               1st
Allison, Mr. Hudson Joshua Crei            1st
Allison, Mrs. Hudson J C (Bessi            1st
Anderson, Mr. Harry                        1st
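
The default of six rows can be checked on any data frame. The following is a minimal sketch that uses R's built-in mtcars dataset as a stand-in, on the assumption that the package providing TitanicSurvival may not be installed:

```r
# mtcars ships with base R (datasets package); it is only a stand-in here
nrow(mtcars)                 # total number of rows in the data frame

first_rows <- head(mtcars)   # no n given, so the default n = 6 applies
nrow(first_rows)             # number of rows returned by head()
colnames(first_rows)         # column names are carried over unchanged
```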

The number of rows can be tuned using the parameter n. To extract only the first three rows from the data set you can write:

head(TitanicSurvival, n = 3)
                               survived    sex     age
Allen, Miss. Elisabeth Walton       yes female 29.0000
Allison, Master. Hudson Trevor      yes   male  0.9167
Allison, Miss. Helen Loraine         no female  2.0000
                               passengerClass
Allen, Miss. Elisabeth Walton             1st
Allison, Master. Hudson Trevor            1st
Allison, Miss. Helen Loraine              1st
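
The parameter n also accepts values beyond a small positive count. The sketch below uses a hypothetical toy data frame named df (not part of the original article) to show how head() behaves for a negative n and for an n larger than the number of rows:

```r
# Hypothetical toy data frame used only for illustration
df <- data.frame(id = 1:10, value = letters[1:10])

top3      <- head(df, n = 3)    # rows 1 to 3
all_but_3 <- head(df, n = -3)   # negative n: everything except the last 3 rows
capped    <- head(df, n = 50)   # n larger than nrow(df) returns all rows

nrow(top3)
nrow(all_but_3)
nrow(capped)
```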

Exercise: Select the top of a data frame

The salaries_sort dataset contains the 2008-09 nine-month academic salary for professors from a college in the US. The dataset is sorted by salary in ascending order.

Inspect the 10 lowest paid professors by selecting the first 10 rows using the head() function.

Selecting the bottom of a data frame

The tail() function can be used to select the bottom rows of a data frame. Like the head() function, it accepts a parameter n that specifies the number of rows to be returned.

For example, to select the last five rows from the TitanicSurvival dataset you can write:

tail(TitanicSurvival, n = 5)
                          survived    sex  age passengerClass
Zabour, Miss. Hileni            no female 14.5            3rd
Zabour, Miss. Thamine           no female   NA            3rd
Zakarian, Mr. Mapriededer       no   male 26.5            3rd
Zakarian, Mr. Ortin             no   male 27.0            3rd
Zimmerman, Mr. Leo              no   male 29.0            3rd
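
One detail worth knowing is that tail() keeps the original row names, so the result still shows where the rows sat in the full data frame. A small sketch, again with a hypothetical toy data frame df:

```r
# Hypothetical toy data frame used only for illustration
df <- data.frame(id = 1:10, value = letters[1:10])

last_rows <- tail(df, n = 4)    # rows 7 to 10
last_rows$id
rownames(last_rows)             # original row numbers are kept as row names

all_but_8 <- tail(df, n = -8)   # negative n: everything except the first 8 rows
all_but_8$id
```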

The head and tail functions can also be combined to select a fragment of the data set from the middle. To select the first five rows from the bottom 500 rows you can write:

head(tail(TitanicSurvival, n = 500), n = 5)
                                survived    sex age passengerClass
Ford, Mr. Edward Watson               no   male  18            3rd
Ford, Mr. William Neal                no   male  16            3rd
Ford, Mrs. Edward (Margaret Ann       no female  48            3rd
Fox, Mr. Patrick                      no   male  NA            3rd
Franklin, Mr. Charles (Charles        no   male  NA            3rd
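
The combined call is equivalent to selecting a contiguous range of rows by index. The sketch below assumes a hypothetical data frame df with 1309 rows (the same row count as TitanicSurvival) and compares both approaches:

```r
# Hypothetical data frame with as many rows as TitanicSurvival
df <- data.frame(id = 1:1309)

# First five rows of the bottom 500 rows
middle <- head(tail(df, n = 500), n = 5)

# Same rows via explicit indexing: the bottom 500 start at row 1309 - 500 + 1
start <- nrow(df) - 500 + 1
by_index <- df[start:(start + 4), , drop = FALSE]

middle$id
identical(middle$id, by_index$id)
```

The nested head(tail()) form avoids computing the start index by hand, which is why it is convenient for quick inspection.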

Exercise: Select the bottom of a data frame

The salaries_sort dataset contains the 2008-09 nine-month academic salary for professors from a college in the US. The dataset is sorted by salary in ascending order.

Inspect the 20 highest paid professors by selecting the last 20 rows using the tail() function.

Exercise: Select the top from the bottom of a data frame

The salaries_sort dataset contains the 2008-09 nine-month academic salary for 397 professors from a college in the US. The dataset is sorted by salary in ascending order.

Inspect the 10 professors around the median salary by

  1. Selecting the bottom 200 professors using the tail() function
  2. Selecting the top 10 professors out of the bottom 200

Select first or last rows of a data frame is an excerpt from the course Introduction to R, which is available for free at quantargo.com
