Sort data frames by columns

[This article was first published on Quantargo Blog, and kindly contributed to R-bloggers].

To find areas of interest in a data frame, you often need to order its rows by specific columns. The dplyr arrange() function orders data frames by one or more columns, in ascending or descending order.

arrange(<data>, <column>)
arrange(<data>, <column_1>, <column_2>, ...)

The arrange() function with a single column

The arrange() function orders the rows of a data frame. It takes a data frame or a tibble as its first parameter, followed by the names of the columns by which the rows should be ordered. Suppose we want to answer the question: Which states had the highest share of Republican votes in the 2016 US presidential election? In the following example we use the pres_results_2016 data frame, which contains the results of the 2016 US presidential election only. We arrange() the data frame based on the rep column (the Republican share of the vote, stored as a proportion between 0 and 1):

arrange(pres_results_2016, rep)
# A tibble: 51 x 6
   year state total_votes   dem    rep  other
  <dbl> <chr>       <dbl> <dbl>  <dbl>  <dbl>
1  2016 DC         312575 0.905 0.0407 0.0335
2  2016 HI         437664 0.610 0.294  0.0958
3  2016 VT         320467 0.557 0.298  0.0737
# … with 48 more rows

As you can see in the output, the data frame is sorted in ascending order based on the rep column. However, we would prefer the results in descending order, so that we can instantly see the state with the highest rep share. To sort by a column in descending order, all we need to do is apply the desc() function to that column inside the arrange() function:

arrange(pres_results_2016, desc(rep))
# A tibble: 51 x 6
   year state total_votes   dem   rep  other
  <dbl> <chr>       <dbl> <dbl> <dbl>  <dbl>
1  2016 WV         713051 0.265 0.686 0.0489
2  2016 WY         258788 0.216 0.674 0.0830
3  2016 OK        1452992 0.289 0.653 0.0575
# … with 48 more rows
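
The two sort directions can be compared side by side on a small excerpt. The sketch below is only an illustration: pres_excerpt is a hypothetical name, and it holds just the six rows printed in the outputs above rather than the full 51-row data set.

```r
library(dplyr)

# Hypothetical excerpt: the six rows printed in the outputs above
pres_excerpt <- data.frame(
  state = c("DC", "HI", "VT", "WV", "WY", "OK"),
  rep   = c(0.0407, 0.294, 0.298, 0.686, 0.674, 0.653),
  stringsAsFactors = FALSE
)

# Default behaviour: ascending, smallest rep value first
lowest_first <- arrange(pres_excerpt, rep)

# Wrapping the column in desc() reverses the order
highest_first <- arrange(pres_excerpt, desc(rep))

lowest_first$state   # DC, HI, VT, OK, WY, WV
highest_first$state  # WV, WY, OK, VT, HI, DC
```

Note that desc() is applied per column, so the rest of the arrange() call stays the same.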

Arranging is possible not only on numeric values but on character values as well. In that case, dplyr sorts the rows in alphabetical order. We can arrange character columns just like numeric ones:

arrange(pres_results_2016, state)
# A tibble: 51 x 6
   year state total_votes   dem   rep  other
  <dbl> <chr>       <dbl> <dbl> <dbl>  <dbl>
1  2016 AK         318608 0.366 0.513 0.0928
2  2016 AL        2123372 0.344 0.621 0.0254
3  2016 AR        1130635 0.337 0.606 0.0577
# … with 48 more rows
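
As a minimal sketch, desc() can also be applied to a character column to get reverse alphabetical order. The states data frame below is a hypothetical four-row example built from state codes that appear in this article.

```r
library(dplyr)

# Hypothetical example data: four state codes in arbitrary order
states <- data.frame(
  state = c("AR", "WY", "AK", "AL"),
  stringsAsFactors = FALSE
)

# Alphabetical order (the default for character columns)
a_to_z <- arrange(states, state)

# Reverse alphabetical order with desc()
z_to_a <- arrange(states, desc(state))

a_to_z$state  # AK, AL, AR, WY
z_to_a$state  # WY, AR, AL, AK
```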

Exercise: Use arrange() based on a single column

The gapminder_2007 dataset contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect which country had the lowest life expectancy (lifeExp) in 2007! The dplyr package is already loaded.

  1. Apply the arrange() function on the gapminder_2007 tibble.
  2. Order the tibble based on the lifeExp column.

Exercise: Use arrange() in combination with desc()

The gapminder_2007 dataset contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect which countries had the largest population in 2007! The dplyr package is already loaded.

  1. Apply the arrange() function on the gapminder_2007 tibble.
  2. Sort the tibble in a descending order based on the pop column.

The arrange() function with multiple columns

We can use the arrange() function on multiple columns as well. In this case, the order of the columns in the function parameters sets a hierarchy of ordering. The function starts by ordering the rows based on the first column. Where several rows share the same value, it decides their order based on the second column. If rows are still tied, it uses the third column (if given), and so on.
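
The tie-breaking hierarchy can be sketched with a small toy example. The scores data frame and its columns group, score and id are hypothetical and are not part of the election data.

```r
library(dplyr)

# Hypothetical toy data, chosen so that ties occur at each level
scores <- data.frame(
  group = c("b", "a", "b", "a"),
  score = c(2, 9, 1, 9),
  id    = c(4, 3, 2, 1),
  stringsAsFactors = FALSE
)

# 1st level: group; 2nd level: score within each group;
# 3rd level: id, which separates the two rows with group "a" and score 9
sorted <- arrange(scores, group, score, id)

sorted$id  # 1, 3, 2, 4
```

The third column only matters for the two rows that are still tied after group and score have been compared.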

In the following example we use the pres_results_subset data frame, which contains election results only for the states "TX" (Texas), "UT" (Utah) and "FL" (Florida). First we sort the data frame in ascending order based on the year column. Then we add a second level and order the data frame based on the dem column:

arrange(pres_results_subset, year, dem)
# A tibble: 33 x 6
   year state total_votes   dem   rep   other
  <dbl> <chr>       <dbl> <dbl> <dbl>   <dbl>
1  1976 UT         541218 0.336 0.624 0.0392 
2  1976 TX        4071884 0.511 0.480 0.00817
3  1976 FL        3150631 0.519 0.466 0.0143 
# … with 30 more rows

As you can see in the output, the data frame is overall ordered based on the year column. However, when the value of year is the same, the order of the rows is decided by the dem column.
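
Sort directions can also be mixed across columns by wrapping only some of them in desc(). The following sketch uses just the three 1976 rows printed above; subset_1976 is a hypothetical name for this excerpt.

```r
library(dplyr)

# Hypothetical excerpt: the three 1976 rows from the output above
subset_1976 <- data.frame(
  year  = c(1976, 1976, 1976),
  state = c("UT", "TX", "FL"),
  dem   = c(0.336, 0.511, 0.519),
  stringsAsFactors = FALSE
)

# year ascending, then dem descending within each year
mixed <- arrange(subset_1976, year, desc(dem))

mixed$state  # FL, TX, UT
```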

Exercise: Use arrange() based on multiple columns

The gapminder_2007 tibble contains economic and demographic data about various countries for the year 2007. Arrange the tibble and inspect, for each continent, which countries had the highest life expectancy in 2007! The dplyr package is already loaded.

  1. Apply the arrange() function on the gapminder_2007 tibble.
  2. Order the tibble based on the continent column.
  3. In case there are rows with the same continent, sort the tibble in descending order based on the lifeExp column.

Sort data frames by columns is an excerpt from the course Introduction to R, which is available for free at quantargo.com
