Changing individual column names
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Problem
How do I change the name of just one column in a data frame?
Context
This is a simple one that keeps coming up. Sometimes, whoever put together my data decided to capitalize the first letter of some column names and not others. Sometimes I’ve merged several data frames together and I need to distinguish the columns from each other.
Say my data frame is p8_0
and I’d like to change the column Area
to area
.
In the past, I’ve done this in one of two ways. Either I change all of the column names at once (if all of them need to be changed), or I use numerical column indexing. The latter makes a lot more sense if I have a lot of columns to deal with, but it means I have to know the number of the column whose name I have to change.
To find this out, I first have to look at all of the column names. Okay, no problem.
# See column names and numerical indices names(p8_0) [1] "FID" "Join_Count" "TARGET_FID" [4] "Field1" "barcode" "stratum" [7] "lcode" "sdate" "utm_e" [10] "utm_n" "snag" "OBJECTID" [13] "uniq_id" "aa_num" "AQUA_CODE" [16] "AQUA_DESC" "pool" "Area" [19] "Perimeter" "bath_pct" "max_depth" [22] "avg_depth" "sd_depth" "tot_vol" [25] "area_gt50" "area_gt100" "area_gt200" [28] "area_gt300" "avg_fetch" "shoreline_density_index" [31] "econ" "sill" "min_rm" [34] "max_rm" "len_met" "len_prm_lotic" [37] "pct_prm_lotic" "num_lotic_outl" "len_prm_lentic" [40] "pct_prm_lentic" "num_lentic_outl" "pct_aqveg" [43] "pct_opwat" "len_terr" "pct_terr" [46] "pct_aq" "len_wetf" "pct_prm_wetf" [49] "pct_terr_shore_wetf" "len_wd" "wdl_p_m2" [52] "num_wd" "scour_wd" "psco_wd" [55] "len_revln" "rev_p_m2" "num_rev" [58] "pct_terr_shore_rev" "pct_prm_rev" "area_tpi1" [61] "pct_tpi1" "area_tpi2" "pct_tpi2" [64] "area_tpi3" "pct_tpi3" "area_tpi4" [67] "pct_tpi4" "sinuosity" "year_phot" [70] "NEAR_TERR_FID" "NEAR_TERR_DIST" "NEAR_TERR_CLASS_31" [73] "NEAR_TERR_CLASS_15" "NEAR_TERR_CLASS_7" "NEAR_TERR_CLASS_31_N" [76] "NEAR_TERR_CLASS_15_N" "NEAR_TERR_CLASS_7_N" "NEAR_TERR_HEIGHT_N" [79] "NEAR_FOREST_FID" "NEAR_FOREST_DIST" "NEAR_FOREST_CLASS_31" [82] "NEAR_FOREST_CLASS_15" "NEAR_FOREST_CLASS_7" "NEAR_FOREST_CLASS_31_N" [85] "NEAR_FOREST_CLASS_15_N" "NEAR_FOREST_CLASS_7_N" "NEAR_FOREST_HEIGHT_N" [88] "year.p" "depth.p" "current.p" [91] "gear.p" "stageht.p" "substrt.p" [94] "wingdike.p" "riprap.p" "trib.p" [97] "snagyn" "area_le50" "area_le100" [100] "area_le200" "area_le300" "pct_area_le100" [103] "pct_area_le50" "pct_area_le200" "pct_area_le300" [106] "stratum_name"
Okay, yes problem.
It’s not that hard to see that Area is the 18th column. But there are a bunch of columns that start with NEAR_TERR_
and NEAR_FOREST_
that would be easy to confuse. And what if I later modify my data cleaning script, insert new columns, and mess up the numerical indexing?
Solution
The first solution I came up with is simple but pretty clunky. At least it solves the problem of numerical indices getting misaligned. And if you mistype the column name or try to change the name of a column that doesn’t exist, it doesn’t throw an error.
# Change "Area" column name to "area" names(p8_0)[names(p8_0) == "Area"] <- "area"
This works well, but it gets annoying if you have more than one column name to change. Every column requires typing names(p8_0)
twice, and that adds up to a lot of lines of code.
To no one’s surprise, dplyr
has a more elegant solution, using the rename
function.
# Load dplyr library(dplyr) # Rename variable (new name first) p8_0 %>% rename(area = Area)
A quick note on rename
: somewhat counterintuitively, the new name comes before the old name. General example:
# General syntax for rename df %>% rename(newname = oldname)
rename
saves a whole bunch of keystrokes and also scales very well to multiple columns.
Let’s say I wanted to change Area
and Perimeter
to area
and perimeter
, respectively, and I also wanted to change the rather clunky shoreline_density_index
to sdi
. And while we’re at it, snagyn
, a factor variable that indicates whether a large piece of wood was present at the site (“yes” or “no”), might be clearer as snag_yn
and sinuosity
could be shortened to sinu
Without dplyr
:
# Change each column name individually names(p8_0)[names(p8_0) == "Area"] <- "area" names(p8_0)[names(p8_0) == "Perimeter"] <- "perimeter" names(p8_0)[names(p8_0) == "shoreline_density_index"] <- "sdi" names(p8_0)[names(p8_0) == "snagyn"] <- "snag_yn" names(p8_0)[names(p8_0) == "sinuosity"] <- "sinu"
With dplyr
:
# Change any column names you want to, all at once p8_0 %>% rename(area = Area, perimeter = Perimeter, sdi = shoreline_density_index, snag_yn = snagyn, sinu = sinuosity)
So pretty. As an added bonus, you’re saved from both quotation marks and the dreaded double equals sign (!!!).
In case anyone was counting, that’s 102 characters vs. 238 (spaces not included). 116 if you include loading dplyr
, but you already had it loaded because you’re using it throughout your code, of course.
Outcome
Now I can rename only the columns I want, by name instead of numerical index, without fear of having to change everything if I insert or delete some columns later on.
I’m sure I’ll get used to it eventually, but putting the new column name before the old one in the rename
function really throws me. I’d prefer a “from –> to” syntax. If you’re like me and you’re willing to make your code slightly longer, there’s also dplyr::recode
# Change column names with recode library(dplyr) names(p8_0) % recode(Area = "area", Perimeter = "perimeter", shoreline_density_index = "sdi", snagyn = "snag_yn", sinuosity = "sinu")
132 characters is still a heck of a lot better than 238, and it might be worth it for more intuitive and memorable syntax.
Resources
More thoughts on changing individual variable names, including a couple other packages if you feel like trying them:
https://stackoverflow.com/questions/7531868/how-to-rename-a-single-column-in-a-data-frame
Documentation for the recode
function: https://dplyr.tidyverse.org/reference/recode.html
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.