Underrated Gems in R: Must-Know Functions You’re Probably Missing Out On
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
R is packed with powerhouse tools—think dplyr for data wrangling, ggplot2 for stunning visuals, or tidyr for tidying up messes. But beyond the headliners, there’s a lineup of lesser-known functions that deserve a spot in your toolkit. These hidden gems can streamline your code, solve tricky problems, and even make you wonder how you managed without them. In this post, we’ll uncover four underrated R functions: Reduce, vapply, do.call
and janitor::clean_names
. With practical examples ranging from beginner-friendly to advanced, plus outputs to show you what’s possible, this guide will have you itching to try them out in your next project. Let’s dive in and see what these under-the-radar stars can do!
1. Reduce: Collapse with Control
What It Does and Its Arguments
Reduce is a base R function that iteratively applies a two-argument function to a list or vector, shrinking it down to a single result. It’s like a secret weapon for avoiding loops while keeping things elegant.
Key Arguments:
f:
The function to apply (e.g., +, *, or a custom one).x:
The list or vector to reduce.init
(optional): A starting value (defaults to the first element of x if omitted).accumulate
(optional): If TRUE, returns all intermediate results (defaults to FALSE).
Use Cases
Summing or multiplying without explicit iteration.
Combining data structures step-by-step.
Simplifying recursive tasks.
Examples
Simple: Quick Sum
numbers <- 1:5 total <- Reduce(`+`, numbers) print(total)
[1] 15
Explanation: Reduce adds 1 + 2 = 3, then 3 + 3 = 6, 6 + 4 = 10, and 10 + 5 = 15. It’s a sleek alternative to sum().
Intermediate: String Building
words <- c("R", "is", "awesome") sentence <- Reduce(paste, words, init = "") print(sentence)
[1] " R is awesome"
Explanation: Starting with an empty string (init = ““), Reduce glues the words together with spaces. Skip init, and it starts with”R”, which might not be what you want.
Advanced: Merging Data Frames
df1 <- data.frame(a = 1:2, b = c("x", "y")) df2 <- data.frame(a = 3:4, b = c("z", "w")) df3 <- data.frame(a = 5:6, b = c("p", "q")) combined <- Reduce(rbind, list(df1, df2, df3)) print(combined)
a b 1 1 x 2 2 y 3 3 z 4 4 w 5 5 p 6 6 q
Explanation: Reduce stacks three data frames row-wise, pairing them up one by one. It’s a loop-free way to handle multiple merges.
If you’re a fan of the tidyverse, check out purrr::reduce(). It’s a modern take on base R’s Reduce, offering a consistent syntax with other purrr functions (like .x and .y for arguments) and handy shortcuts like ~ .x + .y for inline functions. It also defaults to left-to-right reduction but can go right-to-left with reduce_right(). Worth a look if you want a more polished, tidyverse-friendly alternative!
Here’s an intermediate-level example of using the reduce()
function from the purrr
package for joining multiple dataframes:
library(purrr) library(dplyr) # Create three sample dataframes representing different aspects of customer data customers <- data.frame( customer_id = 1:5, name = c("Alice", "Bob", "Charlie", "Diana", "Edward"), age = c(32, 45, 28, 36, 52) ) orders <- data.frame( order_id = 101:108, customer_id = c(1, 2, 2, 3, 3, 3, 4, 5), order_date = as.Date(c("2023-01-15", "2023-01-20", "2023-02-10", "2023-01-05", "2023-02-15", "2023-03-20", "2023-02-25", "2023-03-10")), amount = c(120.50, 85.75, 200.00, 45.99, 75.25, 150.00, 95.50, 210.25) ) feedback <- data.frame( feedback_id = 201:206, customer_id = c(1, 2, 3, 3, 4, 5), rating = c(4, 5, 3, 4, 5, 4), feedback_date = as.Date(c("2023-01-20", "2023-01-25", "2023-01-10", "2023-02-20", "2023-03-01", "2023-03-15")) ) # List of dataframes to join with the joining column dataframes_to_join <- list( list(df = customers, by = "customer_id"), list(df = orders, by = "customer_id"), list(df = feedback, by = "customer_id") ) # Using reduce to join all dataframes # Start with customers dataframe and progressively join the others joined_data <- reduce( dataframes_to_join[-1], # Exclude first dataframe as it's our starting point function(acc, x) { left_join(acc, x$df, by = x$by) }, .init = dataframes_to_join[[1]]$df # Start with customers dataframe ) # View the result print(joined_data)
customer_id name age order_id order_date amount feedback_id rating 1 1 Alice 32 101 2023-01-15 120.50 201 4 2 2 Bob 45 102 2023-01-20 85.75 202 5 3 2 Bob 45 103 2023-02-10 200.00 202 5 4 3 Charlie 28 104 2023-01-05 45.99 203 3 5 3 Charlie 28 104 2023-01-05 45.99 204 4 6 3 Charlie 28 105 2023-02-15 75.25 203 3 7 3 Charlie 28 105 2023-02-15 75.25 204 4 8 3 Charlie 28 106 2023-03-20 150.00 203 3 9 3 Charlie 28 106 2023-03-20 150.00 204 4 10 4 Diana 36 107 2023-02-25 95.50 205 5 11 5 Edward 52 108 2023-03-10 210.25 206 4 feedback_date 1 2023-01-20 2 2023-01-25 3 2023-01-25 4 2023-01-10 5 2023-02-20 6 2023-01-10 7 2023-02-20 8 2023-01-10 9 2023-02-20 10 2023-03-01 11 2023-03-15
This example demonstrates how to use reduce()
to join multiple dataframes in a sequential, elegant way. This pattern is particularly useful when dealing with complex data integration tasks where you need to combine multiple data sources with a common identifier.
2. vapply: Iteration with Assurance
What It Does and Its Arguments
vapply is another base R gem, similar to lapply but with a twist: it forces you to specify the output type and length upfront. This makes it safer and more predictable, especially for critical tasks.
Key Arguments:
X
: The list or vector to process.FUN
: The function to apply to each element.FUN.VALUE
: A template for the output (e.g., numeric(1) for a single number).
Use Cases
Guaranteeing consistent output types.
Extracting specific stats from lists.
Writing reliable code for packages or production.
Examples
Simple: Doubling Up
values <- 1:3 doubled <- vapply(values, function(x) x * 2, numeric(1)) print(doubled)
[1] 2 4 6
Explanation: Each value doubles, and numeric(1) ensures a numeric vector—simple and rock-solid.
Intermediate: Word Lengths
terms <- c("data", "science", "R") lengths <- vapply(terms, nchar, numeric(1)) print(lengths)
data science R 4 7 1
Explanation: vapply counts characters per word, delivering a numeric vector every time—no surprises like sapply might throw.
Advanced: Stats Snapshot
samples <- list(c(1, 2, 3), c(4, 5), c(6, 7, 8)) stats <- vapply(samples, function(x) c(mean = mean(x), sd = sd(x)), numeric(2)) print(stats)
[,1] [,2] [,3] mean 2 4.5000000 7 sd 1 0.7071068 1
Explanation: For each sample, vapply computes mean and standard deviation, returning a matrix (2 rows, 3 columns). It’s a tidy, type-safe summary.
3. do.call: Dynamic Function Magic
What It Does and Its Arguments
do.call in base R lets you call a function with a list of arguments, making it a go-to for flexible, on-the-fly operations. It’s like having a universal remote for your functions.
Key Arguments:
what
: The function to call (e.g., rbind, paste).args
: A list of arguments to pass.quote
(optional): Rarely used, defaults to FALSE.
Use Cases
Combining variable inputs.
Running functions dynamically.
Simplifying calls with list-based data.
Examples
Simple: Vector Mashup
chunks <- list(1:3, 4:6) all <- do.call(c, chunks) print(all)
[1] 1 2 3 4 5 6
Explanation: do.call feeds the list to c(), stitching the vectors together effortlessly.
Intermediate: Custom Join
bits <- list("Code", "Runs", "Fast") joined <- do.call(paste, c(bits, list(sep = "|"))) print(joined)
[1] "Code|Runs|Fast"
Explanation: do.call combines the list with a sep argument, creating a piped string in one smooth move.
Advanced: Flexible Binding
df_list <- list(data.frame(x = 1:2), data.frame(x = 3:4)) direction <- "vertical" bound <- do.call(if (direction == "vertical") rbind else cbind, df_list) print(bound)
x 1 1 2 2 3 3 4 4
Explanation: With direction = “vertical”, do.call uses rbind to stack rows. Change it to “horizontal”, and cbind takes over—dynamic and smart.
4. janitor::clean_names: Tame Your Column Chaos
What It Does and Its Arguments
From the janitor package, clean_names() transforms messy column names into consistent, code-friendly formats (e.g., lowercase with underscores). It’s a time-saver you’ll wish you’d known sooner.
Key Arguments:
dat
: The data frame to clean.case
: The style for names (e.g., “snake”, “small_camel”, defaults to “snake”).replace
: A named vector for custom replacements (optional).
Use Cases
Standardizing imported data with ugly headers.
Prepping data frames for analysis or plotting.
Avoiding frustration with inconsistent naming.
Examples
Simple: Basic Cleanup
library(janitor) # Create a dataframe with messy column names df <- data.frame( `First Name` = c("John", "Mary", "David"), `Last.Name` = c("Smith", "Johnson", "Williams"), `Email-Address` = c("[email protected]", "[email protected]", "[email protected]"), `Annual Income ($)` = c(65000, 78000, 52000), check.names = FALSE ) # View original column names names(df)
[1] "First Name" "Last.Name" "Email-Address" [4] "Annual Income ($)"
# Clean the names clean_df <- clean_names(df) # View cleaned column names names(clean_df)
[1] "first_name" "last_name" "email_address" "annual_income"
What clean_names()
specifically does:
Converts all names to lowercase
Replaces spaces with underscores
Removes special characters like periods and hyphens
Creates names that are valid R variable names and follow standard naming conventions
This standardization makes your data more consistent, easier to work with, and helps prevent errors when manipulating or joining datasets.
Intermediate: Custom Style
library(dplyr) library(purrr) # Create multiple dataframes with inconsistent naming df1 <- data.frame( `Customer ID` = 1:3, `First Name` = c("John", "Mary", "David"), `LAST NAME` = c("Smith", "Johnson", "Williams"), check.names = FALSE ) df2 <- data.frame( `customer.id` = 4:6, `firstName` = c("Michael", "Linda", "James"), `lastName` = c("Brown", "Davis", "Miller"), check.names = FALSE ) df3 <- data.frame( `cust_id` = 7:9, `first-name` = c("Robert", "Jennifer", "Thomas"), `last-name` = c("Wilson", "Martinez", "Anderson"), check.names = FALSE ) # List of dataframes dfs <- list(df1, df2, df3) # Clean names of all dataframes clean_dfs <- map(dfs, clean_names) # Print column names for each cleaned dataframe map(clean_dfs, names)
[[1]] [1] "customer_id" "first_name" "last_name" [[2]] [1] "customer_id" "first_name" "last_name" [[3]] [1] "cust_id" "first_name" "last_name"
# Bind the dataframes (now possible because of standardized column names) combined_df <- bind_rows(clean_dfs) print(combined_df)
customer_id first_name last_name cust_id 1 1 John Smith NA 2 2 Mary Johnson NA 3 3 David Williams NA 4 4 Michael Brown NA 5 5 Linda Davis NA 6 6 James Miller NA 7 NA Robert Wilson 7 8 NA Jennifer Martinez 8 9 NA Thomas Anderson 9
This code demonstrates a more advanced use case of the clean_names()
function when working with multiple data frames that have inconsistent naming conventions. Note that because of the different column names for customer ID, we have missing values in the combined dataframe. This example demonstrates why standardized naming is important.
Advanced: Targeted Fixes
df <- data.frame("ID#" = 1:2, "Sales_%" = c(10, 20), "Q1 Revenue" = c(100, 200)) cleaned <- clean_names(df, replace = c("#" = "_num", "%" = "_pct")) print(names(cleaned))
[1] "id" "sales" "q1_revenue"
Explanation: Custom replace swaps # for _num and % for _pct, while clean_names handles the rest—precision meets polish.
library(readxl) # Create a temporary Excel file with problematic column names temp_file <- tempfile(fileext = ".xlsx") df <- data.frame( `ID#` = 1:5, `%_Completed` = c(85, 92, 78, 100, 65), `Result (Pass/Fail)` = c("Pass", "Pass", "Fail", "Pass", "Fail"), `μg/mL` = c(0.5, 0.8, 0.3, 1.2, 0.4), `p-value` = c(0.03, 0.01, 0.08, 0.002, 0.06), check.names = FALSE ) # Save as Excel (simulating real-world data source) if (require(writexl)) { write_xlsx(df, temp_file) } else { # Fall back to CSV if writexl not available write.csv(df, sub("\\.xlsx$", ".csv", temp_file), row.names = FALSE) temp_file <- sub("\\.xlsx$", ".csv", temp_file) } # Read the file back if (temp_file == sub("\\.xlsx$", ".csv", temp_file)) { imported_df <- read.csv(temp_file, check.names = FALSE) } else { imported_df <- read_excel(temp_file) } # View original column names print(names(imported_df))
[1] "ID#" "%_Completed" "Result (Pass/Fail)" [4] "μg/mL" "p-value"
# Create custom replacements custom_replacements <- c( "μg" = "ug", # Replace Greek letter "%" = "percent", # Replace percent symbol "#" = "num" # Replace hash ) # Clean with custom replacements clean_df <- imported_df %>% clean_names() %>% rename_with(~ stringr::str_replace_all(., "p_value", "probability")) # View cleaned column names print(names(clean_df))
[1] "id_number" "percent_completed" "result_pass_fail" [4] "mg_m_l" "probability"
# Print the cleaned dataframe print(clean_df)
# A tibble: 5 × 5 id_number percent_completed result_pass_fail mg_m_l probability <dbl> <dbl> <chr> <dbl> <dbl> 1 1 85 Pass 0.5 0.03 2 2 92 Pass 0.8 0.01 3 3 78 Fail 0.3 0.08 4 4 100 Pass 1.2 0.002 5 5 65 Fail 0.4 0.06
The final output shows the transformation from problematic column names to standardized ones:
From:
ID#
%_Completed
Result (Pass/Fail)
μg/mL
p-value
To:
id_num
percent_completed
result_pass_fail
ug_m_l
probability
This example demonstrates how clean_names()
can be part of a more sophisticated data preparation workflow, especially when working with real-world data sources that contain problematic characters and naming conventions.
Conclusion: Why These Functions Deserve Your Attention
R’s ecosystem is vast, but it’s easy to stick to the familiar and miss out on tools like Reduce, vapply, do.call and clean_names. These functions might not top the popularity charts, yet they pack a punch—whether it’s collapsing data without loops, ensuring type safety, adapting on the fly, fixing messy names, or mining text for gold. The examples here show just a taste of what they can do, from quick fixes to complex tasks. Curious to see how they fit into your workflow? Fire up R, play with them, and discover how these underdogs can become your new go-tos. What other hidden R treasures have you found? Drop them in the comments—I’d love to hear!
References
R Core Team (2025). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. Available at: https://www.R-project.org/
Firke, Sam (2023). janitor: Simple Tools for Examining and Cleaning Dirty Data. CRAN. Available at: https://CRAN.R-project.org/package=janitor
R Documentation for Reduce, vapply, do.call, clean_names.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.