Three Quick and Simple Data Cleaning Helper Functions (December 2013)
As I go about cleaning and merging data sets with R I often end up creating and using simple functions over and over. When this happens, I stick them in the DataCombine package. This makes it easier for me to remember how to do an operation and others can possibly benefit from simplified and (hopefully) more intuitive code.
I've talked about some of the commands in DataCombine in previous posts. In this post I'll give examples for a few more that I've added over the past couple of months. Note: these examples are based on DataCombine version 0.1.11.
Here is a brief rundown of the functions covered in this post:
FindReplace: a function to replace multiple patterns found in a character string column of a data frame.
MoveFront: moves variables to the front of a data frame. This can be useful if you have a data frame with many variables and want to move a variable or variables to the front.
rmExcept: removes all objects from a work space except those specified by the user.
FindReplace
Recently I needed to replace many patterns in a column of strings. Here is a short example. Imagine we have a data frame like this:
ABData <- data.frame(a = c("London, UK", "Oxford, UK", "Berlin, DE", "Hamburg, DE", "Oslo, NO"), b = c(8, 0.1, 3, 2, 1))
Ok, now I want to replace the UK and DE parts of the strings with England and Germany. So I create a data frame with two columns. The first records the pattern and the second records what I want to replace the pattern with:
Replaces <- data.frame(from = c("UK", "DE"), to = c("England", "Germany"))
Now I can just use FindReplace to make the replacements all at once:
library(DataCombine)

ABNewDF <- FindReplace(data = ABData, Var = "a", replaceData = Replaces,
                       from = "from", to = "to", exact = FALSE)

# Show changes
ABNewDF
##                  a   b
## 1  London, England 8.0
## 2  Oxford, England 0.1
## 3  Berlin, Germany 3.0
## 4 Hamburg, Germany 2.0
## 5         Oslo, NO 1.0
If you set exact = TRUE then FindReplace will only replace exact pattern matches. Also, you can set vector = TRUE to return only a vector of the column you replaced (the Var column), rather than the whole data frame.
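To make the difference between partial and exact matching concrete, here is a rough base R sketch of the idea. It is not DataCombine's actual implementation: the loop, the use of fixed = TRUE, and the object names partial, exact and hit are my own assumptions for illustration.

```r
# Example data from above, with strings kept as character rather than factor
ABData <- data.frame(a = c("London, UK", "Oxford, UK", "Berlin, DE",
                           "Hamburg, DE", "Oslo, NO"),
                     b = c(8, 0.1, 3, 2, 1),
                     stringsAsFactors = FALSE)
Replaces <- data.frame(from = c("UK", "DE"),
                       to = c("England", "Germany"),
                       stringsAsFactors = FALSE)

# Partial matching: substitute each pattern wherever it occurs in a string
# (fixed = TRUE treats the pattern as literal text; this is an assumption)
partial <- ABData$a
for (i in seq_len(nrow(Replaces))) {
    partial <- gsub(Replaces$from[i], Replaces$to[i], partial, fixed = TRUE)
}

# Exact matching: replace a value only when the whole string equals a pattern
exact <- ABData$a
hit <- match(exact, Replaces$from)
exact[!is.na(hit)] <- Replaces$to[hit[!is.na(hit)]]

partial
exact
```

In this sketch partial ends up with the England and Germany replacements, while exact is left unchanged because no full string is exactly "UK" or "DE".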
MoveFront
On occasion I've wanted to move a few variables to the front of a data frame. The MoveFront function makes this pretty simple. It only has two arguments: data and Var. Data is the data frame and Var is a character vector with the columns I want to move to the front of the data frame in the order that I want them. Here is an example:
# Create dummy data
A <- B <- C <- 1:50
OldOrder <- data.frame(A, B, C)

names(OldOrder)
## [1] "A" "B" "C"

# Move B and A to the front
NewOrder2 <- MoveFront(OldOrder, c("B", "A"))
names(NewOrder2)
## [1] "B" "A" "C"
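For comparison, the same reordering can be written in base R roughly as follows. This is only a sketch of the idea and not necessarily how MoveFront works internally; the names front and NewOrderBase are made up for the example.

```r
# Same dummy data as above
A <- B <- C <- 1:50
OldOrder <- data.frame(A, B, C)

# Columns to move to the front, in the order wanted
front <- c("B", "A")

# Put the chosen columns first, then every remaining column in its old order
NewOrderBase <- OldOrder[, c(front, setdiff(names(OldOrder), front)),
                         drop = FALSE]
names(NewOrderBase)
## [1] "B" "A" "C"
```

Typing out the setdiff call each time is what the helper function saves you from, especially with many columns.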
rmExcept
Finally, sometimes I want to clean up my work space and only keep specific objects. I want to remove everything else. This is straightforward with rmExcept. For example:
# Create objects
A <- 1
B <- 2
C <- 3

# Remove all objects except for A
rmExcept("A")
## Removed the following objects:
## ABData, ABNewDF, B, C, NewOrder2, OldOrder, Replaces

# Show workspace
ls()
## [1] "A"
You can set the environment you want to clean up with the envir argument. By default it is your global environment.
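The underlying idea can be sketched in base R with ls, setdiff and rm. The example below is an illustration only, not the package's code. It works on a throwaway environment (the hypothetical work) instead of the global environment, so running it will not delete anything from your own session.

```r
# A scratch environment standing in for the work space to clean
work <- new.env()
assign("A", 1, envir = work)
assign("B", 2, envir = work)
assign("C", 3, envir = work)

# Objects to keep; everything else in the environment is removed
keep <- "A"
drop <- setdiff(ls(envir = work), keep)
rm(list = drop, envir = work)

ls(envir = work)
## [1] "A"
```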