Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Have you seen the Walmart growth maps video at flowingdata? And, did you wish that you could create similar animated movies using R? Well, you came to the right place. In this post, you will learn how to create point maps at the zip code level and then animate them to make a movie using your favorite libraries in R such as ggplot2 and dplyr. The bonus, of course, is to recreate the Walmart growth maps in R.
Let’s get started then.
Loading Libraries
You can see from the following commands that I owe a tremendous amount of gratitude to Hadley Wickham, for he has made many of the programming tasks easier with his R packages. ggplot2
has to be one of the best plotting packages and with dplyr
manipulating and aggregating data has become less tedious and more enjoyable. He also has created packages for string manipulation (stringr
) and date manipulation (lubridate
). We need all of them.
library(ggplot2) #for plotting, of course library(ggmap) #to get the US map library(dplyr) #for data manipulation like a ninja library(readr) #to read the data in csv library(lubridate) #to play with dates library(scales) #for number labeling library(stringr) #to play with strings library(Cairo) #for anti-aliasing images on Windows library(zipcode) #to clean up and geocode zipcodes |
Gathering, Cleaning-up, Summarizing the Data
Let’s load up the zip code data using the following command:
data(zipcode) |
Next, use the fantastic ggmap
library to get the US map. Here, I’m using the toner-lite
type of map from stamen maps, a great resource for fantastic maps.
us_map <- get_stamenmap(c(left = -125, bottom = 25.75, right = -67, top = 49), zoom = 5, maptype = "toner-lite") #you can change the map type to get something more colorful |
Get the store openings data along with the dates and addresses.
walmart_op_data <- read_csv("http://users.econ.umn.edu/~holmes/data/WalMart/store_openings.csv") |
A couple of clean up-items: convert the date column to the data format, create opening year and month columns. I also created another date column for looping through and creating maps for each month. You may ask, “Why did you not use just the opening date?” Good question. I am not using the opening date because after creating a map, I am saving the map using the same value for the filename. We are using ffmpeg
here to create movie out of images, and ffmpeg likes sequential numbering of files. For a Mac OS, the file name is irrelevant, because we can use the glob
argument of ffmpeg.
walmart_op_data <- walmart_op_data %>% mutate(OPENDATE = as.Date(OPENDATE, '%m/%d/%Y'), OpYear = year(OPENDATE), OpMonth = month(OPENDATE), OpDate = as.Date(paste(OpMonth, 1, OpYear, sep = "/"), '%m/%d/%Y'), ZIPCODE = clean.zipcodes(ZIPCODE)) |
Now, calculate monthly openings and cumulative openings by month and year.
wm_op_data_smry <- walmart_op_data %>% count(OpYear, OpMonth) %>% ungroup() %>% arrange(OpYear, OpMonth) %>% mutate(cumm_n = cumsum(n)) |
Join the data with zipcode to get the latitude and longitude for each zipcode
walmart_op_data <- left_join(walmart_op_data, select(zipcode, zip, latitude, longitude), by = c("ZIPCODE" = "zip")) |
Get all the zipcodes for each month’s Walmart opening.
wm_op_data_smry <- inner_join(wm_op_data_smry, select(walmart_op_data, ZIPCODE, latitude, longitude, OpYear, OpMonth), by = c("OpYear" = "OpYear", "OpMonth" = "OpMonth")) |
Creating maps
The fun part: to actually see the data on the US map. Since we have to create a map for each month, we are using a function here for repetition and ease of use. I’ve explained the code using R comments.
#this function has three arguments. #' df: a dataframe used for plotting #' plotdate: date used for splitting the data frame into before and after #' mapid: a number for naming the final map file my_zip_plot <- function(df, plotdate, mapid){ # create the background map. using the darken argument to make the map filled with black color. g <- ggmap(us_map, darken = c("0.8", "black")) # split the data frame for all Walmarts before a plot date i.e. a month old_df <- filter(df, OpDate < plotdate) # split the data frame for all Walmarts for the plot date i.e. during a month new_df <- filter(df, OpDate == plotdate) # plot all the Walmarts before the current opening month. Make all the older store locations as shown in circles smaller g <- g + geom_point(data = old_df, aes(x = longitude, y = latitude), size = 5, color = "dodgerblue", alpha = 0.4) #plot all the Walmarts during the current opening month. Make all the newer store locations as shown in circles bigger to get the "pop" effect g <- g + geom_point(data = new_df, aes(x = longitude, y = latitude), size = 8, color = "dodgerblue", alpha = 0.4) # remove axis marks, labels, and titles g <- g + theme(axis.ticks = element_blank(), axis.title = element_blank(), axis.text = element_blank(), plot.title = element_blank()) # place the label for year g <- g + annotate("text", x = -75, y = 34, label = "YEAR:", color = "white", size = rel(5), hjust = 0) # place the value of for year g <- g + annotate("text", x = -75, y = 33, label = unique(new_df$OpYear), color = "white", size = rel(6), face = 2, hjust = 0) # place the label for stores opened g <- g + annotate("text", x = -75, y = 32, label = "STORE COUNT:", color = "white", size = rel(5), hjust = 0) # place cumulative store openings g <- g + annotate("text", x = -75, y = 31, label = comma(unique(new_df$cumm_n)), color = "white", size = rel(6), face = 2, hjust = 0) # generate the file name for the map. Using str_pad to make the filename same length and prefixed with zeroes. # create a maps directory inside the directory of this script. filename <- paste0("maps/img_" , str_pad(mapid, 7, pad = "0"), ".png") # save the map as a png using Cairo for anti-aliasing on Windows. ggsave(filename = filename, plot = g, width = 13, height = 7, dpi = 150, type = "cairo-png") } |
A note about color: I played with a few tools such as paletton to find good contrasting colors for the individual dots. After spending crazy amounts of time just trying different combinations, I just picked one of the blue shades from R named colors.
Loop through the data frame for each month and create a map for that month. Warning: these commands will take about 5 minutes and create a lot of maps. Make sure that you created a directory named maps inside the directory of this file and that you have enough space to store all the maps.
wm_op_data_smry %>% mutate(mapid = group_indices_(wm_op_data_smry, .dots = 'OpDate')) %>% # created a group id for each group defined using OpDate i.e. a month group_by(OpDate) %>% # note when you are using user-defined functions in dplyr, you have to use do do(pl = my_zip_plot(wm_op_data_smry, unique(.$OpDate), unique(.$mapid))) ## pass the summary data frame, the date, and map number to the function |
Creating the Walmart Growth Movie
This is the fun part in which we put all the images together to make a “motion picture.” For this to work, you will need ffmpeg installed on your computer. On Windows, for ease of use, install it in the C drive.
# prepare the command for execution # the framerate argument controls how many frames per second we want to see. Increase that number for a faster transition between months. # since we used 7 digits as the fixed length of filenames, %7d pattern will match those filenames. makemovie_cmd <- paste0("C:/ffmpeg/bin/ffmpeg -framerate 5 -y -i ", paste0(getwd(), "/maps/img_%7d.png"), " -c:v libx264 -pix_fmt yuv420p ", paste0(getwd(), "/maps/"), "movie.mp4") # for mac os, you can use the glob argument and obviate the need for sequential numbering of files. #makemovie_cmd <- paste0("ffmpeg -framerate 5 -y -pattern_type glob -i '", paste0(getwd(), "/maps/"), "*.png'", " -c:v libx264 -pix_fmt yuv420p '", paste0(getwd(), "/maps/"), "movie.mp4'") system(makemovie_cmd) # the system command will execute the ffmpeg command and create the final movie. |
Play!
What do you think? I like this approach as it offers maximum flexibility of ggplot, you can adjust the sizes of new points, and you adjust the frame rate. Another advantage: you can relatively change the data source and use it for any other mapping purposes.
Complete Code
library(ggplot2) library(ggmap) library(dplyr) library(readr) library(lubridate) library(scales) library(stringr) library(Cairo) library(zipcode) data(zipcode) us_map <- get_stamenmap(c(left = -125, bottom = 25.75, right = -67, top = 49), zoom = 5, maptype = "toner-lite") walmart_op_data <- read_csv("http://users.econ.umn.edu/~holmes/data/WalMart/store_openings.csv") walmart_op_data <- walmart_op_data %>% mutate(OPENDATE = as.Date(OPENDATE, '%m/%d/%Y'), OpYear = year(OPENDATE), OpMonth = month(OPENDATE), OpDate = as.Date(paste(OpMonth, 1, OpYear, sep = "/"), '%m/%d/%Y'), ZIPCODE = clean.zipcodes(ZIPCODE)) wm_op_data_smry <- walmart_op_data %>% count(OpYear, OpMonth) %>% ungroup() %>% arrange(OpYear, OpMonth) %>% mutate(cumm_n = cumsum(n)) walmart_op_data <- left_join(walmart_op_data, select(zipcode, zip, latitude, longitude), by = c("ZIPCODE" = "zip")) wm_op_data_smry <- inner_join(wm_op_data_smry, select(walmart_op_data, ZIPCODE, latitude, longitude, OpYear, OpMonth), by = c("OpYear" = "OpYear", "OpMonth" = "OpMonth")) my_zip_plot <- function(df, plotdate, mapid){ g <- ggmap(us_map, darken = c("0.8", "black")) old_df <- filter(df, OpDate < plotdate) new_df <- filter(df, OpDate == plotdate) g <- g + geom_point(data = old_df, aes(x = longitude, y = latitude), size = 5, color = "dodgerblue", alpha = 0.4) g <- g + geom_point(data = new_df, aes(x = longitude, y = latitude), size = 8, color = "dodgerblue", alpha = 0.4) g <- g + theme(axis.ticks = element_blank(), axis.title = element_blank(), axis.text = element_blank(), plot.title = element_blank()) g <- g + annotate("text", x = -75, y = 34, label = "YEAR:", color = "white", size = rel(5), hjust = 0) g <- g + annotate("text", x = -75, y = 33, label = unique(new_df$OpYear), color = "white", size = rel(6), face = 2, hjust = 0) g <- g + annotate("text", x = -75, y = 32, label = "STORE COUNT:", color = "white", size = rel(5), hjust = 0) g <- g + annotate("text", x = -75, y = 31, label = comma(unique(new_df$cumm_n)), color = "white", size = rel(6), face = 2, hjust = 0) filename <- paste0("maps/img_" , str_pad(mapid, 7, pad = "0"), ".png") ggsave(filename = filename, plot = g, width = 13, height = 7, dpi = 150, type = "cairo-png") } wm_op_data_smry %>% mutate(mapid = group_indices_(wm_op_data_smry, .dots = 'OpDate')) %>% group_by(OpDate) %>% do(pl = my_zip_plot(wm_op_data_smry, unique(.$OpDate), unique(.$mapid))) makemovie_cmd <- paste0("C:/ffmpeg/ffmpeg/bin/ffmpeg -framerate 5 -y -i ", paste0(getwd(), "/maps/img_%7d.png"), " -c:v libx264 -pix_fmt yuv420p ", paste0(getwd(), "/maps/"), "movie.mp4") #works on mac #makemovie_cmd <- paste0("ffmpeg -framerate 5 -y -pattern_type glob -i '", paste0(getwd(), "/maps/"), "*.png'", " -c:v libx264 -pix_fmt yuv420p '", paste0(getwd(), "/maps/"), "movie.mp4'") system(makemovie_cmd) |
The post Mapping Walmart Growth Across the US using R appeared first on nandeshwar.info.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.