Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
As data analytics consulting firm, we think we are fortunate that we keep finding problems to find. Recently my team mate found a glaring problem of not having any connector for R with Google. With the inspiration from Michael, Ajay O, it soon become a worth problem to solve.
With RGoogleAnalytics package now, we have solved the problem of data extraction into R from Google Analytics a new breed of ideas started emerging primarily around visualization. I have been playing with GGplot2 has been great package to convert data into visualization. Thanks Dr. Hadley Wickham. Once you have Following this blogpost, you are with the code there in position to have data required to get these calendar heat map done. Take up below given code and paste into R console and play around to see if you find it easy working thru. If you have trouble, feel free to reach out to us.
Here is the code for extracting the Google analytics data using R-google analytics package. Before running the following code, download RGoogleAnalytics package and install it.
#Load RGoogleAnalytics library library("RGoogleAnalytics") # Create query builder object query <- QueryBuilder() # Authorize your account and paste the accesstoken access_token <- query$authorize() # Create a new Google Analytics API object ga <- RGoogleAnalytics() ga.profiles <- ga$GetProfileData(access_token) # List the GA profiles ga.profiles # select index corresponds to your profile and set it to query string # For example if index is 7 of your GA profile then set ga.profiles$id[7] in # query$Init() method given below # Build the query string query$Init(start.date = "2010-01-01", # Set start date end.date = "2012-12-31", # Set end date dimensions = "ga:date", metrics = "ga:visits,ga:transactions", max.results = 10000, table.id = paste("ga:",ga.profiles$id[6],sep="",collapse=","), access_token=access_token) # Make a request to get the data from the API ga.data <- ga$GetReportData(query) # data will be stored in this data frame # Set date in format YYYY-MM-DD (to use into heatmap calender) ga.data$date < - as.Date(as.character(ga.data$date),format="%Y%m%d")
For this example of Calender heatmap, I am using data of an e-commerce store with having data for more than 2 years in business. I will be plotting visits as well as transactions on calendar so that I’d get perspective on how they interact viz-a-viz timeline.
Here is the code for plotting the heat map after you get data and have it store in 'data'. This frame is used to reference the source of data for the visualization below.
# Recommended R version - 2.15.1 or higher # install required library by using the command install.packages(“libraryname”) # For example install.packages(“ggplot2”) # Required library library(“quantmod”) library(“ggplot2”) library(“reshape2”) library(“plyr”) library(“scales”) # Set extracted data to this data frame data <- ga.data # Run commands listed below data$year <- as.numeric(as.POSIXlt(data$date)$year+1900) data$month <- as.numeric(as.POSIXlt(data$date)$mon+1) data$monthf <- factor(data$month,levels=as.character(1:12), labels=c("Jan","Feb","Mar","Apr","May","Jun", "Jul","Aug","Sep","Oct","Nov","Dec"), ordered=TRUE) data$weekday <- as.POSIXlt(data$date)$wday data$weekdayf <- factor(data$weekday,levels=rev(0:6), labels=rev(c("Mon","Tue","Wed","Thu","Fri","Sat","Sun")), ordered=TRUE) data$yearmonth <- as.yearmon(data$date) data$yearmonthf <- factor(data$yearmonth) data$week <- as.numeric(format(as.Date(data$date),"%W")) data <- ddply(data,.(yearmonthf),transform,monthweek=1+week-min(week)) # Plot for visits P_visits <- ggplot(data, aes(monthweek, weekdayf, fill = visits)) + geom_tile(colour = "white") + facet_grid(year~monthf) + scale_fill_gradient(high="#D61818",low="#B5E384") + labs(title = "Time-Series Calendar Heatmap") + xlab("Week of Month") + ylab("") # View plot P_visits #Plot for transactions P_transactions <- ggplot(data, aes(monthweek, weekdayf, fill = transactions)) + geom_tile(colour = "white") + facet_grid(year~monthf) + scale_fill_gradient(formatter = "comma",high="#D61818",low="#B5E384") + labs(title = "Time-Series Calendar Heatmap") + xlab("Week of Month") + ylab("") # View plot P_transactions
Once you run the code, you will be in position to get output like below:
Now that we have a calendar heat map for visits, let me pull it off for transaction. In the above code for Google Analytics data extraction you have use transaction as well as visits as metrics. Since the data is already available in the ‘data’. we are ready by changing in code of visualization to choose the heat map for transaction now.
Its quite interesting now that you can make super nice inferences like I did below:
- Tuesdays have high visits days but wed has been the day when most transactions occurs
- Visits increases towards the end of year (shopping season) and then slows down towards year start
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.