Site icon R-bloggers

Adobe Sitecatalyst API and R: integrate reports with the SAINT classification file

[This article was first published on MilanoR, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

From original post @ http://analyticsblog.mecglobal.it/en/analytics-tools/adobe-sitecatalyst-api-in-r/

Are you a heavy user of Sitecatalyst and the very famous R package RSitecatalyst to analyze your web analytics data and make insightful vizualitations?

Or wonder why the hell do you need to download lots of excel files to get a simple report from Adobe cloud?

Well today, we are going to solve a problem that everyone whos into reporting face all day. We’ll show you what to do in order to provide analysts with more time to concentrate on the data, and zero time to aggregate them.
All you need is The R programming environment settled and we are ready to go:
First thing to do is obviously loading all the libraries we need and login using our username and token which you can find here:

library(RSiteCatalyst) library(sqldf)

SCAuth("[USERNAME:COMPANY]","[TOKEN]")

What to do next is simply get our hands dirty with the API.
This is an example on how we can use it to get visits and time on site by tracking code:

elements<-GetElements("[WEBSITE_ID]")

visits_per_day_by_tracking_code<-QueueTrended("[WEBSITE_ID]","[DATE_BEGIN]","[DATE_END]","visits",elements="trackingcode"
                          ,date.granularity = "day", top="1000", start="0")

time_per_day_by_tracking_code<-QueueTrended("[WEBSITE_ID]","[DATE_BEGIN]","[DATE_END]","totaltimespent",elements="trackingcode"
                           ,date.granularity = "day", top="1000", start="0")

visits_and_time_by_tracking_code<-merge(visits_by_tracking_code,time_by_tracking_code,by=c("name","datetime"),all.x=TRUE)

The problem with this data is that is definitely too detailed for our reporting scope. What we really need to know is how Campaigns have performed rather than tracking code.

To accomplish that, we’re gonna upload our saint classification file we can download from Adobe and then we’re ready to make our analysis easier. Once you have downloaded that here is the code to put all together.

library(xlsx)

saint_data<-read.xlsx("[SAINT_CLASSIFICATION_FILE.xlsx]",1)

metrics_by_campaign_placement<-merge(visits_and_time_by_tracking_code,saint_data,by.x="name",
                      by.y="Key")

landing_pages<-QueueRanked("[WEBSITE_ID]","[DATE_BEGIN]","[DATE_END]", 
                                 c("visits","bounces"), 
                                 elements = c("entrypage","trackingcode"), top="50000", start="0" )

landing_pages_by_campaign<-merge(landing_pages,saint_data,by.x="trackingcode",
                             by.y="Key",all.x=TRUE)

As you can see the code is quite straight forward, all we need to do is basically a inner join over the key which is unique in the tracking code and in the data retrieved from the API.
The cool thing here is that we can attach each metric we think would be crucial for our analysis of understanding the advertising activities in the saint classification without the need of loading it into the platform; What we internally attach are key metrics such as advertising costs and impressions by placement, campaigns divided by KPIS such as brand awareness or business performance.
To give you an example of what you can do, here we provide you with the code to generate a chart which give you an idea of which advertising publisher is performing better in terms of visits,time on site and impressions:

Campaign_performance<-sqldf("select Campaigns,sum(visits) from metrics_by_campaign_placement group by Campaigns order by sum(visits)")

Publisher_performance<-sqldf("select Publisher,sum(visits) as visits,sum(totaltimespent) time_on_site,sum(impressions) as impressions from metrics_by_campaign_placement group by Publisher order by sum(visits)")

Placement_performance<-sqldf("select Ads,sum(visits),sum(totaltimespent),sum(impressions) from metrics_by_campaign_placement group by Ads order by sum(visits)")

landing_pages_performance<-sqldf("select Campaigns,entrypage,sum(visits),sum(bounces)/sum(visits) as bouncerate from landing_pages_by_campaign group by Campaigns order by sum(visits) desc")

To summaries with a clear chart, here’s the output you can expect with these data using the ggplot package:

The dimensions of the balls is the number of impressions, while the balls are all the publisher or social platforms you use for your advertisings campaigns.

# PLOT THE DATA USING GGPLOT2

ggplot(data=Publisher_performance, aes(x=Visits, y=Time_on_site)) +
  geom_point(aes(size=impressions,colour = placement)) +  
  scale_size_continuous(range=c(2,15)) +
  theme(legend.position = "none")+
  geom_text(aes(label=placement),hjust = 1.5 )

To leave a comment for the author, please follow the link and comment on their blog: MilanoR.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.