Accessing PIWIK PRO from R
Google Analytics is the most widely used tool for tracking activity on a website, but more and more websites are switching to alternatives such as Matomo or PIWIK PRO because of GDPR.
My employer decided to switch to PIWIK PRO, too, so I was looking for a way to access the data PIWIK PRO collects and process it with R. When we used Google Analytics as our web analytics tool, I used RGoogleAnalytics and added some enhancements, such as caching the data and splitting the requests into daily chunks to handle sampling issues with Google Analytics.
Unfortunately, I couldn’t find any R package providing access to PIWIK PRO data, so I wrote my own: piwikproR.
Here I want to show you how to use it.
Installation
piwikproR isn’t available on CRAN yet, but with devtools, installing it from GitHub is just as simple as installing from CRAN:
```r
devtools::install_github("dfv-ms/piwikproR")
```
Using piwikproR
Credentials
Before we can use the PIWIK PRO API, we have to generate API credentials.
Doing so gives us two strings: CLIENT_ID and CLIENT_SECRET.
With these two strings we can generate a token for the actual access. So let’s put the credentials into a list:
```r
library(piwikproR)

piwik_pro_credentials <- list(
  client_id = "CLIENT_ID",
  client_secret = "CLIENT_SECRET",
  url = "https://my_site.piwik.pro"
)

# Fetch token
token <- get_login_token(piwik_pro_credentials)
```
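If you would rather not hard-code the credentials in a script, one option is to read them from environment variables. The following is only a minimal sketch: the helper read_piwik_credentials and the variable names PIWIK_PRO_CLIENT_ID and PIWIK_PRO_CLIENT_SECRET are my own assumptions and not part of piwikproR. It just builds the same list as above.

```r
# Hypothetical helper, not part of piwikproR.
# Assumption: the two environment variables below were set beforehand,
# for example in ~/.Renviron. The variable names are made up for this sketch.
read_piwik_credentials <- function(url) {
  client_id <- Sys.getenv("PIWIK_PRO_CLIENT_ID")
  client_secret <- Sys.getenv("PIWIK_PRO_CLIENT_SECRET")

  # Sys.getenv() returns "" for unset variables, so fail early in that case
  if (client_id == "" || client_secret == "") {
    stop("PIWIK_PRO_CLIENT_ID and PIWIK_PRO_CLIENT_SECRET must be set")
  }

  # Same list layout as the hard-coded credentials
  list(client_id = client_id, client_secret = client_secret, url = url)
}

# Usage (commented out because it needs the variables to be set):
# piwik_pro_credentials <- read_piwik_credentials("https://my_site.piwik.pro")
# token <- get_login_token(piwik_pro_credentials)
```

This keeps the secret out of version control while leaving the call to get_login_token unchanged.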
Columns
Now let’s define which columns we want to fetch. To do so, we build a tibble containing the column name and an optional transformation:
```r
columns <- tibble::tribble(
  ~column, ~transformation,
  "timestamp", "",
  "event_url", "to_path",
  "page_views", "",
)
```
In the example above, the first column contains the date, the second contains the path part of each URL (instead of https://www.my-domain.com/some/path/to/the/site.html we get only /some/path/to/the/site.html), and the last column contains the number of page views.
For further details take a look at the documentation at PIWIK PRO.
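To make the effect of to_path more concrete, here is a small base R sketch that mimics it locally. Note that the real transformation is applied on the PIWIK PRO server; url_to_path is a hypothetical helper of mine that only approximates the result for simple URLs like the one above, and I make no claim about how the server handles query strings or fragments.

```r
# Hypothetical local approximation of the server-side to_path transformation.
# It simply strips the scheme and the host from the beginning of the URL.
url_to_path <- function(url) {
  sub("^[A-Za-z][A-Za-z0-9+.-]*://[^/]+", "", url)
}

url_to_path("https://www.my-domain.com/some/path/to/the/site.html")
# "/some/path/to/the/site.html"
```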
Filters
Optionally, we can pass a filter to the API call so that the server does the filtering.
Let’s say we’re only interested in page views generated by desktop devices. So we build the following filter object:
```r
filters <- tibble::tribble(
  ~column, ~operator, ~value,
  "device_type", "eq", 0
)
filters <- build_filter(filters, "and")
```
Adding more rows to the filters tibble adds more criteria.
Fetching the data
Now it’s time to fetch the data. We have to choose the date range and the actual website we’re fetching the data for:
```r
website_id <- 'my_website_id'
start.date <- "2021-04-01"
end.date <- "2021-04-30"

query <- build_query(
  lubridate::ymd(start.date), lubridate::ymd(end.date),
  website_id,
  filters = filters,
  columns,
  max_lines = 0
)
data <- send_query(query, token, caching = TRUE, fetch_by_day = FALSE)
```
The result data is a tibble containing the specified columns.
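The fetch_by_day argument hints at the same idea I used with Google Analytics: splitting one long date range into one request per day. The base R sketch below only illustrates that idea in principle. It is not how piwikproR implements it internally, and the chunks data frame is made up for this example.

```r
# Illustration only: split a date range into one-day chunks.
# This is an assumption about the general idea, not piwikproR's internal code.
start_date <- as.Date("2021-04-01")
end_date <- as.Date("2021-04-30")

days <- seq(start_date, end_date, by = "day")

# One row per request: each chunk starts and ends on the same day
chunks <- data.frame(date_from = days, date_to = days)

nrow(chunks)
# 30
```

Each row could then be used to build and send one query, and the daily results could be bound together afterwards.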
Documentation
PIWIK PRO provides detailed documentation for its API at https://developers.piwik.pro/en/latest/custom_reports/index.html.