Introduction
The following blog post is based on a classic reporting task that most people working in BI face frequently: besides all the nice dashboards you create in Tableau, from time to time people will approach you with a request for a “data extract” – reports that typically look like cross tables. Ideally, the extract comes as an Excel file with each crosstab in a separate sheet, ready to be analyzed with the requester's favorite tool. How would you do that programmatically with Tableau? Extracting data out of the Tableau “ecosystem” to process it in another tool is clearly not part of the philosophy of the Tableau folks. They would argue that you should give those people access to Tableau so that they can do their analysis directly in the tool. Although this is a good point, it misses some important things. First, Tableau is expensive compared to solutions that are free or already part of the company's software portfolio (for example, Excel with free add-ins like Power Query, Power View and Power Pivot). Given that there is often no extra budget, or only a small fixed one, not everybody will get a full copy of Tableau. Second, changing the standard tool for data analysis and visualization (in a broad sense) within an organization takes a lot of budget, time and persuasion. Procedures and templates need to be changed; people need to be convinced, trained and supported. Even with an impressive and sometimes magical tool like Tableau, it is not guaranteed that such a project will be a success down to the smallest corner of your company. And that is why we sometimes need to use Tableau for purposes that are not in its focus – and extracting data out of a workbook into an Excel file is such a case.

Problem statement
Imagine you have a workbook consisting of several sheets, all in the form of cross tables. You can see a simple example in the screenshot below (and find a link for downloading the workbook at the end of the post):

Solution Overview
I got the basic idea for how to extract the crosstabs from a forum posting by Christopher Cannata: “Hello everyone, I was tasked with a project to allow these crosstab CSV files to be generated automatically each morning for our company. Unfortunately tabcmd was not capable of allowing us to automate the process but I was able to develop a test suite in Selenium IDE [and] call it from a batch script on a daily basis.” Using a software testing framework for web applications like Selenium to automate the csv extraction is a hell of a good idea for the first part of our problem. And it turned out that PowerShell is a good approach for the second part: PowerShell is simple yet powerful and avoids a lot of problems with encoding, formats and data types. I ran into those problems when I tried reading the raw csv files and exporting them to Excel using several Python libraries. But since I use Python for most scripting and automation tasks around Tableau, I use it here again to orchestrate the whole workflow. Before we go deeper into the individual steps, the picture below shows the whole story:

Extracting Crosstabs from Tableau Server using Selenium
Selenium is a software testing framework for web applications. As such, it can record user actions and afterwards “replay” those activities again and again on new versions of the app, testing whether those actions still lead to the expected result. We just need the capability to record and automatically run those macros (called “test cases”; a set of test cases is called a “test suite”). Here, a test case mimics the user actions needed to download a single crosstab as a csv file, with all the necessary steps: logging into Tableau Server, selecting and viewing the right workbook, and downloading the csv file (a rough sketch of such a sequence of steps follows the list of tips below). The installation is easy and requires two tools: the Selenium IDE for recording test cases (installed as a plugin for Firefox) and the Selenium server for running test suites. The latter is just a jar file that is started from the command line. Now start your Firefox browser and you should see a new icon for starting the Selenium IDE GUI. A few tips for recording the test cases:
- It helps if you change the configuration of your Firefox profile or use a dedicated one for this task – see here for a list of things that need to be adjusted.
- Set a dedicated download folder in your Firefox profile – I found no way to tell Selenium how to go through the download manager dialog.
- Selenium will not recognize that you are sometimes waiting for Tableau to load a workbook. You have to insert separate “pause” commands into your test case to give Tableau time for processing before Selenium takes the next step. Set the length of the pause yourself, depending on how long Tableau needs for loading, plus some buffer.
- Create a separate test case for every view in Tableau that should be downloaded – that means your last step within a test case should be to sign out from Tableau Server.
- Test cases and test suites are stored as plain html files and can therefore easily be edited with a text editor.
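The post records these steps with the Selenium IDE rather than writing them by hand. Purely as an illustration of what one recorded test case does – not the author's actual HTML test case – the same flow expressed with the selenium-python bindings might look roughly like this; the server URL, credentials, element locators and download folder are all hypothetical placeholders:

```python
# Illustration only: the post uses recorded Selenium IDE HTML test cases, not
# the Python bindings. Assumes an older (pre-4.x) selenium-python release.
# Every URL, element id and credential below is a hypothetical placeholder.
import time
from selenium import webdriver

SERVER = "https://tableau.example.com"   # hypothetical Tableau Server URL

# Dedicated Firefox profile with a fixed download folder and no download dialog
profile = webdriver.FirefoxProfile()
profile.set_preference("browser.download.folderList", 2)
profile.set_preference("browser.download.dir", r"C:\tableau_downloads")
profile.set_preference("browser.helperApps.neverAsk.saveToDisk", "text/csv")

driver = webdriver.Firefox(firefox_profile=profile)

# 1. Log into Tableau Server (field ids are hypothetical)
driver.get(SERVER)
driver.find_element_by_id("username").send_keys("report_user")
driver.find_element_by_id("password").send_keys("secret")
driver.find_element_by_id("signin").click()
time.sleep(10)                           # "pause": give Tableau time to load

# 2. Open the view holding the crosstab and wait for it to render
driver.get(SERVER + "/views/ExampleWorkbook/Crosstab1")
time.sleep(15)

# 3. Trigger the csv download (locators are hypothetical)
driver.find_element_by_id("download").click()
driver.find_element_by_link_text("Data").click()
time.sleep(10)                           # wait until the file has been saved

# 4. Sign out so the next test case starts from a clean state
driver.get(SERVER + "/auth/logout")
driver.quit()
```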
The Selenium server is then started from the command line with the following arguments (a sketch of this launch from Python follows the list):
- -firefoxProfileTemplate <PATH-to-your-prepared-Firefox-profile>
- -htmlSuite “*firefox”
- <Tableau-Server-URL>
- <Path-to-test-suite>/<test-suite.html>
- <Path-to-test-suite-result-file-that-Selenium-will-create>/<test-suite-result-file.html>
- -trustAllSSLCertificates (to avoid certificate checking errors)
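Since the orchestration script described later starts the server anyway, the same launch can be scripted. A minimal sketch, assuming hypothetical paths for the server jar, the prepared Firefox profile, the test suite and the result file:

```python
# Sketch: launch the Selenium server with the arguments listed above.
# The jar file name and all paths are hypothetical placeholders.
import subprocess

cmd = [
    "java", "-jar", "selenium-server-standalone.jar",   # hypothetical jar name
    "-firefoxProfileTemplate", r"C:\selenium\firefox_profile",
    "-htmlSuite", "*firefox",
    "https://tableau.example.com",                       # Tableau Server URL
    r"C:\selenium\test_suite.html",                      # recorded test suite
    r"C:\selenium\test_suite_result.html",               # result file Selenium creates
    "-trustAllSSLCertificates",                          # certificate flag as noted above
]

# Blocks until the whole test suite has run and all csv files are downloaded.
subprocess.call(cmd)
```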
Using PowerShell to combine several csv files into one Excel file
Before explaining the PowerShell script, I need to say that my first approach was to use one of the existing Python libraries (for example openpyxl, xlrd or XlsxWriter, just to name a few). But with none of them did I get a satisfying result within a reasonable time. The biggest problems were the csv file encoding and the fact that the downloaded csv files contain formatted numbers, which are easy for Excel to handle but difficult to read in correctly with Python.

Workflow control with Python
Finally, I wrote a small Python script to have a central place for setting parameter values and for starting the Selenium server as well as the PowerShell script. It also cleans up the download directory by deleting all existing files (therefore make sure that there is no other content in the download directory of the Firefox profile used for extracting the data). The latter is important because otherwise PowerShell would use your old files instead of the new ones. The script is sketched below. One can easily add additional features to the Python script, such as renaming old files, using dynamically created file names, sending the final Excel file by email and so on. That's all – the Python script can be started with a simple double click. After waiting for it to finish, the Excel extract should be ready for further use.
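A minimal sketch of what such an orchestration script might look like – not the original script from the post – with all paths, file names and the arguments passed to the PowerShell script as hypothetical placeholders:

```python
# Sketch of an orchestration script as described above. All paths, file names
# and the arguments passed to the PowerShell script are hypothetical.
import glob
import os
import subprocess

DOWNLOAD_DIR    = r"C:\tableau_downloads"        # download folder of the Firefox profile
SELENIUM_JAR    = r"C:\selenium\selenium-server-standalone.jar"
FIREFOX_PROFILE = r"C:\selenium\firefox_profile"
TEST_SUITE      = r"C:\selenium\test_suite.html"
RESULT_FILE     = r"C:\selenium\test_suite_result.html"
TABLEAU_URL     = "https://tableau.example.com"
PS_SCRIPT       = r"C:\scripts\csv_to_excel.ps1"  # PowerShell script that builds the Excel file
EXCEL_OUT       = r"C:\reports\crosstab_extract.xlsx"

# 1. Clean the download directory so PowerShell only picks up the new files.
for old_file in glob.glob(os.path.join(DOWNLOAD_DIR, "*")):
    os.remove(old_file)

# 2. Run the Selenium test suite; this downloads one csv file per crosstab view.
subprocess.call([
    "java", "-jar", SELENIUM_JAR,
    "-firefoxProfileTemplate", FIREFOX_PROFILE,
    "-htmlSuite", "*firefox", TABLEAU_URL, TEST_SUITE, RESULT_FILE,
    "-trustAllSSLCertificates",
])

# 3. Combine the downloaded csv files into one Excel workbook via PowerShell.
subprocess.call([
    "powershell.exe", "-ExecutionPolicy", "Bypass",
    "-File", PS_SCRIPT, DOWNLOAD_DIR, EXCEL_OUT,
])
```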
Conclusion
Whether Tableau likes it or not, extracting crosstab data out of the Tableau ecosystem is something that may be necessary if you run Tableau inside a larger BI infrastructure. If this only needs to be done once, you can use the Tableau Server front end. But you will need a more efficient way if it becomes one of your regular tasks. This post presented a method that automates the whole process and creates well-formatted Excel files out of multiple Tableau workbook crosstabs. Once you have set up the workflow, it is easy to adapt it for extracting data from another workbook, using the old scripts as templates. To give you an example, I uploaded a zipped folder here. It contains a Tableau workbook with three crosstab views that can be used as an example report, a test suite for this Tableau workbook, and the PowerShell script as well as the Python script. Please make sure to set your Tableau Server URL, username, password and all other parameters correctly. Pre-test the Selenium script within the Selenium IDE Firefox plugin; if it doesn't work, check that Selenium performs the right steps in the correct order and change the test cases accordingly. As always, any feedback on this content is welcome.