Site icon R-bloggers

Automatically Exporting Multiple Cross Tables from Tableau Server into Excel

[This article was first published on Data * Science + R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Introduction

The following blog post is based on a classical reporting task most of the people working in BI got frequently: Besides all the nice dashboards you create in Tableau, from time to time people will approach you with a request for a “data extract” – reports that typically looking like cross tables. Ideally, the extract comes as an Excel file with each crosstab in a separate sheet ready to be analyzed with the favorite tool of the requester. How will you do that in a programmatic way with Tableau? Extracting data out of the Tableau “ecosystem” to process them in another tool is clearly not part of the philosophy of the Tableau guys. They would argue that you should give those people access to Tableau so that they can do their analysis within their tools. Although this is a good point they miss some important things – first Tableau is expensive compared to solutions that come for free or solutions that are already part of the company’s software portfolio (for example Excel with free add-ons like Power Query, Power View & Power Pivot). Given that often there is a no extra or only a small fixed budget, not everybody will get a full copy of Tableau. Second, changing the standard tool for data analysis and visualization (in a broad sense) within an organization needs a lot of budget, time and power. Procedures and templates needs to be changed, people need to be convinced, trained and accompanied. Even with an impressive and sometimes magical tool like Tableau it is not for granted that this project will be a success down to the smallest corner of your company. And that’s why sometimes we need to use Tableau for purposes that are not in its focus – and extracting data out of a workbook into an excel file seems to be such a thing.

Problem statement

Imagine you have a workbook containing of different sheets all in the form of cross tables. You can see a simple example in the screenshot below (and find a link for downloading the workbook at the end of the posting): If you want to extract all those crosstabs into one Excel file, you have to download them manually (as “Crosstab”) and the move them into one Excel file one-by-one. This is time-consuming & error-prone (e.g. typing mistakes when naming sheets, order of sheets in excel, default size of cells may result in unreadable content) especially if you have to do it multiple times and/or for several reports. Sadly ‘tabcmd’ is not a help here as it exports csv in key-value format instead of maintaining the crosstab layout (see this tableau forum post for what is going on – and this one for voting up an idea convincing Tableau to integrate the crosstab export into tabcmd). That said, I will show a way using free third-party tools to export all three views into on Excel (plus: basic formatting of values is preserved).

Solution Overview

I got the basic idea for how to extract crosstabs from a forum posting given by Christopher Cannata “Hello everyone, I was tasked with a project to allow these crosstab CSV files to be generated automatically each morning for our company. Unfortunately tabcmd was not capable of allowing us to automate the process but I was able to develop a test suite in Selenium IDE call it from a batch script on a daily basis.”. Using a software testing framework for web applications like Selenium to automate the csv extraction process is a hell of a good idea for the first part of our question. And it turned out that PowerShell is a good approach for the second part. PowerShell is simple and yet powerful and avoids a lot of problems with encoding, formats and data types. I ran into these problems when I tried reading the raw csv and exporting it as excel using several custom libraries for Python. But as I use Python for most tasks regarding scripting and automation around Tableau, I took it again in this example to orchestrate the whole workflow. Before we go deeper into the different steps, the picture below shows the whole story:
Note: I recommend using Firefox for all of the following activities with Selenium. But before you start create an extra profile (read here how to create new profiles that works with Selenium) and save the profile data together with all other scripts in one directory.

Extracting Crosstabs from Tableau Server using Selenium

Selenium is a software testing framework for web applications. As such, it can record user actions and afterwards “replay” those activities again and again on new versions of the app, testing if those action still lead to the expected result. We just need the capabilities to record and automatically run those macros (called ‘test cases’ – with a set of test cases called ‘test suite’). Here, a test case mimics user actions to download a single csv file as crosstab with all the necessary steps like – login into Tableau Server, select and view the right workbook and downloading the csv file. The installation is easy and requires two tools – the Selenium IDE for recording test cases (installed as a plugin for Firefox) and the Selenium server for running test suites. The later one is just a jar file, which is started from command line. Now start your Firefox browser and you should see a new icon for starting the developer GUI. The small red icon on the right side shows that Selenium is now recording every step you are doing in Firefox. Type in your Tableau Server URL, login into Tableau Server, open your report and download the crosstab as csv. As you interact with Tableau Server you will see that every step show up in the central window of your Selenium IDE. After you finished downloading the first csv, log out and stop the recoding process. Then save you test case. You can now test if everything works fine by starting the automatic “replay”. If not, change commands or add new ones. It is difficult to give a general template about how to download a crosstab from Tableau Server because the whole workflow and order of commands depends heavily on your profile details like start page, page layout, language and so on. Nevertheless I attached an example test case at the end of the post that should give some guidance on how to fine-tune a test case and provide some general tips: After you have confirmed that everything works as expected, save all test cases as a test suite. Because we don’t want to start the test suite manually each time we need to download data from Tableau, the next step is to automate the whole process. For this we can use the Selenium server that comes as a standalone jar file. The server is started from command line calling ‘java –jar <PATH-to-Selenium-Server>/<Selenium-Server.jar> ’. To be more precise, the command should have the following parameters set: In my case the command looks as follows: A final test by running the command should then confirm that there are nothing but a couple of csv files in the download directory.

Using PowerShell to combine several csv files into one Excel file

Before explaining the PowerShell script, I need to say that my first approach was to use one of the existing Python libraries (for example openpyxl, xlrd and XlsxWrite – just to name a few). But with none of them I got a satisfying result within a reasonable time. Biggest problems were the csv file encoding and that the downloaded csv files contain formatted numbers which seem to be easy for Excel, but difficult to read in correctly with Python. So after spending about an hour with each one of the libraries, I ended up with searching for an alternative approach using windows build-in capabilities. If just a double click on one of those csv files opens up Excel and gives me a nicely formatted sheet, there should be a way to automate this – and there is: use MS PowerShell. The script is straightforward – create a “hidden” workbook and add several sheets. Then go to the first sheet, open the first csv within a temporary Excel file, select all within the temporary file and copy-paste it into the sheet. Then go to the next sheet and repeat the process with the second csv. Finally save it as xlsx file (the new Excel workbook file format). Here is the code that will do that:

Workflow control with Python

Finally I wrote a small python script to have a central place for setting parameter values and starting the Selenium server as well as the PowerShell script. It also cleans up the download directory by dropping all existing files (therefore make sure that there is no other content within the download directory of the Firefox profile used for extracting data). The latter is important because otherwise PowerShell will use your old files instead of the new ones. The script looks as follows: One can easily add additional features to the python script like renaming old files, using dynamically created file names, sending the final Excel as email and so on. That’s all – the python script can be started with a simple double click. After waiting for it to finish, the excel extract should be ready for further use.

Conclusion

Whether Tableau likes it or not – extracting crosstab data out of the Tableau ecosystem is something that may be necessary if you run Tableau inside a BI infrastructure. If this only needs to be done once, you can use the Tableau Server front-end. But you will need a more efficient way if it becomes one of your regular tasks. This post presented a method that automates the whole process and creates well formatted Excel files out of multiple Tableau workbook crosstabs. If you setup the workflow once, it is easy to adapt it for extracting data from another workbook using the old scripts as templates. To give you an example, I uploaded a zipped folder here. It contains a Tableau workbook with three crosstabs as views that can be used as example report, a test suite for this Tableau workbook, and the PowerShell script as well as the Python script. Please make sure to set your Tableau Server URL, username and password and all other parameters correctly. Pre-test the Selenium Script within the Selenium IDE Firefox plugin. If it doesn’t work check that Selenium is doing the right steps in the correct order and change the test cases accordingly. As always any feedback on this content is welcome.

To leave a comment for the author, please follow the link and comment on their blog: Data * Science + R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.