Recently, I was working on a large assignment involving complex data manipulation inside Power BI when a co-worker asked me a question I’m no stranger to: “Why are you doing that in M? Why not use R?”
I stammered through a few reasons until they felt comfortable, but I left with the question still on my mind. I wanted a crisper, clearer answer. This post should clear up some of the understanding and misconceptions about how each works within Power BI, along with their respective strengths and weaknesses. Keep in mind, this never has to be either-or! M can easily come without R, but R will almost never come in Power BI without M.
A Couple General Rules
Use what you’re comfortable with
These aren’t one-size-fits-all answers. Your situation will vary, and it’s important to take your context into account when making a decision. Learning a whole new language is rarely the answer to a single problem you’ve encountered.
The cost of re-platforming
As a rebuttal to the item immediately above: when you’re handed a chunk of custom M or custom R, it’s natural to want to switch to what you’re comfortable with, but consider what that means. Introducing language complexity to an enterprise strapped for skilled resources can make a problem exponentially worse in the long run. Know what your company uses, what your co-workers know, and what makes the most sense in your ecosystem. Convincing a bureaucratic organization to install R on a fleet of computers, for example, can be an overwhelming challenge.
When to Use M
You’re distributing the PBIX file
I’m not here to admonish you for your blasé attitude toward version control! Maybe your organization doesn’t have Power BI Premium but relies on Power BI for its reporting. Maybe you distribute files that rely on certain custom parameters so each person can see their own data. There are plenty of reasons why organizations choose to share PBIX files. When doing this, custom M code is generally preferable, because it’s all built into Power BI. A file using R requires that each recipient have R installed, along with all the packages you’ve used, preferably at the same versions, to avoid conflicts.
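If you do ship R inside a shared PBIX, it helps to fail fast when a recipient’s setup differs from yours. Here’s a minimal sketch of a guard you might put at the top of an R script step; the package list is hypothetical.

```r
# Hypothetical list of packages the report's R steps depend on.
required <- c("dplyr", "jsonlite")

# Check which required packages are missing on this machine.
missing <- required[!vapply(required, requireNamespace,
                            logical(1), quietly = TRUE)]

# Stop with a clear message instead of a cryptic error mid-refresh.
if (length(missing) > 0) {
  stop("This report needs R packages you don't have installed: ",
       paste(missing, collapse = ", "))
}
```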
The dataset is larger than your computer’s memory
Remember that R stores its data in memory (unless you’re using RevoScaleR, but we can talk about that later!). Power BI Premium as of this article supports up to a 10GB dataset. If your enterprise uses 8GB or 4GB computers, you’re best sticking to M. Another tip for this situation is to create a VM in Azure with higher specs, dedicated to your Power BI development.
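If you’re unsure which side of that line you’re on, it’s easy to check how much memory a table actually occupies before committing to an R-based transformation. A minimal sketch (the data here is simulated purely to illustrate the idea):

```r
# Simulate a table roughly the shape of a fact table, just for illustration.
df <- data.frame(id    = seq_len(1e6),
                 value = rnorm(1e6),
                 group = sample(letters, 1e6, replace = TRUE))

# In-memory footprint of this one object; R will also make temporary
# copies during transformations, so leave generous headroom beyond this.
print(object.size(df), units = "MB")
```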
You want to protect the code, or consider it proprietary
Writing your own custom M connectors is a beautiful thing. It’s the subject of an upcoming, more in-depth article, but suffice it to say: if you’re using standard connections such as an OData service, XML, the Web, or others, and you consider the way you retrieve and manipulate the data to be proprietary to your company, or you want to protect it from tinkering by users, you can package up all your M as a connector and use that alone. Note that this is a double-edged sword – you need to distribute the connector to every user of the PBIX file and keep it up to date!
Auditing code step-by-step will be necessary
Since Power BI writes its own M code as you clean and prep the dataset, this is a no-brainer. If you need to see what happened in each step along the way, it’s best to keep it all in M. You can absolutely do this using R, but each step will have to be its own separate code chunk, and you likely lose the data type of each column every time you run a new R script.
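For context, each of those R steps is a “Run R script” transform in Power Query: the current table arrives as a data frame named dataset, and any data frame left in the environment becomes an output table. A minimal sketch, with hypothetical column names, of the re-typing you end up doing:

```r
# Power BI hands the current query's table to the script as `dataset`.
output <- dataset

# Columns frequently come back as text after an R step, so re-declare
# the types you care about (the column names here are hypothetical).
output$order_date <- as.Date(output$order_date)
output$amount     <- as.numeric(output$amount)
```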
When to Use R
You need to unlock the power of custom visualizations
To be clear, this wouldn’t take the place of M! If you’re ever extremely picky about a tertiary axis, a hexbin map, a network diagram, or just need total control over what to display (hey, I get it, some clients have a vision and won’t budge), then writing it up in R is the way to go. At that point, you’ve already introduced R to the solution, so feel more at ease introducing it in the data manipulation steps as well.
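As a flavor of what that looks like, here’s a minimal sketch of an R visual. Power BI exposes whatever fields you drag into the visual as a data frame named dataset; the column names below are hypothetical, and geom_hex needs the hexbin package installed.

```r
library(ggplot2)

# `dataset` is supplied by Power BI from the fields added to the visual.
ggplot(dataset, aes(x = longitude, y = latitude)) +
  geom_hex(bins = 30) +                   # hexbin density, hard to get natively
  scale_fill_viridis_c(name = "Count") +
  theme_minimal()
```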
There are complex data structures to handle
Sometimes you encounter a JSON or XML structure that really makes you scratch your head. “Who wrote this spec? What were they thinking? Were they thinking at all?” In this situation, I’ve found myself transposing tables, converting records to tables, taking each field to a new record, running eight nested List.Transforms, and… you get the picture. R, with the power of lapply and friends, can often get the job done more simply and easily.
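As a rough sketch of what I mean (the file name and field names are hypothetical), flattening a nested JSON payload with jsonlite and lapply is usually just a few readable lines:

```r
library(jsonlite)

# Keep the raw nesting as lists rather than letting jsonlite guess a shape.
raw <- fromJSON("orders.json", simplifyVector = FALSE)

# One row per line item, carrying the parent order's id down to each row.
rows <- lapply(raw$orders, function(order) {
  do.call(rbind, lapply(order$items, function(item) {
    data.frame(order_id = order$id,
               sku      = item$sku,
               qty      = item$qty,
               stringsAsFactors = FALSE)
  }))
})

output <- do.call(rbind, rows)
```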
Multiple distinct data sources need to be ingested in one query
This is almost a corollary to the above point. If you’re connecting to multiple non-standard places, chances are there’s already an R package for it, and you’re best off pulling them all in, joining the data (shout-out to dplyr), and then creating the output set.
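A minimal sketch of that pattern, assuming one CSV export and one JSON API (the file name, URL, and column names are all hypothetical):

```r
library(dplyr)
library(jsonlite)

# Two distinct sources pulled into the same query.
sales    <- read.csv("sales_export.csv", stringsAsFactors = FALSE)
accounts <- fromJSON("https://example.com/api/accounts")  # assumed to return a flat data frame

# Join and shape the table that Power BI will receive as the query's output.
output <- sales %>%
  inner_join(accounts, by = "account_id") %>%
  group_by(region) %>%
  summarise(total_sales = sum(amount), .groups = "drop")
```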