Code-First Data Science for the Enterprise

[This article was first published on RStudio | Open source & professional software for data science teams on RStudio, and kindly contributed to R-bloggers.]

As a data scientist, or as a leader of a data science team, you know the power and flexibility that open source data science delivers. However, if your team works within a typical enterprise, you compete for budget and executive mindshare with a wide variety of other analytic tools, including self-service BI and point-and-click data science tools. Navigating this landscape, and convincing others in your organization of the value of open source data science, can be difficult. In this blog post, we draw on our recent webinar on this topic to give you some talking points to use with your colleagues when tackling this challenge.

It is important to keep in mind, however, that “code-first” does not mean “code only.” While code is often the right choice, most organizations need multiple tools so that the right one is available for the task at hand.

The Pitfalls of BI Tools and Codeless Data Science

There are multiple ways to approach any given analytic problem. At their core, data science and BI tools have much in common. They all provide ways to draw on data from multiple sources and to explore, visualize, and understand that data in open-ended ways. Many tools also support creating applications and dashboards that can be shared with others to improve their decision-making.

Since these very different approaches can end up delivering applications and dashboards that may (at first glance) appear very similar, the strengths and nuances of each approach can be obscured to decision makers, especially executive budget holders. This can lead to competition between the groups that use them.

However, when taking a codeless approach, it can be difficult to achieve some critical analytic best practices, and to answer some very common and important questions:

At best, wrestling with questions like these will distract an analytics team, burning precious time that could be spent on new, valuable analyses. At worst, stakeholders end up with inconsistent or even incorrect answers because the analysis is wrong, is not the correct version, or cannot be reproduced. This can fundamentally undermine the credibility of the analytics team. Either way, the team's potential impact in supporting decision makers is greatly reduced.

The Benefits of Code-First Data Science

RStudio’s mission is to create free and open-source software for data science, because we fundamentally believe that this enhances the production and consumption of knowledge, and facilitates collaboration and reproducible research.

At the core of this mission is a focus on a code-first approach. Data scientists grapple every day with novel, complex, often vaguely-defined problems with potential value to their organization. Before the solution can be automated, someone needs to figure out how to solve it. These sorts of problems are most easily approached with code.

With Code, the Answer Is Always Yes!

Code is:

| Codeless Problem | Code-First Solution |
| --- | --- |
| Difficulty tracking changes and auditing work | Code, coupled with version control systems like git, to track what changed, when, by whom, and why. Code can be logged when run for auditing and monitoring. |
| No single source of truth | Centralized tools to create a single source of truth for data, dashboards, and models. Version control to track multiple versions of code separately without creating conflicts. |
| Difficult to extend and reproduce work | Code enables reproducibility by explicitly recording every step taken. Open-source code can be deployed on many platforms, and is not dependent on proprietary tools. Code can be copied, pasted, and modified to address novel problems as circumstances change. |
| Black box constraints on how you analyze your data and present your insights | Access and combine all your data, and analyze and present it exactly as you need to, in the form of tailored dashboards and reports. Pull in new methods and build on other open source work without waiting for proprietary features to be added by vendors. |

*A summary of how a code-first approach helps tackle codeless challenges*
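To make two of these points concrete (code explicitly records every step, and code can be logged when it runs), here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a tool or API from the webinar: the `SALES` data, the step functions, and the shape of the audit record are all hypothetical, and the same idea could be expressed just as well in R.

```python
import hashlib
import json
import statistics
from datetime import datetime, timezone

# Hypothetical input data; in practice this might come from a database or a file.
SALES = [
    {"region": "north", "revenue": 120.0},
    {"region": "south", "revenue": 80.0},
    {"region": "north", "revenue": 100.0},
    {"region": "south", "revenue": None},  # a missing value
]


def drop_missing(rows):
    """Step 1: remove rows with a missing revenue value."""
    return [r for r in rows if r["revenue"] is not None]


def mean_revenue_by_region(rows):
    """Step 2: compute the average revenue per region."""
    grouped = {}
    for r in rows:
        grouped.setdefault(r["region"], []).append(r["revenue"])
    return {region: statistics.mean(values) for region, values in sorted(grouped.items())}


def fingerprint(obj):
    """Return a stable SHA-256 hash of a JSON-serializable object."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def run_analysis(rows):
    """Run every step in order and return the result plus an audit record."""
    steps = [drop_missing, mean_revenue_by_region]
    data = rows
    for step in steps:
        data = step(data)
    audit = {
        "run_at": datetime.now(timezone.utc).isoformat(),
        "steps": [s.__name__ for s in steps],  # the explicit record of what was done
        "input_sha256": fingerprint(rows),      # which data went in
        "output_sha256": fingerprint(data),     # which result came out
    }
    return data, audit


result, audit = run_analysis(SALES)
print(result)
print(json.dumps(audit, indent=2))
```

Because the steps are ordinary functions kept in a file, they can be committed to version control, and rerunning the script on the same input should yield the same output hash, which gives reviewers a simple reproducibility check.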

Objections to Code-First Data Science

When discussing the benefits of a code-first approach within your organization, you may hear some common objections:

To Learn More

If you’d like to learn more about the advantages of code-first data science, and see some real examples in action, watch the free, on-demand webinar Why Your Enterprise Needs Code-First Data Science. Or, you can set up a meeting directly with our Customer Success team, to get your questions answered and learn how RStudio can help you get the most out of your data science.
