Advent of 2020, Day 28 – Infrastructure as Code and how to automate, script and deploy Azure Databricks with Powershell

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Yesterday we looked into bringing the capabilities of Databricks closer to your client machine, making coding, data wrangling and data science a little more convenient.

Today we will look into deploying a Databricks workspace using PowerShell.

You will need nothing beyond a command line with PowerShell, which you most likely already have. So let's open the CLI and install the Azure PowerShell module (Az).

In the CLI, type:

if ($PSVersionTable.PSEdition -eq 'Desktop' -and (Get-Module -Name AzureRM -ListAvailable)) {
    Write-Warning -Message ('Az module not installed. Having both the AzureRM and ' +
      'Az modules installed at the same time is not supported.')
} else {
    Install-Module -Name Az -AllowClobber -Scope CurrentUser
}

After that, you can connect to your Azure subscription:

Connect-AzAccount

You will be prompted for your credentials. Once you enter them, the command returns your Account, TenantId, Environment and SubscriptionName.

Once connected, we will look into the Databricks module. To list all available modules:

Get-Module -ListAvailable

To explore what is available in Az.Databricks, run the following PowerShell command:

Get-Command -Module Az.Databricks

Now we can create a new workspace. This already gives you a “semi” automated deployment, but ARM will make the next steps even easier.

New-AzDatabricksWorkspace  `
   -Name databricks-test  `
   -ResourceGroupName testgroup  `
   -Location eastus  `
   -ManagedResourceGroupName databricks-group  `
   -Sku standard
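When the same script is run repeatedly, it can help to check whether the workspace already exists before creating it. The following is a minimal sketch of that idea, assuming an authenticated session and the Az.Databricks module; the helper name New-WorkspaceIfMissing is hypothetical and not part of the module, and the values are the ones from the example above.

```powershell
# Workspace settings, taken from the example above.
$workspaceParams = @{
    Name                     = 'databricks-test'
    ResourceGroupName        = 'testgroup'
    Location                 = 'eastus'
    ManagedResourceGroupName = 'databricks-group'
    Sku                      = 'standard'
}

# Hypothetical helper (not part of Az.Databricks):
# creates the workspace only when no workspace with that name is found.
function New-WorkspaceIfMissing {
    param([Parameter(Mandatory)][hashtable]$Params)

    # Assumption: a missing workspace surfaces as a non-terminating error,
    # which SilentlyContinue turns into an empty result.
    $existing = Get-AzDatabricksWorkspace `
        -ResourceGroupName $Params.ResourceGroupName `
        -Name $Params.Name `
        -ErrorAction SilentlyContinue

    if ($existing) {
        Write-Host "Workspace '$($Params.Name)' already exists; skipping."
        return $existing
    }

    # Splatting passes every hashtable entry as a named parameter.
    New-AzDatabricksWorkspace @Params
}

# Usage (requires Connect-AzAccount first):
# New-WorkspaceIfMissing -Params $workspaceParams
```

Keeping the settings in one hashtable also makes it easy to read them from a file or pipeline variable later.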

Or we can use ARM (Azure Resource Manager) deployment:

$templateFile = "/users/template.json"
New-AzResourceGroupDeployment `
  -Name blanktemplate `
  -ResourceGroupName myResourceGroup `
  -TemplateFile $templateFile

Or you can go through the deployment process in the Azure Portal:

And select a GitHub template to create a new Azure Databricks workspace:

Or you can choose “Build your own template”, take the template.json and parameters.json files from the IaC folder of my GitHub repository, and paste their content there.
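The repository files are not reproduced in this post, so as an illustration only, here is a sketch of how a parameters file in the standard ARM layout could be generated from PowerShell. The parameter names workspaceName, location and pricingTier are assumptions; they must match whatever the template actually declares.

```powershell
# Build an ARM parameters document as nested hashtables.
# NOTE: workspaceName, location and pricingTier are hypothetical parameter
# names; use the names declared in your own template.json.
$parameters = [ordered]@{
    '$schema'      = 'https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#'
    contentVersion = '1.0.0.0'
    parameters     = [ordered]@{
        workspaceName = @{ value = 'databricks-test' }
        location      = @{ value = 'westeurope' }
        pricingTier   = @{ value = 'standard' }
    }
}

# -Depth is raised so the nested "value" objects are not truncated.
$parametersJson = $parameters | ConvertTo-Json -Depth 5

# Write the file wherever your deployment script expects it, for example:
# Set-Content -Path ./parameters.json -Value $parametersJson
$parametersJson
```

Generating the file this way keeps environment-specific values (names, regions, tiers) in the script, while the template itself stays unchanged.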

First, add the new resource group:

New-AzResourceGroup -Name RG_123xyz -Location "westeurope"

Finally, prepare the JSON files for your automated deployment and deploy the template together with the parameters file:

$templateFile = "/users/tomazkastrun/Documents/GitHub/Azure-Databricks/iac/template.json"
$parameterFile = "/users/tomazkastrun/Documents/GitHub/Azure-Databricks/iac/parameters.json"
New-AzResourceGroupDeployment -Name DataBricksDeployment -ResourceGroupName RG_123xyz -TemplateFile $templateFile -TemplateParameterFile $parameterFile
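Because a deployment takes a while, it can be worth validating the template and parameters before starting it. The sketch below wraps the deployment in a hypothetical helper, Invoke-ValidatedDeployment (a name introduced here, not an Az cmdlet), which calls Test-AzResourceGroupDeployment first and only deploys when no validation errors come back.

```powershell
# Hypothetical helper: validate first, deploy only if validation is clean.
function Invoke-ValidatedDeployment {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName,
        [Parameter(Mandatory)][string]$TemplateFile,
        [Parameter(Mandatory)][string]$TemplateParameterFile,
        [string]$DeploymentName = 'DataBricksDeployment'
    )

    # Returns a list of errors; an empty result means the template is valid.
    $validationErrors = Test-AzResourceGroupDeployment `
        -ResourceGroupName $ResourceGroupName `
        -TemplateFile $TemplateFile `
        -TemplateParameterFile $TemplateParameterFile

    if ($validationErrors) {
        $validationErrors | ForEach-Object { Write-Warning $_.Message }
        throw 'Template validation failed; nothing was deployed.'
    }

    New-AzResourceGroupDeployment `
        -Name $DeploymentName `
        -ResourceGroupName $ResourceGroupName `
        -TemplateFile $TemplateFile `
        -TemplateParameterFile $TemplateParameterFile
}

# Usage (requires Connect-AzAccount and an existing resource group):
# Invoke-ValidatedDeployment -ResourceGroupName RG_123xyz `
#     -TemplateFile ./template.json -TemplateParameterFile ./parameters.json
```

Adding -WhatIf to New-AzResourceGroupDeployment is another option when you only want a preview of the changes.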

This will take some time:

But you can always check the progress in the Azure Portal:

There you can see the deployment status: 1 Deploying. Once the deployment finishes, PowerShell returns the status:

These values will be the same as the ones in the parameters.json file. In this manner you can automate your deployment as part of continuous integration (CI) and continuous deployment (CD).
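In a CI/CD pipeline there is no portal to look at, so the deployment state can also be read from PowerShell. This is a small sketch, assuming the resource group and deployment names used above; Get-DeploymentState is a hypothetical wrapper around Get-AzResourceGroupDeployment.

```powershell
# Hypothetical wrapper: report the state of a named resource group deployment.
function Get-DeploymentState {
    param(
        [Parameter(Mandatory)][string]$ResourceGroupName,
        [Parameter(Mandatory)][string]$DeploymentName
    )

    Get-AzResourceGroupDeployment `
        -ResourceGroupName $ResourceGroupName `
        -Name $DeploymentName |
        Select-Object DeploymentName, ProvisioningState, Timestamp
}

# Usage (requires Connect-AzAccount):
# Get-DeploymentState -ResourceGroupName RG_123xyz -DeploymentName DataBricksDeployment
```

A pipeline step could fail the build when ProvisioningState is anything other than Succeeded.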

Tomorrow we will dig into Apache Spark.

The complete set of code and the notebook are available at the GitHub repository.

Happy Coding and Stay Healthy!

To leave a comment for the author, please follow the link and comment on their blog: R – TomazTsql.
