Series of Azure Databricks posts:
- Dec 01: What is Azure Databricks
- Dec 02: How to get started with Azure Databricks
- Dec 03: Getting to know the workspace and Azure Databricks platform
- Dec 04: Creating your first Azure Databricks cluster
- Dec 05: Understanding Azure Databricks cluster architecture, workers, drivers and jobs
- Dec 06: Importing and storing data to Azure Databricks
- Dec 07: Starting with Databricks notebooks and loading data to DBFS
- Dec 08: Using Databricks CLI and DBFS CLI for file upload
- Dec 09: Connect to Azure Blob storage using Notebooks in Azure Databricks
- Dec 10: Using Azure Databricks Notebooks with SQL for Data engineering tasks
- Dec 11: Using Azure Databricks Notebooks with R Language for data analytics
- Dec 12: Using Azure Databricks Notebooks with Python Language for data analytics
- Dec 13: Using Python Databricks Koalas with Azure Databricks
- Dec 14: From configuration to execution of Databricks jobs
- Dec 15: Databricks Spark UI, Event Logs, Driver logs and Metrics
- Dec 16: Databricks experiments, models and MLFlow
- Dec 17: End-to-End Machine learning project in Azure Databricks
- Dec 18: Using Azure Data Factory with Azure Databricks
- Dec 19: Using Azure Data Factory with Azure Databricks for merging CSV files
- Dec 20: Orchestrating multiple notebooks with Azure Databricks
- Dec 21: Using Scala with Spark Core API in Azure Databricks
- Dec 22: Using Spark SQL and DataFrames in Azure Databricks
- Dec 23: Using Spark Streaming in Azure Databricks
- Dec 24: Using Spark MLlib for Machine Learning in Azure Databricks
- Dec 25: Using Spark GraphFrames in Azure Databricks
- Dec 26: Connecting Azure Machine Learning Services Workspace and Azure Databricks
- Dec 27: Connecting Azure Databricks with on premise environment
Yesterday we looked into bringing the capabilities of Databricks closer to your client machine, making coding, data wrangling and data science a little more convenient.
Today we will look into deploying a Databricks workspace using PowerShell.
You will need nothing but the CLI and PowerShell, both of which you already have. So let's go into the CLI and get the Azure PowerShell module.

In the CLI, type:
if ($PSVersionTable.PSEdition -eq 'Desktop' -and (Get-Module -Name AzureRM -ListAvailable)) {
    Write-Warning -Message ('Az module not installed. Having both the AzureRM and ' +
        'Az modules installed at the same time is not supported.')
} else {
    Install-Module -Name Az -AllowClobber -Scope CurrentUser
}
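Once the installation completes, you can quickly check that the module landed on your machine (a small sketch; Get-InstalledModule ships with PowerShellGet, the same mechanism Install-Module uses):

# Confirm the Az module is installed and check its version
Get-InstalledModule -Name Az | Select-Object Name, Version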
After that, you can connect to your Azure subscription:
Connect-AzAccount
You will be prompted to enter your credentials. Once you enter them, you will get back your Account, TenantId, Environment and Subscription name.
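If your account has access to more than one subscription, you can explicitly select the one you want to deploy into before going any further (a short sketch; the subscription name below is a placeholder):

# List the subscriptions you can see and pick the one to work in
Get-AzSubscription
Set-AzContext -Subscription "My-Subscription-Name"   # placeholder name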
Once connected, we will look into the Az.Databricks module. To list all available modules:
Get-Module -ListAvailable
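If Az.Databricks does not appear in that list, you can check for it explicitly and install it on its own (a sketch; depending on your Az version, the module may already be bundled with the Az rollup):

# Install the Databricks module only if it is not already available
if (-not (Get-Module -Name Az.Databricks -ListAvailable)) {
    Install-Module -Name Az.Databricks -Scope CurrentUser
}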
To explore what is available in Az.Databricks, let's run the following PowerShell command:
Get-Command -Module Az.Databricks
Now we can create a new workspace. In this manner you can also build "semi" automation, but ARM templates will make the next steps even easier.
New-AzDatabricksWorkspace `
    -Name databricks-test `
    -ResourceGroupName testgroup `
    -Location eastus `
    -ManagedResourceGroupName databricks-group `
    -Sku standard
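Once the cmdlet returns, you can verify the new workspace from the same session (a sketch reusing the names from the command above):

# Retrieve the workspace we just created and inspect its state
Get-AzDatabricksWorkspace `
    -ResourceGroupName testgroup `
    -Name databricks-test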
Or we can use an ARM (Azure Resource Manager) deployment:
$templateFile = "/users/template.json"
New-AzResourceGroupDeployment `
    -Name blanktemplate `
    -ResourceGroupName myResourceGroup `
    -TemplateFile $templateFile
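If you just want to see the deployment mechanics work, a truly blank template is enough. This is a minimal sketch of one, written to the path used in $templateFile above; the schema URL is the standard one for ARM deployment templates:

# Write a minimal "blank" ARM template (no resources) to the path used above
@'
{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "resources": []
}
'@ | Set-Content -Path "/users/template.json"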
Or you can go through the deployment process in the Azure Portal:

And select a GitHub template to create a new Azure Databricks workspace:

Or you can go under "Build your own template", take the template.json and parameters.json files from the IaC folder of my GitHub repository, and paste their content in here.
First, add the new resource group:
New-AzResourceGroup -Name RG_123xyz -Location "westeurope"
And at the end, run the automated deployment from the JSON files, this time adding the parameters file as well:
$templateFile = "/users/tomazkastrun/Documents/GitHub/Azure-Databricks/iac/template.json"
$parameterFile = "/users/tomazkastrun/Documents/GitHub/Azure-Databricks/iac/parameters.json"
New-AzResourceGroupDeployment `
    -Name DataBricksDeployment `
    -ResourceGroupName RG_123xyz `
    -TemplateFile $templateFile `
    -TemplateParameterFile $parameterFile
This will take some time:
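While it runs, you can poll the state from the same shell with Get-AzResourceGroupDeployment (a sketch using the names from above):

# Check the provisioning state of the deployment from PowerShell
Get-AzResourceGroupDeployment `
    -ResourceGroupName RG_123xyz `
    -Name DataBricksDeployment |
    Select-Object DeploymentName, ProvisioningState, Timestamp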
But you can always check what is happening in the Azure Portal:

There you can see the deployment status: 1 Deploying. And once it is finished, PowerShell will return you the status:
These values will be the same as the ones in the parameters.json file. In this manner you can automate your deployment for continuous integration (CI) and continuous deployment (CD).
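To sketch how that automation could look, here is a minimal deploy script a CI/CD pipeline could call; the names and relative paths are taken from the examples above and are assumptions you would adapt:

# deploy.ps1 -- minimal end-to-end deployment step for a pipeline
param(
    [string]$ResourceGroup = "RG_123xyz",
    [string]$Location      = "westeurope",
    [string]$TemplateFile  = "./iac/template.json",
    [string]$ParameterFile = "./iac/parameters.json"
)

# Create the resource group if needed (-Force suppresses the confirmation prompt)
New-AzResourceGroup -Name $ResourceGroup -Location $Location -Force

# Run the ARM deployment with the template and parameters files
New-AzResourceGroupDeployment `
    -Name DataBricksDeployment `
    -ResourceGroupName $ResourceGroup `
    -TemplateFile $TemplateFile `
    -TemplateParameterFile $ParameterFile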
Tomorrow we will dig into Apache Spark.
Complete set of code and the Notebook is available at the Github repository.
Happy Coding and Stay Healthy!