AzureStor: an R package for working with Azure storage
by Hong Ooi, senior data scientist, Microsoft Azure
A few weeks ago, I introduced the AzureR family of packages for working with Azure in R. Since then, I’ve also written articles on how to use AzureRMR to interact with Azure Resource Manager, how to use AzureVM to manage virtual machines, and how to use AzureContainers to deploy R functions with Azure Kubernetes Service. This article is the next in the series, and covers AzureStor: an interface to Azure storage.
The Resource Manager interface: creating and deleting storage accounts
AzureStor implements an interface to Azure Resource Manager, which you can use to manage storage accounts: creating them, retrieving them, deleting them, and so forth. This is done via the appropriate methods of the az_resource_group class. For example, the following code shows how you might create a new storage account from scratch.
    library(AzureStor)

    # get the resource group for the storage account
    rg <- AzureRMR::az_rm$
        new(tenant="{tenant_id}", app="{app_id}", password="{password}")$
        get_subscription("{subscription_id}")$
        get_resource_group("myresourcegroup")

    # create the storage account
    # by default, this will be in the resource group's region
    rg$create_storage_account("mynewstorage")
Without any options, this will create a storage account with the following parameters:
- General purpose account (all storage types supported)
- Locally redundant storage (LRS) replication
- Hot access tier (for blob storage)
- HTTPS connection required for access
You can change these by setting the arguments to create_storage_account(). For example, to create an account with geo-redundant storage replication and the default blob access tier set to “cool”:
rg$create_storage_account("myotherstorage", replication="Standard_GRS", access_tier="cool")
To retrieve an existing storage account, use the get_storage_account() method. Only the storage account name is required.
    # retrieve one of the accounts created above
    stor2 <- rg$get_storage_account("myotherstorage")
Finally, to delete a storage account, you simply call its delete() method. Alternatively, you can call the delete_storage_account() method of the az_resource_group class, which will do the same thing. In both cases, AzureStor will prompt you for confirmation that you really want to delete the storage account.
    rg$delete_storage_account("mynewstorage")

    stor2$delete() # if you have the storage account object
The client interface: working with storage
Storage endpoints
Perhaps the more relevant part of AzureStor for most users is its client interface to storage. With this, you can upload and download files and blobs, create containers and shares, list files, and so on. Unlike the ARM interface, the client interface uses S3 classes. This is for a couple of reasons: it is more familiar to most R users, and it is consistent with most other data manipulation packages in R, in particular the tidyverse.
The starting point for client access is the storage_endpoint object, which stores information about the endpoint of a storage account: the URL that you use to access storage, along with any authentication information needed. The easiest way to obtain an endpoint object is via the storage account resource object’s get_blob_endpoint() and get_file_endpoint() methods:
    # get the storage account object
    stor <- AzureRMR::az_rm$
        new(tenant="{tenant_id}", app="{app_id}", password="{password}")$
        get_subscription("{subscription_id}")$
        get_resource_group("myresourcegroup")$
        get_storage_account("mynewstorage")

    stor$get_blob_endpoint()
    # Azure blob storage endpoint
    # URL: https://mynewstorage.blob.core.windows.net/
    # Access key: <hidden>
    # Account shared access signature: <none supplied>
    # Storage API version: 2018-03-28

    stor$get_file_endpoint()
    # Azure file storage endpoint
    # URL: https://mynewstorage.file.core.windows.net/
    # Access key: <hidden>
    # Account shared access signature: <none supplied>
    # Storage API version: 2018-03-28
This shows that the base URL to access blob storage is https://mynewstorage.blob.core.windows.net/, while that for file storage is https://mynewstorage.file.core.windows.net/. While it’s not displayed, the endpoint objects also include the access key necessary for authenticated access to storage; this is obtained directly from the storage account resource.
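The two URLs differ only in the service name. As a rough illustration, the base URL can be assembled from the account name as sketched below. The helper endpoint_url() is hypothetical and not part of AzureStor, and the pattern assumes the public Azure cloud, where endpoints sit under core.windows.net.

```r
# Hypothetical helper, not part of AzureStor: build the base URL for a
# storage service from the account name. Assumes the public Azure cloud,
# where endpoints live under core.windows.net.
endpoint_url <- function(account, service=c("blob", "file"))
{
    service <- match.arg(service)
    sprintf("https://%s.%s.core.windows.net/", account, service)
}

blob_url <- endpoint_url("mynewstorage", "blob")
file_url <- endpoint_url("mynewstorage", "file")

print(blob_url)
print(file_url)
```

Strings of this form are what the endpoint objects shown above report as their URL.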
More practically, you will usually want to work with a storage endpoint without having to go through the process of authenticating with Azure Resource Manager (ARM). Often, you may not have any ARM credentials to start with. In this case, you can create the endpoint object directly with blob_endpoint() and file_endpoint():
    # same as above
    blob_endp <- blob_endpoint(
        "https://mynewstorage.blob.core.windows.net/",
        key="mystorageaccesskey")
    file_endp <- file_endpoint(
        "https://mynewstorage.file.core.windows.net/",
        key="mystorageaccesskey")
Notice that when creating the endpoint this way, you have to provide the access key explicitly.
Instead of an access key, you can provide a shared access signature (SAS) to gain authenticated access. The main difference between using a key and a SAS is that the former unlocks access to the entire storage account. A user who has a key can access all containers and files, and can read, modify and delete data without restriction. On the other hand, a user with a SAS can be limited to have access only to specific files, or be limited to read access, or only for a given span of time, and so on. This is usually much better in terms of security.
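Because a SAS is a URL query string, you can inspect what it grants before using it. The sketch below is illustrative only: the token is made up, its signature is a placeholder, and parse_sas() is a hypothetical helper rather than an AzureStor function. It assumes the usual account SAS field names, such as sp for permissions, srt for resource types, and st and se for the start and expiry times.

```r
# Made-up account SAS for illustration only; the signature is a placeholder.
sas <- paste0(
    "sv=2018-03-28&ss=b&srt=co&sp=rw",
    "&st=2019-01-01T00:00:00Z&se=2019-01-01T12:00:00Z",
    "&spr=https&sig=PLACEHOLDER")

# Hypothetical helper, not part of AzureStor: split the query string
# into a named character vector of field values.
parse_sas <- function(sas)
{
    parts <- strsplit(strsplit(sas, "&", fixed=TRUE)[[1]], "=", fixed=TRUE)
    # everything after the first "=" is the value
    values <- vapply(parts, function(p) paste(p[-1], collapse="="), character(1))
    names(values) <- vapply(parts, `[`, character(1), 1)
    values
}

fields <- parse_sas(sas)

# permissions, resource types, and validity window
print(fields[c("sp", "srt", "st", "se")])

# expiry time as a date-time, to compare against the current time
expiry <- as.POSIXct(fields[["se"]], format="%Y-%m-%dT%H:%M:%SZ", tz="UTC")
still_valid <- expiry > Sys.time()
```

Here the token would allow read and write access ("rw") at the container and object level ("co"), and only between the start and expiry times.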
Usually, the SAS will be given to you by your system administrator. However, if you have the storage account resource object, you can generate and use a SAS as follows. Note that generating a SAS requires the storage account’s access key.
    # shared access signature: read/write access, container+object access, valid for 12 hours
    now <- Sys.time()
    sas <- stor$get_account_sas(permissions="rw",
        resource_types="co",
        start=now,
        end=now + 12 * 60 * 60,
        key=stor$list_keys()[1])

    # create an endpoint object with a SAS, but without an access key
    blob_endp <- stor$get_blob_endpoint(sas=sas)
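The 12-hour validity window in the code above comes from ordinary date-time arithmetic: adding a number to a POSIXct value moves it forward by that many seconds. The following base R sketch checks the arithmetic. The UTC timestamp format at the end is an assumption about how such times are commonly written, not something taken from AzureStor itself.

```r
# POSIXct values count seconds, so adding 12 * 60 * 60 moves the time
# forward by 12 hours.
now <- Sys.time()
end <- now + 12 * 60 * 60

# length of the window, in hours
window_hours <- as.numeric(difftime(end, now, units="hours"))
print(window_hours)

# the same instants written as UTC timestamps (assumed ISO 8601 style)
start_utc <- format(now, "%Y-%m-%dT%H:%M:%SZ", tz="UTC")
end_utc <- format(end, "%Y-%m-%dT%H:%M:%SZ", tz="UTC")
print(c(start=start_utc, end=end_utc))
```

For a shorter-lived SAS, change the multiplier. For example, now + 60 * 60 gives a one-hour window.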
If you don’t have a key or a SAS, you will only have access to unauthenticated (public) containers and file shares.
The AzureStor package is available now on GitHub.