R doesn’t need to throttle AWS Athena anymore
I am happy to announce that RAthena-1.9.0 and noctua-1.7.0 have been released on CRAN. They both bring two key features:
- More stability when working with AWS Athena, focusing on AWS Rate Exceeded throttling errors
- New helper function to convert AWS S3 backend files to save cost
NOTE: RAthena and noctua features correspond to each other, so I will refer to the two packages interchangeably.
Stability
Throttling AWS
One of the main problems when working with the AWS API is stumbling into a Rate Exceeded throttling error. With the latest update to the packages, the connection between AWS Athena and R has been made more robust through retry functionality. This allows R to automatically retry its request using an exponential backoff with jitter (Best practices for API packages). A typical exchange looks like this:
- R sends a call to AWS Athena, let's say dbListTables(con). However, AWS is overrun with requests and returns an error back to R saying it is overwhelmed (this is a rate exceeded throttling error).
- As RAthena and noctua retry noisily, the error will be printed to the console letting you know AWS is busy ({exception message} + "Request failed. Retrying in " + {wait time} + " seconds...").
- R will then wait for a given time (please see the error format above) and retry the request. AWS replies that it is still busy and can't do the request.
- This time R will back off for a longer period of time, which gives AWS some breathing room. Now when R sends the request over to AWS, AWS is able to complete the call and return our desired results.
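The retry sequence above can be sketched in a few lines of base R. This is only an illustration of exponential backoff with jitter, not the code RAthena or noctua actually run: the function name retry_with_backoff, its arguments, and the simulated flaky_call are all hypothetical.

```r
# Hypothetical helper, NOT the RAthena/noctua implementation.
# Calls f(); on error it waits and tries again, up to max_attempts times.
retry_with_backoff <- function(f, max_attempts = 5, base_wait = 1,
                               max_wait = 20, sleep = Sys.sleep) {
  for (attempt in seq_len(max_attempts)) {
    result <- tryCatch(
      list(ok = TRUE, value = f()),
      error = function(e) list(ok = FALSE, error = e)
    )
    if (result$ok) return(result$value)
    # Out of attempts: re-signal the last error to the caller.
    if (attempt == max_attempts) stop(result$error)
    # Exponential backoff: the upper bound doubles on every failed attempt.
    cap <- min(max_wait, base_wait * 2^(attempt - 1))
    # Jitter: wait a uniformly random time between 0 and the upper bound.
    wait <- runif(1, min = 0, max = cap)
    message(conditionMessage(result$error),
            "\nRequest failed. Retrying in ", round(wait, 1), " seconds...")
    sleep(wait)
  }
}

# Simulated request that is throttled twice before it succeeds.
calls <- 0
flaky_call <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("ThrottlingException: Rate exceeded")
  "tables listed"
}

# A small base_wait keeps the demonstration fast.
out <- retry_with_backoff(flaky_call, base_wait = 0.01)
```

The random wait spreads retries from many clients over time, so they do not all hit AWS again at the same moment.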
This feature is a great step in the right direction for making R and AWS Athena work together seamlessly. For anyone who wishes to create their own retry method, both packages make this possible through their ..._options() function. For more information, please refer to the linked documentation.
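As a rough illustration of adjusting the retry behaviour, the snippet below assumes that RAthena_options() accepts a retry argument (maximum number of attempts) and a retry_quiet argument (whether retry messages are suppressed). These argument names are given from memory of the package help, so please confirm them with ?RAthena_options before relying on them; noctua is expected to offer the same through noctua_options().

```r
library(RAthena)

# Assumption: `retry` is the maximum number of attempts and `retry_quiet`
# controls whether retry messages are printed; check ?RAthena_options.
RAthena_options(retry = 10, retry_quiet = TRUE)
```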
Save the pennies
Converting AWS S3 files
AWS Athena charges by the amount of data it scans. This makes it very important to have your AWS S3 backend files in a suitable format to reduce the cost of using AWS Athena. This is where the next key feature comes in: a simple wrapper that allows you to convert AWS S3 files into a more suitable format.
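To make the cost argument concrete, here is a small back-of-the-envelope calculation. It assumes a price of 5 USD per TB scanned and ignores any per-query minimum charge; please check the current AWS Athena pricing for your region. The function name athena_query_cost and the table sizes are hypothetical.

```r
# Assumed price: 5 USD per TB scanned (check current AWS Athena pricing).
athena_query_cost <- function(bytes_scanned, usd_per_tb = 5) {
  bytes_scanned / 1024^4 * usd_per_tb
}

# Hypothetical sizes: a query that scans a 100 GB delimited table versus
# the same query reading only 10 GB from a compressed, columnar copy.
cost_delim   <- athena_query_cost(100 * 1024^3)
cost_parquet <- athena_query_cost(10 * 1024^3)

# How many times cheaper the second query is.
saving_factor <- cost_delim / cost_parquet
```

Because the charge is proportional to bytes scanned, any format that lets a query read fewer bytes lowers the bill by the same proportion.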
library(DBI)
library(RAthena)

con <- dbConnect(athena())

# Upload iris data.frame to AWS Athena as a delimited file
dbWriteTable(con, "iris_delim", iris)

# Convert to parquet using AWS Athena
dbConvertTable(con,
               obj = "iris_delim",
               name = "iris_parquet",
               file.type = "parquet")
This example simply uploads the iris data.frame to AWS Athena in the default delimited file format (please see the link for more information on how to upload data to AWS Athena). The table is then converted into the parquet file format using AWS Athena. This wrapper isn't limited to converting AWS Athena tables; it can also convert the results of SQL DML queries. Please refer to the dbConvertTable documentation or to the dbConvertTable vignette link for more details.
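Converting a query rather than a table might look like the sketch below. It assumes the iris_delim table from the example above already exists and that obj accepts a query wrapped in DBI's SQL(); the target table name iris_setosa_orc and the where clause are hypothetical, so please check ?dbConvertTable for the exact arguments.

```r
library(DBI)
library(RAthena)

con <- dbConnect(athena())

# Convert the result of a query (rather than a whole table) to ORC.
# Assumes iris_delim exists; "iris_setosa_orc" is a hypothetical name.
dbConvertTable(con,
               obj = SQL("select * from iris_delim where species = 'setosa'"),
               name = "iris_setosa_orc",
               file.type = "orc")
```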
Finally, for more information on best practices with AWS Athena, please look at Top 10 Performance Tuning Tips for Amazon Athena.
Sum Up
These two new features bring R and AWS Athena that little bit closer together. As always, if you would like to propose new features or have identified any bugs, please feel free to raise a pull request or ticket on the corresponding package GitHub pages (RAthena and noctua).