Using Azure as an R datasource Part 3: Pulling data from MySQL/MariaDB to Linux
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
by Gregory Vandenbrouck
Software Engineer, Microsoft
This post is the third in a series that covers pulling data from Microsoft SQL Server or MySQL/MariaDB on Azure to an R client on Windows or Linux.
In the previous two, we covered pulling data from SQL Server and MySQL/MariaDB to an R client on Windows.
This time we’ll be pulling data from MySQL/MariaDB to an R client on Linux.
Setting up the Linux client machine
To make things a little more interesting, we’ll have both the machine running MySQL/MariaDB and the client machine in Azure (but different ones).
Creating the client VM
- Log on the Azure Management site with your Azure subscription (see first post),
- Create a VM: select “New”, “Compute”, “Virtual Machine”, “Quick Create” and then fill-in the information. We picked one of the Ubuntu images.
- Make sure to create the client VM in the same region/affinity and same subscription as the machine hosting the MySQL/MariaDB server, otherwise accessing the database is going to be slow and/or problematic.
We ended up creating machine MyAzureUbuntu.cloudapp.net
with account azureuser
.
Accessing the client VM
Verify that ssh (TCP port 22) is one of the endpoints. To access the client VM from your local machine, you can type ssh [email protected]
from a command prompt if you have an ssh client installed.
If your local machine runs Windows, you can use PuTTY or MobaXterm instead.
Configuring the client VM
Once connected to the client VM via ssh:
- To install R and the tools/libraries to connect to MySQL/MariaDB, type:
sudo apt-get install r-cran-rodbc unixodbc unixodbc-bin odbcinst libmyodbc mariadb-client r-cran-rmysql
Note: if not using Unbutu, you may have to use an alternative to apt-get
, such as yum, zypper, etc.
- You can instead access the entire desktop environment by using vnc or xrdp (out of scope for this blog post). Just make sure to add endpoints for the corresponding ports (in the case of xrdp: TCP port 3389).
Creating the test database
We’ll be using the same MariaDB server as in the previous post, but this time we’ll create a sample database and table to make it a little more interesting.
Connecting to the server
We’re connecting to the server from the client Ubuntu machine. In addition to creating the database and table, this is useful for troubleshooting, for example to solve firewall, port and credential issues.
From a command prompt on the client VM, connect to the server:
azureuser@MyAzureUbuntu:~$ mysql -hMyServer.cloudapp.net -uMyUser -pMyPassword
(replace MyServer, MyUser and MyPassword with your values)
Creating the sample table
Once successfully connected to the server, run the following script in the MySQL client:
create database MyDatabase; use MyDatabase; create table MyTable (x float not null, y float not null); drop procedure if exists PopulateMyTable; delimiter $$ create procedure PopulateMyTable() begin declare i int default 0; while i < 500 do insert into MyTable(x, y) values(rand(), rand()); set i = i+1; end while; end $$ delimiter ; call PopulateMyTable(); select count(*) from MyTable;
The last select statement should return 500, confirming we have created a table called MyTable
consisting of 500 rows of 2 columns (x and y) with random values.
Connecting to the database from R on Linux
Using RODBC’s odbcDriverConnect
function
MySQL’s ODBC drivers need to be registered in order to connect using RODBC. The following shows how to do it and how to see the contents of the configuration file.
azureuser@MyAzureUbuntu:~$ cd /etc azureuser@MyAzureUbuntu:/etc$ cat odbcinst.ini azureuser@MyAzureUbuntu:/etc$ sudo odbcinst -i -d -f /usr/share/libmyodbc/odbcinst.ini odbcinst: Driver installed. Usage count increased to 1. Target directory is /etc azureuser@MyAzureUbuntu:/etc$ cat odbcinst.ini [MySQL] Description=MySQL driver Driver=libmyodbc.so Setup=libodbcmyS.so CPTimeout= CPReuse= UsageCount=1
Once installed, our table can be queried from R with an ODBC connection string using RODBC:
library(RODBC) myServer <- "MyServer.cloudapp.net" myUser <- "MyUser" myPassword <- "MyPassword" myDatabase <- "MyDatabase" myDriver <- "MySQL" # same as the name in the [] in the odbcinst.ini file connectionString <- paste0( "Driver=", myDriver, ";Server=", myServer, ";Database=", myDatabase, ";Uid=", myUser, ";Pwd=", myPassword) conn <- odbcDriverConnect(connectionString) ds <- sqlQuery(conn, "SELECT * FROM MyTable") close(conn) plot(ds, col = "red", pch = 19, main = "Querying MySQL with RODBC odbcDriverConnect")
Using RODBC’s odbcConnect
function
odbc.ini is the file used to store DSNs. If you have isql
installed, you can use it to test the connection. Below we show both the contents of the file and a test using isql
:
azureuser@MyAzureUbuntu:/etc$ cat odbc.ini [Azure-MySQL] Description = Testing MySQL Azure Trace = Off TraceFile = stderr Driver = MySQL SERVER = MyServer.cloudapp.net USER = MyUser PASSWORD = MyPassword DATABASE = MyDatabase azureuser@MyAzureUbuntu:/etc$ echo select 1+1 | isql Azure-MySQL -b +---------------------+ | 1+1 | +---------------------+ | 2 | +---------------------+ SQLRowCount returns 1 1 rows fetched
Setting up a DSN greatly simplifies the R code:
library(RODBC) conn <- odbcConnect("Azure-MySQL") ds <- sqlQuery(conn, "SELECT * FROM MyTable") close(conn) hist(ds$x + ds$y, col = "blue", main = "Querying MySQL with RODBC odbcConnect")
Using RJDBC
To use RJDBC to connect to MySQL or MariaDB, you need to:
- From a command prompt on the client VM:
- Install Java:
sudo apt-get install openjdk-8-jdk libmysql-java sqlline
- (Optional) Test the connection outside of R:
echo select count(*) from MyTable | sqlline -u "jdbc:mysql://MyServer.cloudapp.net/MyDatabase" -n MyUser -p MyPassword
- Register Java in R:
sudo R CMD javareconf
- Install Java:
- Install the R package from an R terminal on the client VM (no Ubuntu packages via apt-get are provided):
install.packages("RJDBC")
In R:
In addition to the Url connector prefix, we also need to specify in the call to JDBC
:
- classPath: the location of the jar file. Running
locate *mysql*.jar
from a command prompt may help here. If you don’t get any results, runsudo updatedb
first. - driverClass: the class name. If the one we use below doesn’t work, check the corresponding JDBC driver documentation.
In our particular setup:
library(RJDBC) ## Loading required package: DBI ## Loading required package: rJava drv <- JDBC( driverClass = "com.mysql.jdbc.Driver", # check driver's documentation name classPath = "/usr/share/java/mysql-connector-java.jar", # use locate and updatedb to find location identifier.quote="`") conn <- dbConnect(drv, "jdbc:mysql://MyServer.cloudapp.net/MyDatabase", "MyUser", "MyPassword") ds <- dbGetQuery(conn, "select * from MyTable") dbDisconnect(conn) ## [1] TRUE plot(ds, col=rgb(200,0,0,50, maxColorValue=255), main = "Querying MySQL with RJDBC", pch = 19, cex = 3)
Using RMySQL
Much easier to use than RJDBC (no url connector, driverClass and classpath to figure out):
library(RMySQL) conn <- dbConnect( RMySQL::MySQL(), host = "MyServer.cloudapp.net", user = "MyUser", password = "MyPassword", dbname = "MyDatabase") ds <- dbGetQuery(conn, "select * from MyTable") dbDisconnect(conn) plot(ds, col = "green", main = "Querying MySQL with RODBC RMySQL")
Summary
- We’ve tried RODBC, RJDBC and RMySQL to connect to a MariaDB database hosted on a Suse VM in Azure from a Linux Ubuntu client, also hosted in a VM on Azure.
- With respect to both ease of use and setup RMySQL & RODBC compare favorably to RJDBC: they require less code and don’t need to specify low level parameters.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.