MySQL Data Type Mapping in R
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
There was a recent question in the /r/Rlanguage subreddit which piqued my interest. They asked how to find the right mapping, and with the large number of data types I wondered if there was a good way to dynamically discover how fields are cast.
First step is to decide how to communicate with the database. I used the package RMySQL for this.
library(RMySQL)
Now to connect to the DB. I am running this locally. Make sure to use a password, even locally.
db_sql=dbConnect(MySQL(),user="root", host="127.0.0.1", dbname="test", password="", port=3306)
I included two different queries. The one below grabs all of the rows from the table.
q <- "SELECT * FROM main" rows <- dbGetQuery(db_sql, q)
We get a couple warnings with the current table.
Warning messages: 1: In .local(conn, statement, ...) : Unsigned INTEGER in col 0 imported as numeric 2: In .local(conn, statement, ...) : unrecognized MySQL field type 7 in column 5 imported as character
We can use this loop to get the column name and data type for the imported dataframe.
for(column in names(rows)) { print(paste0(column,": ",typeof(rows[[column]]))) }
We get this nice output.
[1] "id: double" [1] "char_test: character" [1] "float_test: double" [1] "bool_test: integer" [1] "double_test: double" [1] "timestamp_test: character" [1] "datetime_test: character" [1] "date_test: character"
The column names in MySQL were the data type followed by _test
. As you can see, they were not all imported with a matching data type.
We can get a 1:1 mapping of the column type in MySQL and imported data type in R with the following piece of code.
q <- "SELECT DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE table_name = 'main'" schema <- dbGetQuery(db_sql, q) print(data.frame(SQL = schema$DATA_TYPE, R = as.vector(sapply(rows,typeof))))
This gives us a nice printout of a new dataframe created on the fly.
SQL R 1 int double 2 char character 3 float double 4 tinyint integer 5 double double 6 timestamp character 7 datetime character 8 date character
And of course we need to close the connection if no more queries need to be made.
dbDisconnect(db_sql)
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.