Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Introduction
This is the first post in the series R & Databases. You can find the links to the other two posts of this series below:
In this post, we will learn to:
- connect to a SQLite database from R
- display database information
- list tables in the database
- query data
- read entire table
- read few rows
- read data in batches
- create table in database
- overwrite table in database
- append data to table in database
- remove table from database
- generate SQL query
- close database connection
Libraries, Code & Data
We will use the following libraries in this post:
All the data sets used in this post can be found here and code can be downloaded from here.
Connection
The first step is to connect to a database. In this post, we will connect to an in memory
SQLite databse using dbConnect()
.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
Connection Summary
We can get the more information about the connection using summary()
.
summary(con) ## Length Class Mode ## 1 SQLiteConnection S4
List Tables
Now that we are connected to a database, let us list all the tables present in it using
dbListTables()
.
dbListTables(con) ## [1] "ecom" "sqlite_stat1" "sqlite_stat4"
List Fields
Time to explore the ecom
table in the database. Use dbListFields()
to list all
the fields in the table.
dbListFields(con, "ecom") ## [1] "referrer" "device" "bouncers" "n_visit" "n_pages" "duration"
Querying Data
The main objectives of connecting to a database are to:
- query data from the tables already present
- create new tables
- overwrite existing tables
- delete existing tables
Let us begin with querying data. We can do this in the following ways:
- read an entire table at once
- read few rows from a table
- read data in batches
Entire Table
We can read an entire table from a database using dbReadTable()
.
dbReadTable(con, 'ecom') ## referrer device bouncers n_visit n_pages duration ## 1 google laptop 1 10 1 693 ## 2 yahoo tablet 1 9 1 459 ## 3 direct laptop 1 0 1 996 ## 4 bing tablet 0 3 18 468 ## 5 yahoo mobile 1 9 1 955 ## 6 yahoo laptop 0 5 5 135 ## 7 yahoo mobile 1 10 1 75 ## 8 direct mobile 1 10 1 908 ## 9 bing mobile 0 3 19 209 ## 10 google mobile 1 6 1 208 ## 11 direct laptop 1 9 1 738 ## 12 direct tablet 0 6 12 132 ## 13 direct mobile 0 9 14 406 ## 14 yahoo tablet 0 5 8 80 ## 15 yahoo mobile 0 7 1 19 ## 16 bing laptop 1 1 1 995 ## 17 bing tablet 0 5 16 368 ## 18 google tablet 1 7 1 406 ## 19 social tablet 0 7 10 290 ## 20 social tablet 0 2 1 28
In some cases, we may not need the entire table but only a specific number of rows.
Use dbGetQuery()
and supply a SQL statement specifying the number of rows of
data to be read from the table. In the below example, we read ten rows of data from
the ecom
table.
Few Rows
dbGetQuery(con, "select * from ecom limit 10") ## referrer device bouncers n_visit n_pages duration ## 1 google laptop 1 10 1 693 ## 2 yahoo tablet 1 9 1 459 ## 3 direct laptop 1 0 1 996 ## 4 bing tablet 0 3 18 468 ## 5 yahoo mobile 1 9 1 955 ## 6 yahoo laptop 0 5 5 135 ## 7 yahoo mobile 1 10 1 75 ## 8 direct mobile 1 10 1 908 ## 9 bing mobile 0 3 19 209 ## 10 google mobile 1 6 1 208
In case of very large table, we can read data in batches using dbSendQuery()
and
dbFetch()
. We can mention the number of rows of data to be read while fetching the
data using the query generated by dbGetQuery()
.
Read Data in Batches
query <- dbSendQuery(con, 'select * from ecom') result <- dbFetch(query, n = 15) result ## referrer device bouncers n_visit n_pages duration ## 1 google laptop 1 10 1 693 ## 2 yahoo tablet 1 9 1 459 ## 3 direct laptop 1 0 1 996 ## 4 bing tablet 0 3 18 468 ## 5 yahoo mobile 1 9 1 955 ## 6 yahoo laptop 0 5 5 135 ## 7 yahoo mobile 1 10 1 75 ## 8 direct mobile 1 10 1 908 ## 9 bing mobile 0 3 19 209 ## 10 google mobile 1 6 1 208 ## 11 direct laptop 1 9 1 738 ## 12 direct tablet 0 6 12 132 ## 13 direct mobile 0 9 14 406 ## 14 yahoo tablet 0 5 8 80 ## 15 yahoo mobile 0 7 1 19
Query
Query Status
To know the status of a query, use dbHasCompleted()
. It is very useful in
cases of queries that might take a long time to complete.
dbHasCompleted(query) ## [1] FALSE
Query Info
dbGetInfo()
will return the following:
- the sql staement
- number of rows fetched
- number of rows modified/affected
- status of the query
dbGetInfo(query) ## $statement ## [1] "select * from ecom" ## ## $row.count ## [1] 15 ## ## $rows.affected ## [1] 0 ## ## $has.completed ## [1] FALSE
Latest Query
To get the latest query, use dbGetStatement()
.
dbGetStatement(query) ## [1] "select * from ecom"
Rows Fetched
To check the number of rows of data returned by a query, use dbGetRowCount()
.
dbGetRowCount(query) ## [1] 15
Rows Affected
To know the number of rows modified or affected in the table, use dbGetRowsAffected()
.
dbGetRowsAffected(query) ## [1] 0
Column Info
To know the name of the columns and their data types, use dbColumnInfo()
.
dbColumnInfo(query) ## name type ## 1 referrer character ## 2 device character ## 3 bouncers integer ## 4 n_visit double ## 5 n_pages double ## 6 duration double
Create Table
So far we have explored querying data from an existing table. Now, let us turn our attention to creating new tables in the database.
Introduction
To create a new table, use dbWriteTable()
. It takes the following 3 arguments:
- connection name
- name of the new table
- data for the new table
x <- 1:10 y <- letters[1:10] trial <- tibble::tibble(x, y) dbWriteTable(con, "trial", trial) ## Warning: Closing open result set, pending rows
Let us check if the new table has been created.
dbListTables(con) ## [1] "ecom" "sqlite_stat1" "sqlite_stat4" "trial" dbExistsTable(con, "trial") ## [1] TRUE
Let us query data from the new table.
dbGetQuery(con, "select * from trial limit 5") ## x y ## 1 1 a ## 2 2 b ## 3 3 c ## 4 4 d ## 5 5 e
Overwrite Table
In some cases, you may want to overwrite the data in an existing table. Use the
overwrite
argument in dbWriteTable()
and set it to TRUE
.
x <- sample(100, 10) y <- letters[11:20] trial2 <- tibble::tibble(x, y) dbWriteTable(con, "trial", trial2, overwrite = TRUE)
Let us see if the trial table has been overwritten.
dbGetQuery(con, "select * from trial limit 5") ## x y ## 1 86 k ## 2 31 l ## 3 19 m ## 4 16 n ## 5 99 o
Append Data
You can append data to an existing table by setting the append
argument in
dbWriteTable()
to TRUE
.
x <- sample(100, 10) y <- letters[5:14] trial3 <- tibble::tibble(x, y) dbWriteTable(con, "trial", trial3, append = TRUE)
Let us quickly check if the new data has been appended to the trial table.
dbReadTable(con, "trial") ## x y ## 1 86 k ## 2 31 l ## 3 19 m ## 4 16 n ## 5 99 o ## 6 54 p ## 7 14 q ## 8 7 r ## 9 15 s ## 10 48 t ## 11 76 e ## 12 48 f ## 13 100 g ## 14 88 h ## 15 45 i ## 16 20 j ## 17 56 k ## 18 54 l ## 19 25 m ## 20 11 n
We can also use sqlAppendTable()
to append data to an existing table.
sqlAppendTable(con, "ecom", head(ecom)) ## Warning: Do not rely on the default value of the row.names argument for ## sqlAppendTable(), it will change in the future. ## <SQL> INSERT INTO `ecom` ## (`referrer`, `device`, `bouncers`, `n_visit`, `n_pages`, `duration`) ## VALUES ## ('google', 'laptop', TRUE, 10, 1, 693), ## ('yahoo', 'tablet', TRUE, 9, 1, 459), ## ('direct', 'laptop', TRUE, 0, 1, 996), ## ('bing', 'tablet', FALSE, 3, 18, 468), ## ('yahoo', 'mobile', TRUE, 9, 1, 955), ## ('yahoo', 'laptop', FALSE, 5, 5, 135)
Insert Rows
Introduction
We can insert new rows into existing tables using:
dbExecute()
dbSendStatement()
Both the function take 2 arguments:
- connection name
- sql statement
# use dbExecute dbExecute(con, "INSERT into trial (x, y) VALUES (32, 'c'), (45, 'k'), (61, 'h')" ) ## [1] 3 # use dbSendStatement dbSendStatement(con, "INSERT into trial (x, y) VALUES (25, 'm'), (54, 'l'), (16, 'y')" ) ## <SQLiteResult> ## SQL INSERT into trial (x, y) VALUES (25, 'm'), (54, 'l'), (16, 'y') ## ROWS Fetched: 0 [complete] ## Changed: 3
Remove Table
If you want to delete/remove a table from the database, use dbRemoveTable()
.
dbRemoveTable(con, "trial") ## Warning: Closing open result set, pending rows
SQLite Data Type
If you want to know the data type, use dbDataType()
.
dbDataType(RSQLite::SQLite(), "a") ## [1] "TEXT" dbDataType(RSQLite::SQLite(), 1:5) ## [1] "INTEGER" dbDataType(RSQLite::SQLite(), 1.5) ## [1] "REAL"
Close Connection
It is a good practice to close connection to a database when you no longer need to
read/write data from/to it. Use dbDisconnect()
to close the database connection.
dbDisconnect(con)
References
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.