In this article we present our R package rsync, which serves as an interface between R and the popular Linux command line tool rsync.
rsync is an open source tool for efficiently synchronizing files. Published by Paul Mackerras and Andrew Tridgell under the GNU General Public License, it lets users of Unix systems synchronize files between two locations, local or remote. Besides plain copying, a major advantage of the rsync algorithm is that, where possible, it transfers only the differences between files. This speeds up transfers and avoids sending redundant data.
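To make the idea of transferring only differences more concrete, here is a deliberately simplified sketch in R. It compares two versions of a file block by block at fixed offsets and reports which blocks changed. This is only an illustration: the real rsync algorithm works with rolling checksums and can also detect shifted data, and the helper names `splitBlocks` and `changedBlocks` are made up for this example.

```r
# Split a raw vector into fixed-size blocks (the last block may be shorter).
splitBlocks <- function(bytes, blockSize) {
  if (length(bytes) == 0) return(list())
  starts <- seq(1, length(bytes), by = blockSize)
  lapply(starts, function(s) bytes[s:min(s + blockSize - 1, length(bytes))])
}

# Return the indices of blocks in `new` that differ from, or do not exist in, `old`.
# Simplification: blocks are compared at identical offsets only.
changedBlocks <- function(old, new, blockSize = 4) {
  oldBlocks <- splitBlocks(old, blockSize)
  newBlocks <- splitBlocks(new, blockSize)
  isChanged <- vapply(seq_along(newBlocks), function(i) {
    i > length(oldBlocks) || !identical(oldBlocks[[i]], newBlocks[[i]])
  }, logical(1))
  which(isChanged)
}

old <- charToRaw("aaaabbbbccccdddd")
new <- charToRaw("aaaabbbbXXXXdddd")

# Only the third block differs, so only that block would need to be sent.
changedBlocks(old, new)
```

In this toy setting a single changed block out of four means roughly a quarter of the data has to travel, which is the intuition behind rsync's speed on large, mostly unchanged files.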
rsync is widely used, for example for creating backups and for transferring or mirroring data. The tool also supports synchronizing links, permissions, group ownership and device files.
Figure 1 visualizes the usage scenarios of rsync. We distinguish two cases. First, the case between two local directories (top half of the graph) and second, the case between a local directory and a remote rsync daemon (bottom half of the graph).
rsync is only available for Unix systems and is controlled from the command line. Unfortunately, we have to disappoint Windows users here: they can only use rsync via a small detour, e.g. with Cygwin.
An example command to synchronize the x.csv file between the local directory and an rsync daemon might look like this:
<code>RSYNC_PASSWORD="1234" rsync -ltx /home/UserXY/Documents/x.csv rsync://user@example.de/someFolder</code>
Since we implement many processes in R, we thought that an R package could take some work off our hands here. The R package handles the communication with the command line program rsync. The same command in R now looks like this:
<code>rsync::sendFile(con, fileName = 'x.csv')</code>
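To give a rough idea of what "handling the communication with the command line program" can mean, here is a minimal sketch of an R wrapper around the rsync call shown earlier. It is not the package's actual implementation: the helpers `buildRsyncArgs` and `runRsync` are hypothetical and only use base R (`system2`, `shQuote`, `Sys.which`).

```r
# Hypothetical helper: assemble the argument vector for an rsync call.
# Paths are shell-quoted so that spaces in file names do not break the command.
buildRsyncArgs <- function(file, dest, flags = "-ltx") {
  c(flags, shQuote(file), shQuote(dest))
}

# Hypothetical helper: run rsync and pass the password via the
# RSYNC_PASSWORD environment variable, as in the shell example.
runRsync <- function(file, dest, password = NULL, flags = "-ltx") {
  if (!nzchar(Sys.which("rsync"))) stop("rsync not found on PATH")
  env <- if (is.null(password)) {
    character()
  } else {
    paste0("RSYNC_PASSWORD=", shQuote(password))
  }
  system2("rsync", args = buildRsyncArgs(file, dest, flags), env = env)
}

# Inspect the arguments without actually calling rsync
rsyncArgs <- buildRsyncArgs(
  "/home/UserXY/Documents/x.csv",
  "rsync://user@example.de/someFolder"
)
rsyncArgs
```

Keeping the argument construction separate from the actual call makes it possible to inspect and test the command without a running rsync daemon.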
The R interface to rsync has several advantages:
- Intuitive operation for users with R experience
- Interaction with rsync directly from within R
- Typical actions can be performed without command line skills
Installation
For our R package to work, the underlying rsync tool must be installed first. On most Unix-like systems it is already part of the default installation.
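If you are unsure whether rsync is available on your machine, a quick check from within R could look like the following sketch. It relies only on base R's `Sys.which`, which returns an empty string when a program is not found on the search path.

```r
# Look up the rsync executable on the search path
rsyncPath <- Sys.which("rsync")

if (nzchar(rsyncPath)) {
  message("rsync found at: ", rsyncPath)
} else {
  message("rsync not found; please install it with your system's package manager")
}
```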
Then install our rsync package by running the following line in R:
<code># install.packages("devtools")
devtools::install_github("INWTlab/rsync")</code>
Functionality
At the beginning of an R session, a connection between two endpoints has to be established. To have a reproducible example, we demonstrate the features with two local folders. You can create an rsync configuration using:
<code>library("rsync")
dir.create("destination")
dir.create("source")
dest <- rsync(dest = "destination", src = "source")
dest</code>
<code class="output">## Rsync server:
## dest: /home/userxy/projects/rsync/destination
## src: /home/userxy/projects/rsync/source
## password: NULL
## Directory in destination:
## [1] name lastModified size
## <0 rows> (or 0-length row.names)</code>
In the case of an rsync daemon you can also supply a password. The mental model is that there is a destination folder with which we want to interact: all methods provided by this package operate on the destination and, in most cases, leave the source unchanged. An exception is sendObject, which also creates a file in the source.
<code>x <- 1
y <- 2
sendObject(dest, x)
sendObject(dest, y)</code>
<code class="output">## Rsync server:
## dest: /home/userxy/projects/rsync/destination
## src: /home/userxy/projects/rsync/source
## password: NULL
## Directory in destination:
##      name        lastModified size
## 1 x.Rdata 2019-02-28 12:30:51   61
## 2 y.Rdata 2019-02-28 12:30:51   60</code>
We can see that we have added two new files. These two files now exist in the source directory and the destination. The following examples illustrate the core features of the package:
<code>removeAllFiles(dest) # will not change source</code>
<code class="output">## Rsync server:
## dest: /home/userxy/projects/rsync/destination
## src: /home/userxy/projects/rsync/source
## password: NULL
## Directory in destination:
## [1] name lastModified size
## <0 rows> (or 0-length row.names)</code>
<code>sendFile(dest, "x.Rdata") # so we can still send the files</code>
<code class="output">## Rsync server:
## dest: /home/userxy/projects/rsync/destination
## src: /home/userxy/projects/rsync/source
## password: NULL
## Directory in destination:
##      name        lastModified size
## 1 x.Rdata 2019-02-28 12:30:51   61</code>
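After these calls the source folder should still hold both files, while the destination holds only x.Rdata. When working with two local folders, this can be double-checked with base R alone. The sketch below uses a hypothetical helper `missingInDest` and recreates the same situation in temporary folders so that it runs without the package or rsync.

```r
# Hypothetical helper: files that exist in `srcDir` but not in `destDir`
missingInDest <- function(srcDir, destDir) {
  setdiff(list.files(srcDir), list.files(destDir))
}

# Recreate the situation from the example in temporary folders
srcDir <- file.path(tempdir(), "source-demo")
destDir <- file.path(tempdir(), "destination-demo")
dir.create(srcDir, showWarnings = FALSE)
dir.create(destDir, showWarnings = FALSE)
file.create(file.path(srcDir, c("x.Rdata", "y.Rdata")))
file.create(file.path(destDir, "x.Rdata"))

# Lists the files that have not been sent to the destination yet
missingInDest(srcDir, destDir)
```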
loadData() also retrieves files from con$from, except that the information is loaded directly into the R workspace. dataName is the equivalent of entryName of the previous case and allows loading files of the following formats: .Rdata, .csv, .json.
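How loading by file format might work can be sketched as a simple dispatch on the file extension. The following is not the package's loadData implementation; `loadByExtension` is a hypothetical helper, and reading JSON via the jsonlite package is an assumption made for this example.

```r
# Hypothetical helper: read a file into R depending on its extension.
loadByExtension <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(
    ext,
    rdata = {
      # Load into a separate environment and return its contents as a list
      env <- new.env()
      load(path, envir = env)
      as.list(env)
    },
    csv = read.csv(path, stringsAsFactors = FALSE),
    json = {
      # Assumption: jsonlite is used for JSON; it is not part of base R
      if (!requireNamespace("jsonlite", quietly = TRUE)) {
        stop("package 'jsonlite' is needed to read .json files")
      }
      jsonlite::fromJSON(path)
    },
    stop("unsupported file format: ", ext)
  )
}

# Usage with a temporary csv file
csvPath <- tempfile(fileext = ".csv")
write.csv(data.frame(a = 1:3), csvPath, row.names = FALSE)
loadByExtension(csvPath)
```

Returning the loaded objects instead of assigning them into the global workspace keeps the helper free of side effects; whether the package does the same is not specified here.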
For more information, please check out the README and documentation of the rsync package.
Feedback
You are welcome to help improve this R package: just create an issue on GitHub.