This blog is intended for researchers, PhD students, MD students and any other students who wish to have a robust and effective reference management setup. The blog has a particular focus on those using R Markdown, Bookdown or LaTeX. Parts of the blog can also help set up Zotero for use with Microsoft Word. The blog has been designed to help achieve the following goals:
- Effective citation storage
  - Fast and easy citation storage (one-click from Chrome)
  - Fast and easy PDF storage using cloud storage
  - Immediate, automatic and standardised PDF renaming
  - Immediate, automatic and standardised citation key generation
- Effective citation integration with markdown etc.
  - Generation of citation keys which work with `LaTeX` and `md` (no non-standard characters)
  - Ability to lock citation keys so that they don’t update with Zotero updates
  - Storage of immediately updated `.bib` files for use with `Rmd`, `Bookdown` and `LaTeX`
  - Automated update of the `.bib` file in RStudio server
Downloads and Setup
For my current reference management setup I need the following software:
- Zotero
  - Zotero comes with 300MB of free storage which allows well over 1000 references to be stored as long as PDFs are stored separately
  - From the same download page download the Chrome connector to enable the “save to zotero” function in Google Chrome
- ZotFile
  - ZotFile is a Zotero plugin which helps with PDF management. Download the `.xpi` file, then open Zotero, go to “Tools → Add-Ons”, click the little cog in the top right corner and navigate to the file to install it (Figure 1)
- Better BibTeX
  - Better BibTeX is a plugin to help generate citation keys, which will be essential for writing articles in `LaTeX`, `R Markdown` or `Bookdown`
  - If the link doesn’t work go to GitHub and scroll down to the ReadMe to find a link to download the `.xpi` file
  - The same approach is then used to install the Better BibTeX plugin for Zotero (“Tools → Add-Ons”)
After downloading Zotero, ZotFile and Better BibTeX, create an account on Zotero online.
In addition to the Zotero downloads this guide will focus on an efficient setup for writing with R markdown or Bookdown and assumes that you have access to the following software / accounts:
- Dropbox / Google Drive / other cloud storage service which allows APIs
  - It will also be necessary for these to be accessible using `Windows Explorer` or `Mac Finder` (there are many guides online for syncing Google Drive and Dropbox so that they appear in file explorers)
- RStudio (this is not 100% essential but it is far harder to use Rmd without it)
  - Packages which will be required for this setup include `rdrop2` (if using Dropbox; other packages are available to convert this setup to Google Drive etc.), `encryptr`, `bookdown` or `rmarkdown`, `tinytex` and a LaTeX installation (the Bookdown author recommends using `tinytex`, which can be installed by the similarly named R package: `tinytex::install_tinytex()`)
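As a minimal sketch of checking the R side of this list, the following reports which of the packages named above are not yet installed. The helper `missing_packages()` is a hypothetical name introduced here, and the commented-out install lines assume each package is available from the repository you use.

```r
# Packages named in this guide (rdrop2 is only needed when using Dropbox).
required <- c("rdrop2", "encryptr", "bookdown", "rmarkdown", "tinytex")

# Hypothetical helper: return the packages from `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)   # assumes availability in your repository
  # tinytex::install_tinytex()     # one-off LaTeX installation via tinytex
}
```

The install calls are left commented out so that nothing is downloaded until you have checked the list.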
Folder Setup
When using Zotero it is a good idea to create a folder in which you will store PDFs retrieved from articles. Ultimately it is optional whether or not PDFs are stored, but if you have access to cloud storage with a good quota then it can make writing in Rmd etc. much faster as there is no need to search online for the original PDF. This folder should be set up in Google Drive, Dropbox or another cloud storage service which can be accessed from your own computer through the file explorer.
A second folder may be useful to store bibliographies which will be generated for specific projects or submissions. Again this folder should be made available in cloud storage.
ZotFile Preferences
To set up Zotero so that retrieved PDFs are automatically stored and renamed in the cloud storage without consuming the Zotero storage quota, go to “Tools → ZotFile Preferences” and, on the first tab (General Settings), set the folder and subfolder naming strategy for PDFs. I have set the location of the files to a Custom location and in this case used the path to a Google Drive folder (`~\Google Drive\Zotero PDF Library`). ZotFile will also store retrieved PDFs in subfolders to help with finding PDFs at a later date. The current setup I use is to create a subfolder named after the first author's surname, so that all papers whose first author shares that surname are stored together, using `\%a` in the subfolder field (Figure 2). Other alternatives are to store PDFs in subfolders using year (`\%y`); journal or publisher (`\%w`); or item type (`\%T`).
Next, the Renaming Rules tab can be configured to give each file a sensible name (this is essential if PDFs are not to be stored as random, meaningless strings of characters). In this tab I have set the format to `{%a_}{%y_}{%t}`, which provides names for the PDFs in the format of `Fairfield_2019_Gallstone_Disease_and_the_Risk_of_Cardiovascular_Disease.pdf`. I find that this shows the author, year and start of the title without needing to expand the file name (Figure 3).
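To make the naming pattern concrete, here is a small illustration in R of how an author, year and title combine into a file name of this shape. It is only a sketch: `zotfile_style_name()` is a hypothetical helper, not ZotFile's own code, and it assumes punctuation is dropped and spaces become underscores.

```r
# Hypothetical helper mimicking the {%a_}{%y_}{%t} pattern; not ZotFile's code.
zotfile_style_name <- function(author, year, title) {
  clean_title <- gsub("[^[:alnum:] ]", "", title)        # drop punctuation
  clean_title <- gsub(" +", "_", trimws(clean_title))    # spaces to underscores
  paste0(author, "_", year, "_", clean_title, ".pdf")
}

zotfile_style_name("Fairfield", 2019,
                   "Gallstone Disease and the Risk of Cardiovascular Disease")
# "Fairfield_2019_Gallstone_Disease_and_the_Risk_of_Cardiovascular_Disease.pdf"
```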
I have not changed any of the default settings in either the Tablet Settings or Advanced Settings tabs apart from removing special characters in the Advanced Settings (this stops things from breaking later).
General Zotero Settings
Zotero has several configurable settings (accessed through “Edit → Preferences”) and I have either adopted the defaults or made changes as follows:
General:
- I have ticked the following:
  - Automatically attach associated PDFs
  - Automatically retrieve metadata for PDFs
  - Automatically rename attachments using parent metadata
  - Automatically tag items with keywords and subject headings
  - All options in Group section
- I have left the following unticked:
  - Automatically take snapshots
  - Rename linked files
Sync:
- Enter the account details
- Tick sync automatically
- Untick sync full text (if you choose to save PDFs then syncing full text will quickly consume the 300MB quota)
Search:
- Left unchanged
Export:
- Left unchanged
Cite:
- There are several sensible defaults, but if there is a new citation style you wish to be able to use in Microsoft Word, for example, then click “Get additional styles” as the style you need has probably already been created. You can click the “+” button to add a style from a `.csl` file if you have one already. Finally, if you are desperate for a style that doesn’t already exist then you can select a citation style, click `Style Editor` and edit the raw `.csl` file.
- In the `Word Processors` subtab (on the main Cite tab), you can install the Microsoft Word add-in to allow Zotero to work in Microsoft Word.
Advanced:
- I changed nothing on the `General` subtab
- In the `Files and Folders` subtab I have selected the path to base directory for attachments
- I have not changed the `Shortcuts` subtab
- I have not changed the `Feeds` subtab
Better BibTeX:
- In this section I have set my Citation Key format to `[auth:lower:alphanum]_[year:alphanum]_[veryshorttitle:lower:alphanum]_[journal:lower:clean:alphanum]` (Figure 4). This generates a citation key for each reference in the format of `fairfield_2019_gallstones_scientificreports` or `harrison_2012_hospital_bmj`. It always takes the first author’s surname, the year, the first word of the title and the journal abbreviation if known. The `clean` and `alphanum` arguments to this field are used to remove unwanted punctuation which can cause citations to fail in LaTeX.
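The shape of these keys can be illustrated with a short R function. This is a rough approximation under stated assumptions, not Better BibTeX's actual algorithm: `make_citekey()` is a hypothetical helper, the stop-word list is assumed, the reference details passed to it are placeholders, and the real plugin also handles journal abbreviations and key clashes.

```r
# Hypothetical helper approximating the key format above; Better BibTeX's real
# rules (stop words, abbreviations, clash handling) are more involved.
make_citekey <- function(author, year, title, journal) {
  alnum_lower <- function(x) tolower(gsub("[^[:alnum:]]", "", x))
  stop_words <- c("a", "an", "the", "of", "on", "in", "and")  # assumed list
  words <- strsplit(title, "[^[:alnum:]]+")[[1]]
  words <- words[nzchar(words) & !(tolower(words) %in% stop_words)]
  paste(alnum_lower(author), year, alnum_lower(words[1]),
        alnum_lower(journal), sep = "_")
}

# Placeholder reference details, for illustration only
make_citekey("Smith", 2020, "The Example Title of a Paper", "Journal of Examples")
# "smith_2020_example_journalofexamples"
```

Because only letters, digits and underscores survive, keys of this shape are safe to use in both LaTeX and markdown citations.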
Once the settings have been configured, if you already had references stored in Zotero and wish to change the citation keys for old references, select your entire library root (above all folders), select all references, right click and use “Better BibTex → Refresh BibTeX Key”. All of the citation keys should then be updated.
Creating a `.bib` file
For referencing in a new project, publication or submission it may be helpful to have a dynamic `.bib` file that updates with every new publication and can be accessed from any device through cloud storage.
To set up a `.bib` file, first find the folder that you wish to create the file from (this should be the folder which contains any citations you will use and ideally not the full library, to cut down on unnecessary storage and syncing requirements). Note that, with default settings, the `.bib` file will only include citations stored directly in the folder. This prevents use of subfolders, which I find particularly helpful for organising citations, and I have therefore changed the setting so that folders also show any citations stored in subfolders. To make this change go to “Edit → Preferences”, select the “Advanced” tab and at the bottom of the “General” subtab select “Config Editor”. This will bring up a searchable list of configurations (it may show a warning message before this). Search for “extensions.zotero.recursiveCollections” and set “Value” to `TRUE`. When you then click a folder you should also see all of the citations stored in its subfolders.
Right click the folder and select “Export Collection”. A pop-up window will appear, at which point select “Keep Updated” and, if using RStudio desktop, save the file in the directory where you have your Rmd project files. If you are working with RStudio server then save the file in a cloud storage location which will then be accessed from the server. I have a `.bib` file stored in Dropbox which I access from RStudio server.
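A quick way to confirm that the export contains what you expect is to list the citation keys in the `.bib` file from R. The following is a minimal sketch using base R only: `bib_keys()` is a hypothetical helper, and it assumes each entry starts on its own line in the usual `@type{key,` form.

```r
# Hypothetical helper: return the citation keys found in a .bib file.
bib_keys <- function(bib_path) {
  lines <- readLines(bib_path, warn = FALSE)
  # Lines that open an entry, e.g. "@article{some_key,"
  entry_lines <- grep("^\\s*@\\w+\\s*\\{", lines, value = TRUE, perl = TRUE)
  # Skip non-reference blocks such as @comment, @string and @preamble
  entry_lines <- entry_lines[!grepl("^\\s*@(comment|string|preamble)",
                                    entry_lines, ignore.case = TRUE, perl = TRUE)]
  sub("^\\s*@\\w+\\s*\\{\\s*([^,\\s]+).*$", "\\1", entry_lines, perl = TRUE)
}

# Demonstration on a throwaway file containing one skeleton entry
demo_bib <- tempfile(fileext = ".bib")
writeLines(c("@article{fairfield_2019_gallstones_scientificreports,",
             "  year = {2019}",
             "}"), demo_bib)
bib_keys(demo_bib)
# "fairfield_2019_gallstones_scientificreports"
```

Running `bib_keys()` on your own exported file after adding a reference in Zotero is a simple check that “Keep Updated” is working.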
Linking Dropbox and RStudio Server to Access the `.bib` File
The following covers linking Dropbox to RStudio server but could be adapted to cover another cloud storage service.
Dropbox provides a token to allow communication between different apps. I used the `rdrop2` package to create this token. I actually created the token on RStudio desktop as I couldn’t get the creation to work on the server, but this is perfectly ok.
Caution: The token generated by this process could be used to access your Dropbox from anywhere using RStudio if you do not keep it secure. If somebody were to access an unencrypted token then it would be equivalent to handing out your email and password. I therefore used the `encryptr` package to allow safe storage of this token.
Token Creation
Open RStudio desktop and enter the following code:

    library(rdrop2)
    library(encryptr)

    # Create token
    token <- drop_auth()

    # Save token
    saveRDS(token, "droptoken.rds")

    # Encrypt token
    genkeys() # Default file names are id_rsa and id_rsa.pub
    encrypt_file("droptoken.rds", "droptoken.rds.encryptr.bin")
    encrypt_file(".httr-oauth", ".httr-oauth.encryptr.bin")

    # Same details should appear later
    drop_acc()

    # Remove token from local environment
    rm(token)

    # Delete the unencrypted files (file.remove() also works on Windows)
    file.remove("droptoken.rds")
    file.remove(".httr-oauth")
The code will create two files, a token and the `.httr-oauth` file from which a token can also be made. The encryptr package can then encrypt the files using a public / private key pair. It is essential that the password that is set when using `genkeys()` is remembered, otherwise the token cannot then be used. If the password is lost, the original token can’t be retrieved but a new one could be created from scratch.
The following files will then need to be uploaded to the RStudio server:
- `droptoken.rds.encryptr.bin`, or the name provided for the encrypted Dropbox token
- `id_rsa`, or the name provided for the private key from the private / public key pair
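Before uploading, it may be worth confirming that the encrypted files exist and that no unencrypted token has been left behind. The sketch below uses base R only: `check_token_files()` is a hypothetical helper, and it assumes the default file names used in the code above.

```r
# Hypothetical helper: check a project directory before uploading to the server.
check_token_files <- function(dir = ".") {
  must_exist   <- c("droptoken.rds.encryptr.bin", "id_rsa")  # files to upload
  must_be_gone <- c("droptoken.rds", ".httr-oauth")          # unencrypted files
  list(
    missing          = must_exist[!file.exists(file.path(dir, must_exist))],
    unencrypted_left = must_be_gone[file.exists(file.path(dir, must_be_gone))]
  )
}

# Both elements should be empty character vectors if all is in order
check_token_files(".")
```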
Dropbox Linkage for Referencing the `.bib` File
Now that the encrypted token and necessary (password-protected) private key are available in RStudio server, the following can be saved as a separate script. The script is designed to read in and decrypt the encrypted token (this will require a password and should be done whenever the `.bib` file needs to be updated). Only the `drop_download()` call needs to be repeated if using the token again during the same session. The token should be cleared at the end of every session for additional security.
    library(rdrop2)
    library(encryptr)

    # ******** WARNING ********
    # Losing the unencrypted token will give anyone
    # complete control of your Dropbox account
    # If you are concerned this has happened,
    # you can then revoke the rdrop2 app from your
    # dropbox account and start over.
    # ******** WARNING ********

    safely_extract_dropbox_token <- function(encrypted_db_token = NULL,
                                             private_key_file = NULL){
      # Decrypt to a temporary file (prompts for the private key password)
      decrypt_file(encrypted_db_token,
                   file_name = "temporary_dropbox_token.rds",
                   private_key_path = private_key_file)
      # Load the token into the global environment, then delete the plain copy
      token <<- readRDS("temporary_dropbox_token.rds")
      invisible(file.remove("temporary_dropbox_token.rds"))
    }

    safely_extract_dropbox_token(encrypted_db_token = "droptoken.rds.encryptr.bin",
                                 private_key_file = "id_rsa")

    # Then pass the token to each drop_ function
    drop_acc(dtoken = token)

    # The path is the Dropbox file location
    drop_download(path = "My_Dropbox_Home_Directory/Zotero Library/my.bib",
                  local_path = "my.bib",
                  dtoken = token,
                  overwrite = TRUE)
Now that the `.bib` file has been created and is stored as “my.bib” in the local directory, it should update whenever the token is loaded and `drop_download()` is run.
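With `my.bib` in place, an R Markdown document can point to it through the `bibliography` field of its YAML header and cite references with the `[@citationkey]` syntax. As a minimal sketch, the R code below writes such a file: `write_example_rmd()`, the file name `example.Rmd` and the sentence wording are all placeholders introduced here for illustration.

```r
# Hypothetical helper: write a minimal R Markdown file that cites one key.
write_example_rmd <- function(path, bib = "my.bib",
                              key = "fairfield_2019_gallstones_scientificreports") {
  writeLines(c(
    "---",
    "title: \"Example\"",
    "output: html_document",
    paste0("bibliography: ", bib),   # the .bib file downloaded from Dropbox
    "---",
    "",
    paste0("A sentence that cites a stored reference [@", key, "]."),
    "",
    "# References"
  ), path)
  invisible(path)
}

example_path <- write_example_rmd(file.path(tempdir(), "example.Rmd"))
# rmarkdown::render(example_path)  # needs my.bib beside the file, containing the key
```

The rendered document places the formatted bibliography at the end, which is why the sketch finishes with an empty References heading.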
Final Result
On clicking the “Save to Zotero” button in Chrome and running `drop_download()`, the following should all happen almost instantaneously:
- Zotero stores a new reference
- A PDF is stored in the cloud storage having been named appropriately
- A link to the PDF is stored in Zotero (without using up significant memory)
- A citation key is established for the reference in a standardised format without conflicts
- Pre-existing citation keys which have been referenced earlier in the writing of the paper are not altered
- A `.bib` file is updated in the RStudio server directory
- And much unwanted frustration of reference management is resolved
This is my current reference management system which I have so far found to be very effective. If there are ways you think it can be improved I would love to hear about them.