Learning To Create an R Package With Deliberate Redundancy 🤣 A Note For Myself
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
🙈 Made a hilariously redundant R package for a simple OpenAI calls, but the real win was finally learning how to build an R package! 🛠️ Is it efficient? Absolutely not!Was it worth the time and experience? Yes! Will I do it again? Yes! Will it break? Yes! 🤣
Motivations
We’ve been using a lot of LLM lately especially OpenAI GPT4omini via reticulate
. Trying to load reticulate
-> import OpenAI -> initialize it -> insert API_key etc again and again whenever we want to use it for a new project or new script, that’s just very redundant, tedious and so on.
Recently, with motivation from
Alec Wong
kindly recommending me to read
R Package after I asked him a noob question “How does one test and make changes of an existing R package and pull request?” 🤣 The book is great! I really enjoy the first part, it actually teaches you how to create an R package right from the get go! Before it dives into the details. I found that to be extremely refreshing, because you can easily create an MVP right upfront for personal use by following a workflow! 🙌
With the LLM in mind, and wanting to learn how to create an R package (ahem properly, best practice 🫡), why don’t we try to make our very own package called myopenai
!? Alright, there is already a brilliant package called ellmer
, and my current project is most likey redundant (hence the title), but what a better way to start learning? And we’re going to do it the inefficient way, via reticulate
(this is when you see R experts shake their head of disbelief 🤭) instead of the http API! 🤣 Join me in learning how to create an R Package!
Seriously though, this is not an efficient way of using openAI API… It’s for demonstration purposes only.
The Original Code
reticulate::use_virtualenv("openai") library(reticulate) openai <- import("openai") OpenAI <- openai$OpenAI client = OpenAI(api_key = 'YOUR API KEY') ## change this to yours response <- client$chat$completions$create( model = "gpt-4o-mini", messages = list(dict( role = "system", content = "You are a summarizer. Please summarize the following abstract. Include statistics.Use emoji to shorten characters. The summary must not exceed 200 characters—if you reach 200 characters, stop immediately. Do not add any extra commentary or exceed the 200-character limit." ), dict( role = "user", content = df_new[10,"item_description"] |> pull() ) ), temperature = 0 ) (summary <- response$choices[[1]]$message$content)
The above was our previous code. Now we will use the above code, turn it into a function and hopefully at the end of the project we’ll have a library called myopenai
and we can call a function chat()
and just need to insert the prompt, or pipe the prompt into the function.
Objectives:
The Workflow
A quick note for myself:
- Load devtool, create_package()
- use_git(), will likely need restart
- load devtools again, write first function
- use_r(“function”), then insert function there
- load_all()
- check()
- edit DESCRIPTION
- use_mit_license()
- insert roxygen2 skeleton on function, edit
- document()
- since we need other packages, use_package()
- check()
usethis::create_package
usethis::create_package("myopenai")
You might see something like this if you already have a
.git
on the working directory, try to create a package in a new directory that doesn’t have a parent directory with .git
in it and it’ll open to a new project session with basic, something like usethiss::create_package("/Users/you/Document/myopenai")
. You will notice that you have basic files such as.
Read here for what the directories or files represent.
use_git
library(devtools) use_git()
You will receive 2 questions, basically if you want to use git for version control and also if you want to commit. Say yes to both. Then your project session will restart. Note that every time it restarts, you have to reload devtools
Write A Function
Haven’t figured out what is the best way to do this at yet, do we write a minimal functional code somewhere else and then copy it into project or just create an untitlted R script and write a minimal functional code before use_r()
function? Anyway, for now just create an untitled script and start writing before we paste it in a more organized R file.
chat <- function(prompt="how are you?", system="", temp=0, max_tokens = 500L) { ## If virtualenv does not exist, create and install openai if (reticulate::virtualenv_exists(envname = "openai")==FALSE) { reticulate::virtualenv_create(envname = "openai", packages = c("openai")) } ## Start Env reticulate::use_virtualenv("openai") ## Initialize OpenAI OpenAI = reticulate::import("openai")$OpenAI client = OpenAI(api_key = "YOUR API KEY") ## Prompt response = client$chat$completions$create( model = "gpt-4o-mini", messages = list(reticulate::dict( role = "system", content = system ), reticulate::dict( "role" = "user", "content" = prompt )), temperature = temp, max_tokens = max_tokens ) ## Response message = response$choices[[1]]$message$content return(message) }
Alright, the above code is quite similar to our original, but at least now it’s a function and we have some default parameters inserted so when we call chat()
without any parameters in it, it will at least return something. Note above is not the efficient at all, best to communicate with openAI API is still via httr2
, you will have much less dependencies. Also, note that above is again, not to most defensive coding practice either but we can improve from there, including setting your API key in .Rprofile
.
usethis::use_r()
usethis::use_r("chat")
This will create a blank R file called chat.R
in R folder. We then copy our minimal function code into this script. Do not run anything yet! That’s the next step. We don’t necessarily have to create an R script for every function. We should at least name it where it’s machine & human readable, sort-friendly if date used
read here more on Tidy style guide.
About organization? It’s hard, like smart brevity
It’s hard to describe exactly how you should organise your code across multiple files. I think the best rule of thumb is that if you can give a file a concise name that still evokes its contents, you’ve arrived at a good organisation. But getting to that point is hard. -Tidyverse style guide
devtools::load_all()
On the console, enter the following:
devtools::load_all()
This will make the entire package available / interactive. And that’s when we can test our function and debug if there is error. As you can see in the above image, when we call chat()
it returned a response. The book explains the difference between source
and running as a library
,
here
devtools::check()
devtools::check()
The function check()
is essential because it runs comprehensive tests to ensure all components of an R package work correctly, catching issues early when they’re easier to fix. It validates that your package meets CRAN standards, maintaining code quality and preventing downstream problems for users. Regular checking establishes good development habits, especially before sharing your package with others or submitting it to repositories.
The above image showed that we have 2 warnings, let’s fix it!
Edit DESCRIPTION
Before we start to fix the 2 things above, let’s edit our DESCRIPTION.
Feel free to edit title, description, author info
usethis::use_mit_license()
# if you already loaded `devtools` you can just use_mit_license()
Insert roxygen2 skeleton on function
Click on the chat function area, then go to Code > Insert Roxygen2 Skeleton
Then it will look something like
Then you can edit it as such
#' GPT4omini chat convenience #' #' @param system System Prompt #' @param prompt User Prompt #' @param temp Temperature #' @param max_tokens Max Tokens #' #' @return String Response #' @export #' #' @examples #' chat("what is 2+2?") chat <- function(prompt="how are you?", system="", temp=0, max_tokens = 500L) { ## If virtualenv does not exist, create and install openai if (reticulate::virtualenv_exists(envname = "openai")==FALSE) { reticulate::virtualenv_create(envname = "openai", packages = c("openai")) }
devtools::document()
The document()
function converts roxygen2 comments in your R source files into proper R documentation files, specifically creating the .Rd
files needed for help pages and updating the NAMESPACE file. This process transforms human-readable special comments into the formal documentation structure required by R, making your package’s functions discoverable and usable through the standard help system.
and when you do this
?chat
you will see this
usethis::use_package()
usethis::use_package("reticulate")
The use_package()
function formally declares dependencies by adding packages to the Imports
field in your package’s DESCRIPTION file, allowing you to use functions from external packages like reticulate
.
Let’s Run check()
Again and install()
devtools::check()
Yesh. Good sign!!!
devtools::install()
and now you can go to Rstudio and the below code should work!
library(myopenai) "what is 10 + 5?" |> chat() ## [1] "10 + 5 equals 15."
🙌🤘🤩 we did it!!!
Further Improvement Of The Package
Split Files To Chat.R
and utils.R
Chat.R
#' Set you Open API key #' #' @param key your API key #' #' @return set environment #' @export #' #' @examples #' \dontrun{ #' set_api_key("your-api-key-here") #' } set_api_key <- function(key=NULL) { if (is.null(key)) { cli::cli_abort("Need to insert API key on key parameter")} if (!is.null(key)) { key_old <- Sys.getenv("OPENAI_KEY") if (key==key_old) { cli::cli_alert_warning("Your previous API key is the same as the one currently provided") } if (key!=key_old) { set_env(key) cli::cli_alert_success("New API Key Inserted") rstudioapi::restartSession() } if (key_old=="") { set_env(key) cli::cli_alert_success("API Key Inserted") rstudioapi::restartSession() } } } #' GPT4omini chat convenience #' #' @param system System Prompt #' @param prompt User Prompt #' @param temp Temperature #' @param max_tokens Max Tokens #' @param api_key use `set_api_key` to set it in R environment #' #' @return String Response #' @export #' #' @examples #' chat("what is 2+2?") chat <- function(prompt="how are you?", system="", temp=0, max_tokens = 500L, api_key = openai_api_key()) { # check to see if virtualenv is installed if (reticulate::virtualenv_exists(envname = "openai")==FALSE) { reticulate::virtualenv_create(envname = "openai", packages = c("openai")) rstudioapi::restartSession() } # check to see if there is api_key set if (api_key=="") { cli::cli_abort("No API key found. Need to insert API key by using {.code set_api_key()}") } ## Start Env reticulate::use_virtualenv("openai") ## Initialize OpenAI OpenAI = reticulate::import("openai")$OpenAI client = OpenAI(api_key = api_key) ## Prompt response = client$chat$completions$create( model = "gpt-4o-mini", messages = list(reticulate::dict( role = "system", content = system ), reticulate::dict( "role" = "user", "content" = prompt )), temperature = temp, max_tokens = max_tokens ) ## Response message = response$choices[[1]]$message$content return(message) }
utils.R
openai_api_key <- function() { key <- Sys.getenv("OPENAI_KEY") if (key=="") { cli::cli_abort("No API key found. Need to insert API key by using {.code set_api_key()}") } else { return(key) } } set_env <- function(key) { renviron_path <- file.path(Sys.getenv("HOME"), ".Renviron") api_key_entry <- paste0("OPENAI_KEY=",key) if (!file.exists(renviron_path)) { file.create(renviron_path) } # Read existing content existing_content <- readLines(renviron_path, warn = FALSE) if (!any(grepl("^OPENAI_KEY=", existing_content))) { # Append the new entry write(api_key_entry, renviron_path, append = TRUE) } else { # Replace existing entry existing_content[grepl("^OPENAI_KEY=", existing_content)] <- api_key_entry writeLines(existing_content, renviron_path) } }
Read more from the book to get an intuition of why we have chat.R
and utils.R
. Basically, you want functions that get exported to be in chat.R
or similar R file, and supportive functions that are not exported to be in utils.R
, instead of all functions in 1 file, mainly for readability / organization. That’s a really good idea.
There are plenty more things to improve on. We can spend many hours revising this! Let’s move on to testing
Add Unit Testing
use_test("chat")
This will create another folder (tests
) and file (test-chat.R
).
test-chat.R
test_that("chat(\"2+2?\") make sure it returns 4", { expect_equal(chat("what is 2+2? return answer only") |> as.numeric(), 4) })
We then enter something like the above to make sure it returns the answer we want to make sure it works. And it will look something like this after.
test()
🙌🤘🤩 we did it!!!
This is a great image of the actual workflow from the book.
Acknowledgement
I am grateful of the book and also the style of putting the simple example right up front to stimulate curiosity and having the end in mind. That makes learning very inspirational. I also want to thank
Alec Wong
for guiding me to this book that pushed me to learn the workflow of creating an R package. Now I have some idea on how this works, it makes contributing to existing R packages a step closer!!! woo hoo!!! Not that I have much to contribute 🤣
Lessons Learnt
- Learnt R package workflow/best practice (highly recommend a read)
- Learnt
Sys.getenv
, and how to write it to R environment - Learnt some basics of
cli
package - Learnt
\dontrun{}
on roxygen2 example so that it doesn’t execute the function. Before this myset_api_key()
example kept setting OPENAI_KEY toyour-api-key-here
lol - Learnt
grepl
, been always used tostr_detect
now at least we can use ReGex with base
If you like this article:
- please feel free to send me a comment or visit my other blogs
- please feel free to follow me on BlueSky, twitter, GitHub or Mastodon
- if you would like collaborate please feel free to contact me
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.