Use an LLM to translate help documentation on-the-fly

This article was first published on Getting Genetics Done and kindly contributed to R-bloggers.

Reposted from Paired Ends at https://blog.stephenturner.us/p/llm-translate-documentation.

The lang package overrides the ? and help() functions in your R session. The translated help page will appear in the help pane in RStudio or Positron. It can also translate your Roxygen documentation.

Using LLMs in R

Most of the developer tooling for AI/LLM training and evaluation is Python-centric, but over just the past few months we’ve seen a surge of new tooling for AI/LLM applications in the R ecosystem.

The lang package

The lang package (source, documentation) is an interesting new addition to the mlverse in R. From the documentation:

lang overrides the ? and help() functions in your R session. If you are using RStudio or Positron, the translated help page will appear in the usual help pane.

If you are a package developer, lang helps you translate your documentation, and to include it as part of your package. lang will use the same ? override to display your translated help documents.

Let’s look at an example. I recently invited my colleague and co-author VP Nagraj to write about the rplanes package we published and released on CRAN for plausibility analysis in epidemiological forecasting.

One of the first functions you might use from this package is read_forecast(), which reads a probabilistic quantile forecast CSV file for downstream plausibility analysis. Let’s look at the help for this function.

library(rplanes)
?read_forecast

En Español

Now let’s get help in Spanish. First, load the lang package and tell it that we’re using llama3.2. We’ll set the system language to Spanish, then ask for help again.

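library(lang)
# Register the llama3.2 model served by Ollama here; see the lang documentation for the exact call.
Sys.setenv(LANGUAGE = "spanish")
?read_forecast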

My fluency in Spanish is limited to general conversation and travel needs, so I can’t easily verify the accuracy of the translation of this technical language, but when I ran some of it back through Google Translate it seemed to be mostly faithful. Notice how things that shouldn’t be translated aren’t: function names, arguments, columns in the returned output, and code in the examples.

हिंदी में … … باللغة العربية

What about non-Western languages?

Let’s try Hindi!

Sys.setenv(LANGUAGE="hindi")
?read_forecast

I can’t verify the accuracy of this translation beyond running some of the text back through Google Translate, but on that basis the translation looks reasonable at first glance.

What about Arabic?

Sys.setenv(LANGUAGE="arabic")
?read_forecast

If you’re a native speaker of any of these, I’d love to know what you think. Chat with me on Bluesky (@stephenturner.us).
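
Each example above changes the LANGUAGE environment variable for the rest of the session. If you’d rather try a language and then return to your previous setting, something like the following base R sketch might help. Note that with_language() is a hypothetical helper written for this illustration and is not part of lang.

```r
# Hypothetical helper (not part of lang): evaluate `code` with the LANGUAGE
# environment variable temporarily set, then restore the previous state.
with_language <- function(language, code) {
  # Remember the current value; NA means the variable was not set at all
  old <- Sys.getenv("LANGUAGE", unset = NA)
  on.exit(
    if (is.na(old)) Sys.unsetenv("LANGUAGE") else Sys.setenv(LANGUAGE = old),
    add = TRUE
  )
  Sys.setenv(LANGUAGE = language)
  # `code` is a lazily evaluated argument, so it runs only now,
  # after LANGUAGE has been changed
  force(code)
}

# The variable is changed only while the expression is evaluated
with_language("spanish", Sys.getenv("LANGUAGE"))
```

If you already use the withr package, withr::with_envvar() offers the same set-and-restore pattern.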

Translating your package’s Roxygen docs

The lang documentation has a great section on using lang as a package developer. You can translate all of your Roxygen documentation into the desired language, then edit those translations by hand as needed. Then a special helper function re-roxygenizes your docs placing them in a special inst/man-lang folder. The lang docs explain how this all works, but once you do this, when a user has the lang package loaded, they’ll get your pre-computed and optionally edited translations instead of having to wait around for the LLM to translate the help.

Demo

Here’s a demo using a very small package I wrote for something completely different. Don’t worry about all the Docker stuff described here. There’s one single function, missyelliot(), that simply reverse complements a DNA sequence (“take that flip it and reverse it”). That is, it’ll convert GATTACA to its reverse complement, TGTAATC.
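
If reverse complementing is unfamiliar, here is a minimal base R sketch of the idea: swap each base for its complement, then reverse the string. The revcomp() name and its implementation are assumptions made for illustration; this is not the actual source of missyelliot().

```r
# Illustrative reverse complement in base R; an assumption about the behaviour,
# not the actual source of rpdd::missyelliot().
revcomp <- function(x) {
  # Swap each base for its complement, preserving case
  comp <- chartr("ACGTacgt", "TGCAtgca", x)
  # Split each sequence into characters, reverse them, and paste back together
  vapply(strsplit(comp, ""),
         function(chars) paste(rev(chars), collapse = ""),
         character(1))
}

revcomp("GATTACA")
#> [1] "TGTAATC"
```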

Restart your R environment, and install the package using devtools/remotes. Load both rpdd and lang.

devtools::install_github("stephenturner/rpdd")
library(rpdd)
library(lang)

Now get some help for missyelliot(). If your language environment variable is English, you’ll get the English help.

Now, change your system language to Spanish, and try it again. Notice how the translated help is instantaneous — you’re relying on the pre-translated and possibly hand-edited translations that come with the package rather than asking an LLM to translate the help for you on the fly.

Sys.setenv(LANGUAGE = "spanish")
?missyelliot

If your language is set to something without a pre-populated translation, you’ll have to register a model through Ollama and translate in real time.

Sys.setenv(LANGUAGE = "russian")
?missyelliot

I think this might be one of the most impactful applications of LLMs inside a developer environment since the rise and rapid adoption of Copilots. Instant access to documentation in multiple languages through lang is a significant step toward making data science more accessible and inclusive for the global R community. It lowers the language barriers that have historically made it hard for non-English speakers to fully engage with R’s rich ecosystem of tools and packages.



