A Draft of ProjectTemplate v0.2-1

John Myles White

12 years ago

[This article was first published on John Myles White » Statistics, and kindly contributed to R-bloggers.]


I’ve just uploaded a new binary of ProjectTemplate to GitHub. This is a draft version of the next release, v0.2-1, which includes some fairly substantial changes and is backwards incompatible in several ways with previous versions of ProjectTemplate.

Foremost among the changes is that most of the logic for loading a project is now built directly into the load.project() function, rather than spread across autogenerated scripts that you can edit by hand. This makes ProjectTemplate harder for non-experts to modify, but it will make future revisions much easier: existing projects will no longer fall behind because of vestigial code that isn’t updated automatically when you install a new version of ProjectTemplate.

Because more system logic is now hardcoded into functions, each project’s configuration is handled through a YAML file, config/global.yaml. Incidentally, this introduces a new directory, config/, where configuration files will go from now on.
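To make the idea of a single settings file concrete, here is a minimal sketch in R. The keys (`data_loading`, `munging`, `logging`) and the helper `read_flat_config()` are hypothetical and have not been checked against the actual contents of config/global.yaml. The base-R parser below only handles flat `key: value` lines, not full YAML, so treat it as an illustration rather than how ProjectTemplate reads its configuration.

```r
# Hypothetical helper: parse flat "key: value" lines into a named list.
# This is NOT a full YAML parser and not ProjectTemplate's own code.
read_flat_config <- function(path) {
  lines <- trimws(readLines(path))
  # Drop blank lines and comment lines.
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  keys <- trimws(sub(":.*$", "", lines))
  values <- as.list(trimws(sub("^[^:]*:", "", lines)))
  names(values) <- keys
  values
}

# Write an illustrative config file into a temporary project directory.
project_dir <- file.path(tempdir(), "config_sketch")
dir.create(file.path(project_dir, "config"), recursive = TRUE, showWarnings = FALSE)
config_path <- file.path(project_dir, "config", "global.yaml")
writeLines(
  c("# hypothetical settings", "data_loading: on", "munging: on", "logging: off"),
  config_path
)

config <- read_flat_config(config_path)

# A loader could then switch individual steps on or off.
logging_enabled <- identical(config$logging, "on")
```

In a real project the yaml package from CRAN would be the more natural way to read such a file.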

The data loading system is also more complex than it was before. First, there’s a new hierarchy of data sources: the system now looks for data in a cache/ directory before moving on to the data/ directory. This lets you permanently store a modified version of your data set in cache/ so that you can skip loading the raw data. That is helpful when the original data set is enormous and your future analyses need only a radically reduced form of it, which you can keep in cache/.
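The cache-before-data lookup can be sketched in a few lines of base R. The helper `find_data_source()`, the file name `measurements`, and the `.RData`/`.csv` extensions are assumptions made for illustration. This is not the code inside load.project().

```r
# Hypothetical helper: return the file to load for `name`, preferring cache/.
find_data_source <- function(name, project_dir = ".") {
  cached <- file.path(project_dir, "cache", paste0(name, ".RData"))
  if (file.exists(cached)) return(cached)
  raw <- file.path(project_dir, "data", paste0(name, ".csv"))
  if (file.exists(raw)) return(raw)
  NULL
}

# Build a throwaway project with empty cache/ and data/ directories.
project_dir <- file.path(tempdir(), "cache_sketch")
unlink(project_dir, recursive = TRUE)
dir.create(file.path(project_dir, "cache"), recursive = TRUE)
dir.create(file.path(project_dir, "data"), recursive = TRUE)

# The raw file stands in for a very large data set.
raw_data <- data.frame(id = 1:6, value = c(3, 8, 1, 9, 4, 7))
write.csv(raw_data, file.path(project_dir, "data", "measurements.csv"),
          row.names = FALSE)

# With nothing cached yet, the lookup falls through to data/.
first_source <- find_data_source("measurements", project_dir)

# Store a reduced version in cache/; later lookups now skip the raw file.
measurements <- raw_data[raw_data$value > 5, ]
save(measurements, file = file.path(project_dir, "cache", "measurements.RData"))
second_source <- find_data_source("measurements", project_dir)
```

Here `first_source` points into data/ and `second_source` points into cache/, which is the behavior the hierarchy is meant to provide.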

In addition, preprocessing is now handled through a series of ordered scripts in a munge/ directory rather than just a single preprocessing script in the lib/ directory. There’s also a log/ directory, used by the new integrated log4r support, which is off by default, but can be easily set up after installing log4r from CRAN.
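As a rough sketch of what "a series of ordered scripts" can mean in practice, the base-R snippet below runs every script in a munge/ directory in filename order. The numeric-prefix naming (`01-clean.R`, `02-merge.R`) and the script contents are assumptions for illustration, and the loop is not taken from ProjectTemplate itself.

```r
# Build a throwaway project with a munge/ directory.
project_dir <- file.path(tempdir(), "munge_sketch")
munge_dir <- file.path(project_dir, "munge")
dir.create(munge_dir, recursive = TRUE, showWarnings = FALSE)

# Two hypothetical preprocessing scripts; the numeric prefix fixes the order.
writeLines("steps <- c(steps, 'clean')", file.path(munge_dir, "01-clean.R"))
writeLines("steps <- c(steps, 'merge')", file.path(munge_dir, "02-merge.R"))

# Record which scripts ran, in order.
steps <- character(0)

# Run each script in sorted filename order, in the current environment.
scripts <- sort(list.files(munge_dir, pattern = "\\.R$", full.names = TRUE))
for (script in scripts) {
  source(script, local = TRUE)
}
```

Splitting preprocessing this way lets each step stay small while the filenames document the order in which the steps run.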

Finally, there’s a src/ directory where we’re going to encourage users to place their primary analyses, so that the main directory always has the same files and directories across all projects.

In addition to all of these changes, many of which were inspired by conversations with Mike Dewar, I’ve incorporated some very helpful patches in this release. Specifically, Diego Valle-Jones fixed a bug in clean.variable.name() that led to trouble when filenames in the data/ directory began with numbers, and Patrick D. Schalk contributed code that adds support for SQLite to ProjectTemplate along with general improvements to the database access codebase.

Thanks for all of the support since the last release. Please let me know if there are any changes that need to be made before I turn v0.2-1 loose on CRAN.

