In 2021, the German Psychological Society (DGPs) signed the DORA declaration. As a consequence, it recently installed a task force with the goal of creating a recommendation for how responsible research assessment could be practically implemented in hiring and promotion within the field of psychology.
In our current draft (not public yet), we want to decenter (A) scientific publications as the primary research output that counts, and recommend also taking (B) published data sets and the development and maintenance of (C) research software into consideration. (Along with Recognition and Rewards and other initiatives, we also call for taking Teaching, Leadership skills, Service to the institution/field, and Societal impact into account. In the white paper, however, we only address the operationalization of the Research dimension.)
Concerning research software, we worked on an operationalization. It is inspired by:
- the INRIA Evaluation Committee Criteria for Software Self-Assessment
- Alliez, P., Cosmo, R. D., Guedj, B., Girault, A., Hacid, M.-S., Legrand, A., & Rougier, N. (2020). Attributing and Referencing (Research) Software: Best Practices and Outlook From Inria. Computing in Science & Engineering, 22(1), 39–52. https://doi.org/10.1109/MCSE.2019.2949413
- Gomez-Diaz, T., & Recio, T. (2019). On the evaluation of research software: The CDUR procedure [version 2; peer review: 2 approved]. F1000Research, 8, 1353. https://doi.org/10.12688/f1000research.19994.2
Please note that …
- The system should be as easy as possible (otherwise it will not be used in hiring committees)
- Psychologists are not computer scientists, so existing criteria aimed at computer scientists might be too advanced.
- As R is the #1 open source software for statistical computing in psychology, all examples relate to R.
Here is our current draft of the research software section. As we are not aware of any concrete implementation of assessing research software for hiring or promotion purposes (at least not in psychology or neighboring fields), we would like to ask the community for feedback. At the end of the post, we list three ways you can comment.
DRAFT SECTION FOR OPERATIONALIZING RESEARCH SOFTWARE CONTRIBUTIONS IN HIRING AND PROMOTION
(C) Research Software Contributions
Research software is a vital part of modern data-driven science that fuels both data collection (e.g., PsychoPy, Peirce et al., 2019, or lab.js, Henninger et al., 2021) and analysis (see, for example, R and the many contributed packages). In some cases, the functioning of entire scientific disciplines depends on the work of a few (often unpaid) maintainers of critical software (Muna et al., 2016). Furthermore, non-commercial open source software is a necessary building block for computational transparency, reproducibility, and a thriving and inclusive scientific community. It is high time that research software development is properly acknowledged in hiring and promotion, instead of being treated as “career suicide”.
Some research software is accompanied by a citable paper describing the software (e.g., for the lavaan structural equation modeling package in R: Rosseel, 2012). However, these “one-shot” descriptions often do not appropriately reflect the continuous work and the changing teams that are necessary to develop and maintain research software. Therefore, we include “Contributions to Research Software” as a separate category with its own quality criteria. Note that this category (C) only refers to dedicated, reusable research software, not to specific analysis scripts for a particular project. The latter should be listed under “Open reproducible scripts” of the respective paper in section (A).
For the evaluation of contributed research software, applicants can list up to 5 software artifacts along with the self-assessment criteria presented in Table 3 (a more comprehensive evaluation scheme with more quality criteria is proposed in Appendix A). Contributor roles are taken from the INRIA Evaluation Committee Criteria for Software Self-Assessment.
Table 3. Simple evaluation scheme for research software, with one specific example
| Research Software 1 | | URL | Comment |
|---|---|---|---|
| Title | R package RSA | https://CRAN.R-project.org/package=RSA | |
| Citation | Schönbrodt, F. D. & Humberg, S. (2021). RSA: An R package for response surface analysis (version 0.10.4). Retrieved from https://cran.r-project.org/package=RSA | | |
| Short description | An R package for Response Surface Analysis | | |
| Date of first full release | 2013 | | Necessary to compute citations relative to the age of the software |
| Date of most recent major release | 2020 | | Indicates whether the software is actively maintained |
| Contributor roles and involvement | DA-3, CD-3, MS-3 | | What has the applicant contributed? For each of the 3 roles – design and architecture (DA), coding and debugging (CD), maintenance and support (MS) – specify whether you are: 0 = not involved, 1 = an occasional contributor, 2 = a regular contributor, 3 = a main contributor. Example: DA-2, CD-3, MS-1 |
| License | GPLv3 | | Is the software open source? |
| Scientific impact indicators: | | | |
| Downloads or users per month | 710 downloads / month | https://cranlogs.r-pkg.org/badges/RSA | One way to compute this indicator is sketched below, after the table |
| Citations | 110 | https://scholar.google.de/citations?view_op=view_citation&hl=de&user=KMy_6VIAAAAJ&citation_for_view=KMy_6VIAAAAJ:mB3voiENLucC | Evaluate relative to the age of the software |
| Other impact indicators (optional) | – | | E.g., GitHub stars, number of dependencies. Be careful and responsible when using metrics, in particular when they are black-box algorithms. |
| Reusability indicator | R3 | | Levels of the reusability indicator: R1 (0.25 points): single scripts, loose documentation, no long-term maintenance. Prototype: a collection of reusable R scripts on OSF. R2 (1 point): well-developed and tested software, fairly extensive documentation, some attention to usability and user feedback, not necessarily regularly updated. Prototype: a small CRAN package with no more active development (just maintenance). R3 (2 points): major software project, strong attention to functionality and usability, extensive documentation, systematic bug chasing and unit testing, external quality control (e.g., by uploading to CRAN), regularly updated. Prototype: a well-received and actively maintained CRAN package. R4 (6 points): critical infrastructure software; hundreds of research projects use or depend on the software (plus all criteria of R3). Prototype: the lavaan package. |
| Merit / impact statement (narrative, max 100 words) | The RSA package has become a standard package for computing and visualizing response surface analyses in psychology. A PsycInfo search for “response surface analysis” (from 2022-05-18) revealed that of the 20 most recent publications, 35% used our package (although 2 of 7 did not cite it). Several features, such as computation of multiple standard models and model comparisons, are unique to this package. | | |
| Reward points | (3 + 3 + 3) / 3 × 2 = 6 | | Take the average value of the 3 contributor roles and multiply it by the point value of the reusability indicator level (here, R3 = 2 points). |
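To make this scoring rule concrete, here is a minimal R sketch of the computation. The function and argument names are our own; the point values are those assigned to the reusability levels above.

```r
# Minimal sketch of the reward-point rule from Table 3:
# average the three contributor-role scores (0-3) and multiply by the
# point value of the reusability level (R1-R4 as defined above).
reward_points <- function(da, cd, ms, reusability = c("R1", "R2", "R3", "R4")) {
  reusability <- match.arg(reusability)
  level_points <- c(R1 = 0.25, R2 = 1, R3 = 2, R4 = 6)
  mean(c(da, cd, ms)) * level_points[[reusability]]
}

# The RSA example from Table 3: DA-3, CD-3, MS-3 on an R3-level package
reward_points(da = 3, cd = 3, ms = 3, reusability = "R3")
#> [1] 6
```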
Is there essential information missing in the table?
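The downloads-per-month indicator can also be computed directly from the RStudio CRAN mirror logs (the same data source as the badge linked in Table 3). Below is a sketch using the cranlogs package; the 12-month window is only an example, and note that these logs cover a single CRAN mirror, so they underestimate total downloads.

```r
# Average monthly downloads of the RSA package from the RStudio CRAN
# mirror logs, as one operationalization of "downloads per month".
# install.packages("cranlogs")
library(cranlogs)

dl <- cran_downloads(packages = "RSA", from = "2021-05-01", to = "2022-04-30")
dl$month <- format(dl$date, "%Y-%m")
monthly <- aggregate(count ~ month, data = dl, FUN = sum)
round(mean(monthly$count))  # average downloads per month in this window
```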
Calibrating reward points
We also want to offer a suggestion for how to compute “reward points”. The goal is to bring the categories of “publications” and “software contributions” onto a common evaluative dimension. This gets a bit complicated, as we also propose bonus points for publications that fulfill certain quality criteria, so not every publication gets the same number of points. For the moment, imagine a publication of good quality (neither a quickly churned out low-quality publication nor an outstanding, seminal contribution). What is the “paper equivalent” of a software contribution? Note that these bonus points are meant to be incremental to an existing paper that describes the software.
Here’s our suggestion, being aware that it is easy to find counter-examples that do not fit into the system. But we are happy if our system is an incremental improvement over the status quo (which is to ignore software contributions and to count the number of papers without any quality weighting):
| Research Software Prototype | Paper equivalents (of good quality) |
|---|---|
| Simple script (a few hundred lines) with reuse potential, completely done by applicant | 0.25 |
| A well-developed CRAN package: Occasional co-developer with a minor contribution | 0.5 |
| A well-developed CRAN package: Active co-developer with a major contribution | 1 |
| A well-developed CRAN package: Main developer | 2 |
| Critical infrastructure: Regular co-developer | 2 |
| Critical infrastructure (e.g., lavaan): Main developer | 5 |
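To illustrate how this calibration would work in practice, here is a small R sketch that tallies the paper equivalents of a hypothetical applicant’s software portfolio. The listed contributions are invented; the per-item values are the paper equivalents proposed in the table above.

```r
# Hypothetical software portfolio, scored with the paper-equivalent
# values proposed in the calibration table above.
portfolio <- data.frame(
  contribution = c("Reusable analysis scripts on OSF",
                   "CRAN package, main developer",
                   "CRAN package, occasional co-developer"),
  paper_equivalents = c(0.25, 2, 0.5)
)

# Total software contribution on the common "paper of good quality" scale
sum(portfolio$paper_equivalents)
#> [1] 2.75
```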
How to comment?
If you have comments, you can …
- post them below this blog post
- write an email to felix.schoenbrodt@psy.lmu.de
- directly add your comments in a Google doc
Thanks for your help!