How do you explain reproducible research to clients?
[This article was first published on Social data blog, and kindly contributed to R-bloggers.]
Most of the statistics work I do now is reproducible research – this
can offer a big advantage for clients but of course that doesn’t
necessarily mean they realise it …
Below is the text I include in my source
documents (and which therefore appears in the PDFs) to explain
reproducible research. I would be very interested if anyone has any
better ideas.

----------

This is a reproducible research document. This approach has the following advantages:

• making it easier for us to return to the data and analyses in the future and repeat or extend them
• making it easier for the client to do the same without having to contact us
• enabling other researchers to repeat and verify these findings themselves, even automatically if they desire
• ensuring complete transparency of the results

Concretely, this means that the original SPSS and other data files
will not be changed at all. All recoding, data cleaning, omission of
cases etc. is carried out in syntax. In fact, everything in this report
document itself (tables, graphics, and statistics mentioned within the
text) is produced entirely by the following procedure.

First, a word processing document (“source file”) is prepared which is
essentially the final report, complete with introduction, chapter
headings, commentary etc., together with blocks of syntax wherever
statistical results are required, in particular tables, graphics and
inline results.

Second, a single syntax file is run which takes the source file and
creates a second document, the present report, which is identical to the
source file except that the blocks of syntax are replaced by the results
of the syntax (tables, graphics, etc.).

So there is no cutting-and-pasting or editing of data in the data
files, nor is there any manual editing of table data or graphics. At
each point in this report at which data preparation is discussed, the
interested reader will find, at the corresponding point in the source
file, the syntax which actually carries out that data preparation.
Likewise, at each point at which tables, graphics etc. are displayed,
the interested reader will find, at the corresponding point in the
source file, the syntax which actually constructs them. The source
document and datasets can therefore be made available to third parties,
who can then repeat these calculations, see exactly how the results were
arrived at, and extend the analyses at will.
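The procedure described above (a source file containing blocks of syntax, and a single run that swaps each block for its result) can be illustrated with a minimal sketch. This is not Sweave and not the workflow actually used for this report: it is a toy written in Python, and the `<<code>>` / `<<end>>` markers, the `weave` function and the `ages` data are all assumptions invented for the illustration.

```python
import contextlib
import io
import re

# Hypothetical chunk markers, chosen for this sketch only.
CHUNK = re.compile(r"<<code>>\n(.*?)\n<<end>>", re.DOTALL)


def weave(source: str, env: dict) -> str:
    """Return the report: the source text with each chunk replaced by its output."""

    def run_chunk(match: re.Match) -> str:
        buffer = io.StringIO()
        # Execute the chunk and capture whatever it prints.
        with contextlib.redirect_stdout(buffer):
            exec(match.group(1), env)
        return buffer.getvalue().rstrip("\n")

    return CHUNK.sub(run_chunk, source)


# Toy data standing in for a real dataset (made up for illustration).
ages = [34, 51, 27, 45, 38]

# The "source file": ordinary report text plus one block of syntax.
source_file = """Results

The respondents' mean age was:
<<code>>
print(round(sum(ages) / len(ages), 1))
<<end>>
No figure in this report was typed in by hand."""

report = weave(source_file, {"ages": ages})
print(report)
```

The printed report is the same text as the source file, except that the block of syntax has been replaced by the number it computes. If the data change, rerunning the same step regenerates the report without any manual editing.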
Unfortunately, to the best of our knowledge the statistics program
most familiar to social scientists, SPSS, does not fulfill all of
these requirements; in particular, it cannot produce a complete report
automatically. So the work will be carried out using Sweave, a tool
for the open-source statistics program R. Intermediate
datasets in SPSS format, including all recoded and calculated variables,
can be provided in addition, so that as much as possible of the above
can also be accomplished with SPSS.
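The rule that original data files are never changed, with all recoding and omission of cases done in syntax and the result saved as a separate intermediate dataset, might look roughly like the sketch below. It is an illustration only, written in Python with CSV files standing in for SPSS files; the file names, the `age` and `age_group` variables and the `prepare` function are hypothetical.

```python
import csv
import hashlib
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Fingerprint a file so we can show it was not modified."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def prepare(raw_path: Path, out_path: Path) -> int:
    """Read the raw file, clean and recode in code, write a new file.

    Returns the number of cases kept.
    """
    with raw_path.open(newline="") as f:
        rows = list(csv.DictReader(f))

    kept = []
    for row in rows:
        if row["age"] == "":  # omission of cases: drop rows with missing age
            continue
        age = int(row["age"])
        # Recode age into a new variable; the original column is kept as is.
        row["age_group"] = "under 40" if age < 40 else "40 and over"
        kept.append(row)

    with out_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "age", "age_group"])
        writer.writeheader()
        writer.writerows(kept)
    return len(kept)


# Made-up raw data, written to a temporary folder for the demonstration.
workdir = Path(tempfile.mkdtemp())
raw = workdir / "raw.csv"
raw.write_text("id,age\n1,34\n2,\n3,51\n")

before = sha256(raw)
n_kept = prepare(raw, workdir / "prepared.csv")
raw_unchanged = sha256(raw) == before
print(n_kept, raw_unchanged)
```

Because the raw file is only ever read, its checksum is the same before and after the run, and every recoding decision is visible in the script rather than hidden in hand edits to the data.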
In detail, the source file is written using one of the free
programs LibreOffice (www.documentfoundation.org) or LyX (www.lyx.org),
both available for Windows, Mac and Linux. It is then transformed
into the present PDF report, the document you are looking at now,
using the R statistics engine (www.r-project.org), which is also
available free on all platforms.