Publishing reproducible logbooks

Scenario

Jane has written a (math) paper based on her experiments. She would like anyone to be able to reproduce, check, and improve her calculations.

[Figure: screenshot of a logbook running on Binder]

Suggested solution

[Figure: the suggested solution]

  1. She writes up the experiments as Jupyter notebooks mixing prose, code, and visualizations, accompanied by resources such as source code, data, and media (think of them as logbooks);

  2. She publishes them on a publicly hosted repository (e.g. on GitHub, …);

  3. She makes that repository Binder-ready by declaring, in configuration files, the software required to run the notebooks; for details, see the Binder documentation and the configuration of the examples below.
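Before sharing the link, the Binder-ready step can be sanity-checked with a short script. The sketch below assumes a local clone of the repository: it looks for some of the configuration files that repo2docker (the tool Binder uses to build environments) recognizes, and builds a mybinder.org launch link for a GitHub repository. The function names and the `jane/logbooks` repository are hypothetical, and the file list is only a subset; the Binder documentation remains the authoritative reference.

```python
from pathlib import Path

# A subset of the configuration files recognized by repo2docker,
# the tool behind Binder (see the Binder documentation for the full list).
BINDER_CONFIG_FILES = (
    "environment.yml",
    "requirements.txt",
    "apt.txt",
    "postBuild",
    "runtime.txt",
    "install.R",
    "Dockerfile",
)


def binder_config_files(repo_dir):
    """Return the Binder configuration files found in repo_dir (hypothetical helper).

    Besides the repository root, repo2docker also accepts configuration
    in a `binder/` or `.binder/` subdirectory, so all three are checked.
    """
    repo = Path(repo_dir)
    found = []
    for folder in (repo, repo / "binder", repo / ".binder"):
        for name in BINDER_CONFIG_FILES:
            candidate = folder / name
            if candidate.is_file():
                # Report paths relative to the repository, with "/" separators.
                found.append(candidate.relative_to(repo).as_posix())
    return found


def binder_url(user, repo, ref="HEAD"):
    """Build a mybinder.org launch link for a GitHub repository (hypothetical helper)."""
    return f"https://mybinder.org/v2/gh/{user}/{repo}/{ref}"
```

For instance, `binder_config_files(".")` run from the root of the clone lists the configuration Binder would pick up, and an empty list suggests the repository is not yet Binder-ready; `binder_url("jane", "logbooks")` gives a link to put in the paper or the README.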

Some instances

Discussion

By publishing the logbooks and resources on a publicly hosted repository, Jane also benefits from their long-term archival by the Software Heritage project, which archives publicly available source code.

The proposed solution takes care of many of the basic hurdles to reproducibility, especially when following the recommended best practices (such as pinning the versions of the dependencies). However, full reproducibility is intrinsically hard, and many aspects are not tackled, such as numerical instability, the long-term availability of software, or the long-term backward compatibility of software and hardware. Also, only relatively lightweight calculations are covered. Nevertheless, in the spirit of the Pareto principle, this hopefully delivers most of the benefit for a small fraction of the effort.
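Pinning dependency versions, one of the best practices mentioned above, can be as simple as recording the exact versions installed in the environment where the notebooks are known to run. A minimal sketch, assuming the dependencies are Python packages declared in a `requirements.txt` (one of the files Binder recognizes); the helper names are hypothetical:

```python
import importlib.metadata
from pathlib import Path


def pinned_requirements(packages, lookup=importlib.metadata.version):
    """Return sorted `name==version` lines for the given distributions.

    By default, versions are read from the current Python environment, so
    run this where the notebooks are known to work. Raises
    importlib.metadata.PackageNotFoundError if a package is not installed.
    """
    return [f"{name}=={lookup(name)}" for name in sorted(set(packages))]


def write_requirements(path, packages, lookup=importlib.metadata.version):
    """Write the pinned lines to `path` (e.g. requirements.txt) and return them."""
    lines = pinned_requirements(packages, lookup)
    Path(path).write_text("\n".join(lines) + "\n")
    return lines
```

Calling `write_requirements("requirements.txt", ["numpy", "matplotlib"])` at the root of the repository pins only the listed packages, not their own dependencies; `pip freeze` (or `conda env export` for a conda environment) records the whole environment instead, at the cost of a longer and more platform-specific file.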

To do

Time and expertise required

Assuming that Jane is familiar with version control and Jupyter (basic lab skills taught by Software Carpentry), that the experiments were prepared as notebooks, and that the required software is packaged (conda, Debian, Docker container, …), the publishing part could take two hours the first time and half an hour thereafter.

What’s new since OpenDreamKit started

OpenDreamKit contribution

