I recently attended PyCon 2018 in Cleveland, Ohio. Apart from attending many exciting talks, BoF (bird of a feather session) and additional events I gave a tutorial on Reproducible analysis workflows. I have to admit I was a bit surprised when I realised that the 58 places available for the workshop had been booked by attendees to the conference.
The workshop started with a presentation illustrating why reproducibility is important and focusing on the definition of reproducibility, replicability, and robustness. This introductory section was also an excellent opportunity to promote discussions among the attendees and to better understand what the main barriers to reproducibility are they have encountered.
The next part of the workshop was focused around providing the attendees with practical guidelines and tips to store better and manipulate their data, create filesystems, develop analysis packages and test their code. One of the things I really wanted to highlight was the importance of licensing your scientific code. When you have little experience writing and sharing code, this might be a tricky subject to dive into. Over the years I have encountered projects deposited in places like GitHub without a license, which leaves the software in a weird usage limbo. On the one hand, the code is publicly stored in a repository. On the other hand, it does not explicitly say whether others can use, modify or distribute the software in any given way. As a consequence, many people would opt not to use the software as they are not confident they are allowed to.
Once again, with the popularity of Jupyter notebooks, it is essential to facilitate the testing and version control as we’d do with any other pieces of standalone scripts. Thus for these topics, I always go back to using nbdime and nbval. The main reason? These packages bring all of the version control and regression testing capabilities that Jupyter notebooks had been missing for a while. What is more, now with nbdime 1.0 out you not only get better integration with git but also get Jupyter notebook and JupyterLab extensions 🎊🎉. So if you are using JupyterLab, you can now have the HTML rendered diffs served directly in the app without even leaving it. When it comes to Jupyter Notebooks regression testing, we are often interested in verifying whether the results are consistent across different runs of the analysis or if the results and analyses have not been affected by newer releases of specific packages we use. In those cases nbval does the trick for you.
Finally, the workshop covered how you can share your code and make it publicly available and what you can do to get credit for it (generate a DOI and a citation file).
Overall, the workshop was a success, and we will be doing a re-run in Leeds on the 14th of June 2018. For anyone willing to have a look at the materials they can be found in this GitHub repository.
Workshops and Conferences Blog Front Page Jupyter Teaching nbdime Best Practice