Toward versatile JupyterHub deployments, with the Binder and JupyterHub convergence

About this document

Nowadays, many institutions run a JupyterHub server, providing their members with easy access to Jupyter-based virtual environments (a.k.a. notebook servers), preinstalled with a stack of computational software, tailored to the typical needs of the institution’s members. Meanwhile, since a few years ago, Binder lets any user on the internet define, run, and share temporary virtual environments equipped with an arbitrary software stack (examples).

In Fall 2017, Binder was revamped as BinderHub, a lightweight layer on top of JupyterHub. The next step in this convergence is to bring together the best of both worlds: think persistent authenticated Binder; or repo2docker enabled JupyterHub. For now, let’s call them versatile JupyterHub deployments.

This document brainstorms this convergence process: it sets up the ground with a scenario and assumptions for a typical institution-wide JupyterHub deployment, proposes specifications from a user perspective, and describes some typical use cases that would be enabled by such specifications. It further discusses security aspects and what remains to be implemented, before concluding with more advanced features and open questions.

This document started as a collection of private notes reflecting on in-development JupyterHub deployment at Paris-Saclay and EGI respectively, with some additional contributions. They were largely informed by many discussions at March 2018’s JupyterHub coding sprint in Orsay that involved dev-ops of those deployments and two of the main JupyterHub and BinderHub devs: Min Ragan Kelley and Chris Holdgraf. It was also inspired by some of cocalc features. Silly ideas reflecting here are mine, hard work is theirs; thank you all!!!

This document is meant for brainstorming; please hop in and edit.

Typical scenario

An institution – typically a university, a national lab, a transnational research infrastructure such as the European XFEL, or transational infrastructure provider like EGI – wishes to provide its members and users with a Jupyter service.

The service lets user spawn and access personal or collaborative virtual environments: namely a web interface to a light weight virtual machine, in which they can use Jupyter notebooks, run calculations, etc. In the remainder of this document we will use JupyterHub’s terminology and call such virtual environments notebook servers.

To cater for a large variety of use cases in teaching and research, the main aim of the upcoming specifications is to make the service as versatile as possible. In particular, it should empower the users to customize the service (available software stack, storage setup, …), without a need for administrator intervention.

Assumptions

The institution has access to:

Specifications / User Story

Main page of the service

After authentication, the user faces a page that is similar to binder’s main page:

The form consists of:

Behavior upon clicking Launch:

Behavior upon following a server URL/badge:

Some use cases

Local binder (better name? [Binder@home?])

Scenarios:

Setup:

They recreate the same environment on their local server (for example by just changing the server name in the binder URL).

More advanced scenario to explore: Lucy would like to use her Desktop PC because that resource is readily available and idles 99% of the time.

Easy sharing of computational environments

Scenarios:

Setup:

They prepare a notebook server with the appropriate software stack installed configured and access to the user’s shared home directory. Maybe they provide some document. They then just have to share the URL with their colleagues. No lengthy software installation. The colleagues can then start working right away, in their own environment, using their own data, saving their work in their home directory.

In all cases, the explicit description of the computing environment (and the use of open source software!) eases:

Collaboration

Scenario: Alice and Bob want to collaborate on some data analysis.

Setup:

They create a shared volume. Then either:

At this stage, they should not edit the same notebook simultaneously. However the stable version of JupyterLab, due sometime in 2018, should enable real-time collaboration in both setups, thanks to a CRDT file format for notebooks.

Class management

Scenario: using the server for a class’ computer labs and assignments

Desired features:

Prerequisites:

Procedure for the teacher(s):

Fetching the class material:

Submitting assignments:

To explore: integration with local e-learning platforms like Moodle, typically using LTI, in particular for class management and grades reporting. There already exists an LTI authenticator for Jupyter.

Security concerns

A malicious image description, image, or collaborator can:

Implementation status

Most of the features are already there. Here are some of the missing steps:

Alternatives

It should be noted that there is basically no coupling between JupyterHub/Binder and Jupyter. The former is merely a general purpose web service for provisioning web-based virtual environments. For example, JupyterHub/Binder has also been used to serve R-Studio based virtual environments. Reciprocally, there are alternative services to do such provisioning from which to get inspiration, like Simulagora.

Advanced features & open questions

Redirection

The main form could contain an additional item:

Use cases:

Marketplace of images

With the URL mechanism, any (power) user can prepare a dedicated image and share it with his collaborators. Images can be more or less specific: from just a computing environment to a fully specified machine, with mount points, …

Thanks to the above there is no need for a tightly coupled Marketplace. Nevertheless it may be useful to have one location (or more) for collecting and publicizing links to popular images. Some minimal coupling may be needed if one would want to sort the images according to their popularity.

Note: at this stage, a user cannot produce an image by setting up a machine “by hand” and save its state. The construction must be fully scripted. On the plus side, this encourages users to script their images, making them more reproducible.

National and international initiatives such as the European Open Science Cloud may help providing such a catalog of relevant Jupyter notebooks/images.

Default volume configuration

Intensive use and resource management / accounting

The above has been written with casual use in mind. For extensive use, some form of accounting and controlling of the resources used would be needed. For example, for LAL’s cloud we may want to have some form of bridge between the OpenStack dashboard and the hub. UI to be designed. Could the user provision a machine using the dashboard, and then specify on JupyterHub that the container shall be run on that machine?

Blog Front Page Jupyter Use Cases Teaching Emerging Technologies Binder WP4: User Interfaces Cloud Technical Blogpost

<