Medium-scale JupyterHub deployments
JupyterHub is the new cool kid on the block. Everyone knows Jupyter notebooks and how much they have revolutionized the workflows of scientists and students alike. Whereas Jupyter is meant to run on a personal computer, JupyterHub is the solution that brings Jupyter to your own cloud, be it your team’s, your university’s or your company’s cloud.
Not all clouds are born equal, though. The official JupyterHub documentation targets very large clouds with multiple nodes, managed through Kubernetes [1]. This kind of deployment suits very large structures, such as universities, large companies, etc.
“The littlest JupyterHub” adheres to a different philosophy: a JupyterHub deployment on a single server, with no virtualization or containerization technology. This kind of deployment is perfect for your small team, where one administrator can manually create accounts, and users can share data.
So far, no documentation seems to have addressed the medium-scale deployment, e.g., the one adapted for your university department, or for your company with hundreds of employees. The goal of this tutorial is to present a complete solution to deploy a JupyterHub server with delegated authentication and containerized environments, based on Docker. If you have a powerful server lying underused in your organization’s racks, chances are you will find this deployment to your liking.
The set-up described in this tutorial has been deployed at the University of Versailles for the whole faculty of science. It is used both for teaching and research. You can find the full configuration scripts here: https://github.com/defeo/jupyterhub-docker/. I will be glad to receive your comments, suggestions, bug reports and help requests via the GitHub Issues section.
The architecture of JupyterHub
A JupyterHub deployment has several components. First, the processes:

- A configurable Proxy, responsible for receiving all web requests and dispatching them to the appropriate component.

- A Hub, responsible for handling authentication, spawning single-user Jupyter notebook servers, and configuring the Proxy.

- Single-user Jupyter servers. These are nothing other than the Jupyter notebook servers you are used to running on your personal computer; however, they are started and stopped on demand by the Hub as users come and go.
More details can be found in the official JupyterHub documentation. In our deployment, we will put the Proxy and the Hub in the same container, and each single-user server in a container of its own.
Although JupyterHub can be directly exposed to the web without any intermediary, and even handle TLS certificates for https, it will be convenient for us to put it behind a reverse proxy, for efficiency, security, and other reasons that will be apparent later. Typical choices for a reverse proxy are Apache 2 and Nginx; however in this tutorial we chose to use Traefik, for its ease of use and nice integration with Docker.
Thus, we will have three containers (images) in total to handle: one for the Proxy+Hub, one for the reverse proxy, and one for the single-user servers. To simplify management and automate the deployment, we will use Docker Compose. All the configuration will be kept in a single folder, with the following structure:
./
+-- .env
+-- docker-compose.yml
+-- jupyterhub/
| +-- Dockerfile
| +-- jupyter-config.py
+-- jupyterlab/
| +-- Dockerfile
+-- reverse-proxy/
+-- traefik.toml
The main configuration file for Docker Compose is docker-compose.yml: it configures all containers (services, in Compose jargon), and the associated volumes and networks. It will look like this:
version: '3'
services:
  # Configuration for Hub+Proxy
  jupyterhub:
    ...
  # Configuration for reverse proxy
  reverse-proxy:
    ...
  # Configuration for the single-user servers
  jupyterlab:
    ...
volumes:
  ...
networks:
  ...
We now explain the configuration of each service, and the contents of each additional file in detail.
The reverse proxy
This is the simplest one, since most of the configuration is pushed into the other services. Traefik uses a single configuration file, named traefik.toml. The recommended way to use it is to bind-mount it inside the Traefik container. In docker-compose.yml:
reverse-proxy:
  image: traefik
  ports:
    - "80:80"
    - "443:443"
    - "8080:8080"
  volumes:
    - ./reverse-proxy/traefik.toml:/etc/traefik/traefik.toml
    - /var/run/docker.sock:/var/run/docker.sock
    - /etc/certs:/etc/certs
Notice that we also bound some network ports: ports 80 and 443 are for http and https, while port 8080 is for the Traefik dashboard, useful for analytics and debugging. You probably want to block access to port 8080 from outside your local network (e.g., via iptables or some other firewall technology).
We also mounted two other volumes in the container. The first one, docker.sock, lets Traefik access the Docker server; this will let it automagically configure the routing of web requests to other services as they are started by Docker. The second one lets Traefik access your server's TLS credentials, for https. This assumes that you have put in /etc/certs on your host machine a server.crt and a server.key file, containing your server certificate and private key.
Now we are ready to write the traefik.toml configuration file. We want to:

- redirect all http requests to https and configure TLS certificates,
- activate automatic discovery of running Docker containers,
- activate the Traefik dashboard.
debug = false
logLevel = "ERROR"
defaultEntryPoints = ["https","http"]
# Redirect HTTP -> HTTPS, install certificates
[entryPoints]
[entryPoints.http]
address = ":80"
[entryPoints.http.redirect]
entryPoint = "https"
[entryPoints.https]
address = ":443"
[entryPoints.https.tls]
[[entryPoints.https.tls.certificates]]
certFile = "/etc/certs/server.crt"
keyFile = "/etc/certs/server.key"
# Activate docker API
[docker]
domain = "docker.local"
watch = true
# Activate Traefik dashboard
[api]
This is all that's needed. You may be wondering why traefik.toml contains no configuration specific to JupyterHub. This is because the JupyterHub components will communicate with Traefik via the Docker API.
The Hub
The Hub is the centerpiece of our setup. When a user first lands on our JupyterHub instance, they are directed to the Hub, which then:
- Authenticates the user by any of the configured methods (see below);
- Spawns a single-user Jupyter notebook server for the user (in our case, this will run in a Docker container);
- Redirects the user to the single-user server.
We will start from the official JupyterHub container, and build our own Hub on top of it. We'll put all the configuration files in the folder named jupyterhub, and we'll tell Compose to build and configure the container:
jupyterhub:
  build: jupyterhub                 # Build the container from this folder.
  container_name: jupyterhub_hub    # The service will use this container name.
  volumes:                          # Give access to the Docker socket.
    - /var/run/docker.sock:/var/run/docker.sock
  environment:                      # Env variables passed to the Hub process.
    DOCKER_JUPYTER_IMAGE: jupyterlab_img
    DOCKER_NETWORK_NAME: ${COMPOSE_PROJECT_NAME}_default
    HUB_IP: jupyterhub_hub
  labels:                           # Traefik configuration.
    - "traefik.enable=true"
    - "traefik.frontend.rule=Host:jupyter.example.com"
Here are some highlights from the Compose configuration:
- We mount the Docker socket inside the container, so that it will be able to spawn containers for the single-user Jupyter servers.

- We set some environment variables for the Hub process; they will be used in the Hub configuration file jupyterhub_config.py:

  - DOCKER_JUPYTER_IMAGE is the name of the Docker image for the single-user servers; this must match the image configured in the jupyterlab section of docker-compose.yml (see below).

  - DOCKER_NETWORK_NAME is the name of the Docker network used by the services; normally, this network gets an automatic name from Docker Compose, but the Hub needs to know this name to connect the single-user servers to it. To control the network name we use a little hack: we pass an environment variable COMPOSE_PROJECT_NAME to Docker Compose, and the network name is obtained by appending _default to it.

    ATTENTION! Experience shows that this step is often missed or misunderstood. A value (any name you like, really) for the environment variable COMPOSE_PROJECT_NAME must be passed to Docker Compose. You can do this either via the shell, or by adding a .env file next to docker-compose.yml, with the following content:

      # Name of our Docker Compose project
      COMPOSE_PROJECT_NAME=my_hub

    The .env file solution is the choice we made in the example repo.

  - HUB_IP is the IP address of the Hub service within the Docker network. By using the container_name Compose directive, we can set an alias for the IP, and use the same for HUB_IP.

- We set labels for configuring Traefik. These have no effect on Docker, but are used by Traefik for configuring routing. In this example, we are enabling Traefik for this container, and we are asking to route all requests for jupyter.example.com to the Hub service.
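The network-name hack above is easy to check by hand: Docker Compose derives the default network name by appending _default to the project name. Here is a minimal Python sketch of that derivation, assuming a simplified .env parser (Compose's own parser also handles quoting and variable interpolation, which this sketch omits):

```python
def parse_env_file(text):
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def default_network_name(env):
    """Reproduce Compose's default network naming: <project>_default."""
    return env["COMPOSE_PROJECT_NAME"] + "_default"

# Contents of the .env file from the example above
dotenv = """
# Name of our Docker Compose project
COMPOSE_PROJECT_NAME=my_hub
"""
print(default_network_name(parse_env_file(dotenv)))  # my_hub_default
```

With COMPOSE_PROJECT_NAME=my_hub, the network the Hub must connect single-user servers to is thus my_hub_default, matching the ${COMPOSE_PROJECT_NAME}_default value in docker-compose.yml.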
We can now move to the configuration of the container itself. We start with the Dockerfile:
# Do not forget to pin down the version
FROM jupyterhub/jupyterhub:0.9.3
# Copy the JupyterHub configuration in the container
COPY jupyterhub_config.py .
# Download script to automatically stop idle single-user servers
RUN wget https://raw.githubusercontent.com/jupyterhub/jupyterhub/0.9.3/examples/cull-idle/cull_idle_servers.py
# Install dependencies (for advanced authentication and spawning)
RUN pip install \
dockerspawner==0.10.0 \
oauthenticator==0.8.0
It is important to explicitly pin a version of JupyterHub (0.9.3 in the example): the containers for the Hub and for the single-user servers must use the same version of JupyterHub.
Next to the Dockerfile, we have jupyterhub_config.py to configure the Hub. This plain Python file contains many different sections. We start with the configuration of the spawner: we use the class DockerSpawner to spawn single-user servers in separate Docker containers. We use here the environment variables that we have set in docker-compose.yml:
import os
c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
c.DockerSpawner.image = os.environ['DOCKER_JUPYTER_IMAGE']
c.DockerSpawner.network_name = os.environ['DOCKER_NETWORK_NAME']
c.JupyterHub.hub_ip = os.environ['HUB_IP']
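If one of these variables is missing (for instance because COMPOSE_PROJECT_NAME was never set), os.environ[...] raises a rather terse KeyError. A hypothetical, slightly more defensive variant that fails early with an explicit message (this helper is not part of JupyterHub, just a sketch):

```python
import os

REQUIRED_VARS = ("DOCKER_JUPYTER_IMAGE", "DOCKER_NETWORK_NAME", "HUB_IP")

def check_environment(environ=os.environ):
    """Return the required settings, raising early with a clear message."""
    missing = [name for name in REQUIRED_VARS if name not in environ]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARS}

# Example with a fake environment, as docker-compose.yml would set it:
settings = check_environment({
    "DOCKER_JUPYTER_IMAGE": "jupyterlab_img",
    "DOCKER_NETWORK_NAME": "my_hub_default",
    "HUB_IP": "jupyterhub_hub",
})
print(settings["HUB_IP"])  # jupyterhub_hub
```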
Optionally, we may want to stop the single-user servers after a certain amount of idle time. Following this example, we register a JupyterHub service like this:
c.JupyterHub.services = [
    {
        'name': 'cull_idle',
        'admin': True,
        'command': 'python /srv/jupyterhub/cull_idle_servers.py --timeout=3600'.split(),
    },
]
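The cull_idle_servers.py script polls the Hub's REST API and stops servers whose last activity is older than the timeout. Its core decision boils down to a timestamp comparison, which can be sketched as follows (a simplification, not the script's actual code):

```python
from datetime import datetime, timedelta

def should_cull(last_activity, now, timeout_seconds=3600):
    """True if the server has been idle for longer than the timeout."""
    return now - last_activity > timedelta(seconds=timeout_seconds)

now = datetime(2018, 9, 1, 12, 0, 0)
print(should_cull(datetime(2018, 9, 1, 10, 0, 0), now))   # True: idle for 2 hours
print(should_cull(datetime(2018, 9, 1, 11, 30, 0), now))  # False: idle for 30 minutes
```

With --timeout=3600 as above, any server idle for more than an hour is stopped; its container is kept around and restarted on the user's next login.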
For a complete list of all available configuration options, you can run the following command (answer y to the question):
docker-compose run --rm jupyterhub jupyterhub --generate-config -f /dev/stdout
Authentication
For the Hub configuration to be complete, we have to configure an authentication method. By default, JupyterHub authenticates users with the local system (more precisely, via PAM), but this is not useful for us.
The quickest way to get a JupyterHub server running with working authentication is to delegate to an authentication service such as GitLab's. This requires adding only two lines to jupyterhub_config.py:
## Configure authentication (delegated to GitLab)
from oauthenticator.gitlab import GitLabOAuthenticator
c.JupyterHub.authenticator_class = GitLabOAuthenticator
Other preconfigured third party services are documented here. In principle, any OAuth2 server can be used as third party authentication server. If you have such a server in your institution, you can use it like this:
from oauthenticator.oauth2 import OAuthLoginHandler
from oauthenticator.generic import GenericOAuthenticator
from tornado.auth import OAuth2Mixin

# OAuth2 endpoints
class MyOAuthMixin(OAuth2Mixin):
    _OAUTH_AUTHORIZE_URL = 'https://oauth2.example.com/login'
    _OAUTH_ACCESS_TOKEN_URL = 'https://oauth2.example.com/token'

class MyOAuthLoginHandler(OAuthLoginHandler, MyOAuthMixin):
    pass

# Authenticator configuration
class MyOAuthAuthenticator(GenericOAuthenticator):
    login_service = 'Example'
    login_handler = MyOAuthLoginHandler
    userdata_url = 'https://oauth2.example.com/userdata'
    token_url = 'https://oauth2.example.com/token'
    oauth_callback_url = 'https://jupyter.example.com/hub/oauth_callback'
    client_id = '...'      # Your client ID and secret, as provided to you
    client_secret = '...'  # by the OAuth2 service.

c.JupyterHub.authenticator_class = MyOAuthAuthenticator
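Behind the scenes, the login handler sends the user's browser to _OAUTH_AUTHORIZE_URL with the client ID, the callback URL, and an anti-forgery state parameter. The construction of that redirect URL can be sketched like this (the URLs and client ID are the placeholder values from the example above, not real endpoints):

```python
from urllib.parse import urlencode

def authorize_redirect_url(authorize_url, client_id, callback_url, state):
    """Build the OAuth2 authorization-code request URL the user is sent to."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": callback_url,
        "state": state,
    }
    return authorize_url + "?" + urlencode(params)

url = authorize_redirect_url(
    "https://oauth2.example.com/login",
    "my-client-id",
    "https://jupyter.example.com/hub/oauth_callback",
    "random-state-token",
)
print(url)
```

After the user logs in at the OAuth2 server, the browser is sent back to oauth_callback_url with a code, which the authenticator exchanges for a token at token_url.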
Alternatively, your institution may have an LDAP server, in which case you can use the LDAPAuthenticator with the configuration described here: modify the Dockerfile to install jupyterhub-ldapauthenticator, then add (at least) these lines to jupyterhub_config.py:
c.LDAPAuthenticator.server_address = 'ldap.example.com'
c.LDAPAuthenticator.bind_dn_template = [
    # Adapt this template to your LDAP tree
    'uid={username},ou=people,dc=example,dc=com',
]
c.JupyterHub.authenticator_class = 'ldapauthenticator.LDAPAuthenticator'
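bind_dn_template is a list of templates; at login, the authenticator substitutes the user's login name for {username} in each template, and tries to bind to the LDAP server with the resulting DNs in turn. The substitution itself is plain string formatting (the DN below is a hypothetical example, to be adapted to your directory):

```python
def candidate_dns(templates, username):
    """Expand each bind-DN template with the user's login name."""
    return [template.format(username=username) for template in templates]

templates = ['uid={username},ou=people,dc=example,dc=com']
print(candidate_dns(templates, 'alice'))
# ['uid=alice,ou=people,dc=example,dc=com']
```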
Another common authentication service in universities is Apereo CAS, which is supported by the JupyterHub CAS Authenticator. You will find an example configuration here.
The Jupyter notebook servers
The last component of our setup is the single-user Jupyter server. This is the most versatile and configurable part, as it defines the environment where your users are going to work.
The simplest way to get started is to use one of the pre-packaged Jupyter Docker stacks. This only requires one line in docker-compose.yml:
jupyterlab:
  image: jupyter/scipy-notebook:1145fb1198b2
  command: echo
The extra line command: echo is there so that, when Docker Compose starts the service, it terminates immediately. Indeed, this image is meant to be loaded by the Hub, not by Compose. Do not forget to edit the jupyterhub section of docker-compose.yml, so that the DOCKER_JUPYTER_IMAGE env variable matches this image:
jupyterhub:
  environment:
    DOCKER_JUPYTER_IMAGE: jupyter/scipy-notebook:1145fb1198b2
Alternatively, you can use any of OpenDreamKit’s pre-built images.
Finally, you can build your own image: the only requirement is that it contains the jupyterhub Python package (and possibly some Jupyter kernels, to make it interesting). I recommend starting from one of the Jupyter Docker stacks images: create a file jupyterlab/Dockerfile with, e.g., the following contents:
FROM jupyter/scipy-notebook:1145fb1198b2

RUN conda install --quiet --yes \
    'r-base=3.4.1' \
    'r-irkernel=0.8*' && \
    conda clean -tipsy
then modify docker-compose.yml as follows:
jupyterlab:
  build: jupyterlab
  image: jupyterlab_img
  command: echo
It is recommended that you test your image alone, before starting JupyterHub. You can do so by building it with Compose:
docker-compose build jupyterlab
then running a test server (adapt the ports to your configuration), with:
docker run --rm -p 8888:8888 jupyterlab_img
and copying into your browser the URL printed on the console (something like http://127.0.0.1:8888/?token=012...). You can stop the test server by hitting Ctrl-C.
If everything works normally, you can try out the excellent JupyterLab interface by replacing http://127.0.0.1:8888/user/.../tree with http://127.0.0.1:8888/user/.../lab. You will not be disappointed: JupyterLab is a game changer. It transforms the old notebook interface into a full-featured IDE, with multiple split views, a file explorer, Jupyter notebooks, and much more, in a single browser window.
If you want to use JupyterLab as the default interface for your JupyterHub users, configure the Hub by adding this line to jupyterhub_config.py:
# Redirect to JupyterLab, instead of the plain Jupyter notebook
c.Spawner.default_url = '/lab'
Running!
You are now ready to test your JupyterHub server. As superuser, or as a user in the docker group, run:
docker-compose build
docker-compose up
The JupyterHub server loads up and, if all is configured properly, starts waiting for connections. Test that authentication and starting up single-user servers work properly; try playing with some notebooks. When you are satisfied, you can stop the server by hitting Ctrl-C.
Be sure to read the documentation for Docker Compose and JupyterHub before going into production. Customize the setup to your liking; then, when you are ready to put the server online for good, launch it with:
docker-compose up -d
You can stop and destroy the server with
docker-compose down
Note that single-user servers are not destroyed when they are terminated: when a user returns, JupyterHub will look for a Docker container named jupyter-username, and will restart it with all its data. This means that, even if you update the JupyterLab image, returning users will not see the changes until you destroy their old containers. You can list all containers, including inactive ones, with

docker ps -a

and you can remove them individually with docker rm. To remove all containers at once (after the JupyterHub server has been stopped), you can run a command like this:

docker rm $(docker ps -qa -f "name=jupyter-")
Be wary that this command destroys all data associated with the users, including their work. Use with care!
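Before running the destructive command, you may want to preview which containers match. Docker's name filter keeps any container whose name contains the pattern; the selection can be sketched in Python like this (the container names below are illustrative):

```python
def matching_containers(names, pattern="jupyter-"):
    """Mimic `docker ps -f "name=jupyter-"`: keep names containing the pattern."""
    return [name for name in names if pattern in name]

# A hypothetical container list, as `docker ps -a` might show it
all_containers = ["jupyterhub_hub", "reverse-proxy", "jupyter-alice", "jupyter-bob"]
print(matching_containers(all_containers))  # ['jupyter-alice', 'jupyter-bob']
```

Note that the Hub and reverse-proxy containers are not matched, since their names do not contain jupyter- (with the hyphen), so only the single-user containers are removed.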
Data persistence
With this configuration, our JupyterHub server is not persistent: all user data is irremediably lost when the containers are destroyed. This may be acceptable for some use cases, however it is not what is typically wanted.
Using Docker volumes, we can make storage permanent. Two kinds of data need to be stored: the Hub data, containing information on administrators and existing users, and the users' data for the single-user servers.
To persist the Hub data, simply modify docker-compose.yml by adding a volume and mounting it inside the jupyterhub service:
services:
  jupyterhub:
    volumes:
      - jupyterhub_data:/srv/jupyterhub
volumes:
  jupyterhub_data:
Be careful with this configuration: if you update jupyterhub_config.py and rebuild the jupyterhub service, the contents will not be updated inside the volume, unless you destroy it first with docker volume rm.
The Hub takes care of handling volumes for the single-user servers. We can configure data persistence by adding these lines to jupyterhub_config.py:
import os
# user data persistence
# see https://github.com/jupyterhub/dockerspawner#data-persistence-and-dockerspawner
notebook_dir = os.environ.get('DOCKER_NOTEBOOK_DIR') or '/home/jovyan/work'
c.DockerSpawner.notebook_dir = notebook_dir
c.DockerSpawner.volumes = { 'jupyterhub-user-{username}': notebook_dir }
This way, even if we update the jupyterlab image and destroy the associated containers, the user data will not be lost.
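The {username} placeholder in the volume name gives each user a dedicated named volume. The expansion is simple string formatting, following the template configured above (note that DockerSpawner additionally escapes special characters in usernames, which this sketch omits):

```python
def volume_for(username, template='jupyterhub-user-{username}'):
    """Derive the named Docker volume for a user from the DockerSpawner template."""
    return template.format(username=username)

print(volume_for('alice'))  # jupyterhub-user-alice
print(volume_for('bob'))    # jupyterhub-user-bob
```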
Note that the level of control on the user data is rather minimal in this configuration. For example it is not possible to enforce per-user disk quotas. It would be interesting to explore more advanced uses of Docker volumes enabling complex usage patterns.
Block networking
In contexts where you do not trust your users, it may be useful to restrict networking from inside the single-user containers. Indeed, single-user servers are full-fledged Unix servers, and can potentially be used as relay nodes in network attacks.
A simple way to restrict networking is to ban any connection to the outside world from single-user servers, letting them only communicate with the Hub. This is easily accomplished using the internal flag for Docker networks. Edit docker-compose.yml like this:
services:
  jupyterhub:
    environment:
      DOCKER_NETWORK_NAME: ${COMPOSE_PROJECT_NAME}_jupyter
      HUB_IP: jupyterhub_hub
    networks:
      default:
      jupyter:
        aliases:
          - jupyterhub_hub
    labels:
      - "traefik.docker.network=${COMPOSE_PROJECT_NAME}_default"
networks:
  jupyter:
    internal: true
More complex filtering can be achieved with iptables.
Conclusion
We have described a full setup for a JupyterHub + Jupyter/JupyterLab server targeted to a medium sized institution, with external authentication and containerized components.
This setup can reasonably run on a personal computer, but is really ideal for powerful servers, such as those found in research labs at many universities.
Thanks to Docker Compose, the whole configuration fits in a few hundred lines, and is very easy to maintain and replicate.
We have been running an experiment for one year already, using a similar setup to teach a class at the University of Versailles. Since September 2018, we have been running a larger-scale deployment, used in several classes. You can find the complete current configuration at https://github.com/defeo/jupyterhub-docker.
For comments, questions and suggestions, feel free to contact me directly.
[1] See also https://blog.jupyter.org/how-to-deploy-jupyterhub-with-kubernetes-on-openstack-f8f6120d4b1 for a Kubernetes deployment of JupyterHub on OpenStack.