Prototyping to tested code, or developing across notebooks and modules

Two weeks ago, I was lucky enough to attend PyCon.DE 2018 in Karlsruhe. I also got the opportunity to present my ideas on the role of notebooks and modules in Python development, with a particular focus on testing. Since the slides and video are available online, I do not want to rehash the talk here, but rather use the bulk of this post to explain my setup in more detail.

The motivation for my talk was the observation that while notebooks are often placed in contrast to "production ready" code, I find them extremely useful in developing said "production ready" code. Another motivation was the recent JupyterCon talk by Joel Grus, titled "I don't like notebooks" (video), that I can highly recommend (*).

(*) A small aside. After my talk, I stumbled into a discussion about whether Joel's criticism is fair or whether he is criticizing notebooks for misuse by their users. Personally, I feel that his presentation may be a tad polemic. At the same time, I ran (and still run) into most issues he raises myself. Using notebooks effectively requires a lot of discipline, and I feel this fact is not highlighted enough. So in sum, I feel that, yes, he mostly describes usage issues and sometimes compares unfairly, but I also feel that there is a need for better communication of these issues and better tooling to address them. But to get back to the main story:

In my eyes, the stark contrast drawn between notebooks and modules is a false dichotomy. Both have their respective strengths and weaknesses, and a combination of the two can be very effective. For me, a work split between notebooks and modules is emerging that roughly looks as follows:

  1. Start prototyping in notebooks
  2. Refactor mature code into a local package
  3. Keep analyses and high-level docs in notebooks

This work split hinges however on a certain setup and libraries that behave sufficiently the same between environments. The latter issue was the motivation to create ipytest. It allows you to use pytest in the same way across notebooks and modules and thereby makes it easy to move tests from one environment to the other. For a detailed introduction to ipytest, please have a look at my talk (slides / recording) or the ipytest homepage. In the following I will concentrate on the environment setup and explain it in detail. An example of how everything looks in practice can be found here.

Before I start, I would like to highlight a very similar blog post by Florian Wilhelm, in which he makes different technical decisions with a slightly different focus. As I feel workflows are to some extent a matter of personal preference, reading his post as well gives you more options to find a setup that works for you.

Below, I discuss the following parts:

  1. Virtual environment
  2. Git repository
  3. Local package
  4. Notebooks and modules
  5. Tests
  6. Automation of common tasks
  7. Reproducing the setup

1. Setting up the virtual environment

Virtual environments (or virtualenvs) have become the de facto standard for python development, as they offer isolation of installed packages. However, managing virtualenvs, installing packages, and keeping everything in sync can be tricky. Recently, I started to use pipenv to set up my environments. It greatly simplifies virtualenv handling and ties together different aspects of the typical python workflow in one convenient interface. While it has some issues, like long runtimes and sometimes suboptimal dependency resolution, I generally really enjoy using it.

You can install pipenv by following these instructions. Afterwards, just install the minimal jupyter components with

pipenv install notebook ipykernel

This command will not only set up the environment, but also create the Pipfile and the Pipfile.lock if they do not already exist. The first file contains the abstract requirements, i.e., requirements without exact versions. The second file will contain the concrete requirements, i.e., requirements with exact versions. Thereby, the virtualenv is fully described and can easily be re-created for reproduction of the results.
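For orientation, a Pipfile at this point might look roughly like the following sketch (the exact source block and the python version are assumptions):

```toml
[[source]]
url = ""
verify_ssl = true
name = "pypi"

[packages]
notebook = "*"
ipykernel = "*"

[dev-packages]

[requires]
python_version = "3.6"
```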

To run the notebook inside the created virtualenv, execute

pipenv run jupyter notebook

When running the kernel and the frontend independently, e.g., when using nteract or running inside a hosted jupyter instance, the kernel has to be made available to the frontend. Below, I describe how. Since pipenv is a quite new tool, I also collected some useful commands.

2. Setting up the git repository

To ensure a reproducible workflow, I highly recommend setting up a git repository for any new project. Using git allows you to go back when things don't work out, or to find pieces of code that you may have deleted. For a detailed tutorial on how to use git, see for example the git book. I also collected some useful commands below.

To ensure full reproducibility, check in both the Pipfile and the Pipfile.lock. This way anybody can re-create the local virtualenv by executing pipenv sync --dev.
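It also pays off to add a .gitignore early. As a hypothetical starting point for this kind of project, it could contain, for example:

```
# .gitignore
.ipynb_checkpoints/
__pycache__/
*.egg-info/
.pytest_cache/
```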

Notebooks can be a bit tricky when using git, as they are not easily diffed or merged. The nbdime project aims to alleviate these issues with special command line and GUI tools. It also offers a simple way to integrate itself into your git workflow. I only recently started to use nbdime, so please take the following comments with a grain of salt. nbdime's setup only requires:

pipenv install nbdime
pipenv run nbdime config-git --enable

However, afterwards git will only work inside the current virtualenv, as it references the nbdime scripts inside that environment. As a workaround, I manually edited the .git/config sections that nbdime creates to use the full path. For example, I replaced git-nbdiffdriver with the output of pipenv run which git-nbdiffdriver, and similarly for the other tools.
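To illustrate, after this edit the relevant section of .git/config might look like the following (the virtualenv path is made up; use the output of pipenv run which git-nbdiffdriver on your machine):

```ini
[diff "jupyternotebook"]
	command = /home/user/.virtualenvs/demo-oTgUVK9c/bin/git-nbdiffdriver diff
```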

3. Setting up the local package

Now that the base environment is set up, let us prepare a local package to collect code moved out of the notebook. For demonstration purposes, I assume all source code will be included in the ipytest_demo package under the src directory:

mkdir src
mkdir src/ipytest_demo
touch src/ipytest_demo/

To make sure the code can be found from python, let's create a package and install it into the local virtual environment. The package is specified by creating a minimal setup.py file:

from setuptools import setup, PEP420PackageFinder

setup(
    name='ipytest_demo',
    packages=PEP420PackageFinder.find('src'),
    package_dir={'': 'src'},
)

Now, we have a simple, but fully functioning, package. To ease development, I install the package in development mode via

pipenv install -e .

This way any change made to the package is directly reflected in the virtualenv without re-installing. Note that the development package is also included in the Pipfile. This way anybody re-creating the environment will also install the local development version.

As is, the setup.py file is quite bare-bones. To make the package also useful outside this setup, you should at least add requirements by specifying the install_requires=[...] option. For a detailed introduction, see the python packaging guide.

4. Notebooks and Modules

With the setup of environment, repository, and local package out of the way, we can finally add code. There are two places we will use for code, notebooks and our local package. Personally, I prefer to divide my code between the two roughly as follows:

  1. Analyses and high level docs inside notebooks
  2. Mature code inside the package

Typically, I will modify the code of the package while I experiment inside the notebook. One caveat is that the python module system will cache any module on import. To run the code that is found on the filesystem after changes, we can either restart the kernel or reload the module. To help with the latter, IPython includes the autoreload magic. Personally, I prefer to perform the reload manually, such that I have full control. To help, ipytest ships with a reload command that accepts a list of module names to reload. In our case, we can run the following code to ensure we always import the current version:

import ipytest

ipytest.reload("ipytest_demo")
from ipytest_demo import ...
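For comparison, the autoreload approach mentioned above would be enabled at the top of the notebook via:

```
%load_ext autoreload
%autoreload 2
```

Afterwards, IPython re-imports modules automatically before executing code, at the cost of less explicit control over when reloads happen.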

5. Setting up tests

Another important piece of my workflow are automated tests. There are many reasons to test. For me it boils down to the fact that any medium sized project will contain too many moving parts to keep everything in mind at the same time, and automated tests give me a way to manage this complexity.

In our current setup, there are three different testing scenarios:

  1. Testing the local package, i.e., the typical tests,
  2. Testing inside the notebook as part of the development workflow, and finally
  3. Testing the integration of notebook and local package.

Let's start with the first scenario. For tests of standard python packages, I highly recommend pytest. It requires almost no setup. Just place all your tests inside python files starting with test_ and pytest will pick them up automatically. A typical test is a function, again, starting with test_ that raises an exception for any errors encountered. Just see the pytest getting started guide for details.
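To make this concrete, here is a minimal example (the function and file name are made up for illustration). Placed in, say, tests/, pytest would collect and run it automatically:

```python
# tests/  (hypothetical file name)

def word_counts(text):
    """Count how often each word occurs in a whitespace-separated text."""
    counts = {}
    for word in text.split():
        counts[word] = counts.get(word, 0) + 1
    return counts

def test_word_counts():
    # pytest reports a failure whenever an assertion raises
    assert word_counts("a b a") == {"a": 2, "b": 1}
    assert word_counts("") == {}
```

Running pipenv run pytest tests then executes all such functions and summarizes the results.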

For the second scenario, testing inside the notebook, you can also use pytest. However, it requires some glue code that I packaged as ipytest. For a detailed introduction, see my talk (slides, video). The short version of it is: set ipytest up via

import ipytest
import ipytest.magics

__file__ = "notebook.ipynb"

ipytest.config.rewrite_asserts = True


and execute tests as in

%%run_pytest[clean] -qq

def test_foo():
    assert True  # replace with a real check

Finally, testing the integration of notebooks and the local package remains. One application is to test for regressions and ensure that notebooks still work with the current state of the local package. For this task, I rely on nbval, a pytest plugin for testing notebooks. Personally, I only test that my notebooks still execute with the current package and do not check the exact outputs. For such tests, execute nbval as in

pytest --nbval-lax notebooks/*.ipynb

Depending on the notebooks, the tests can take considerable time. Model fits in particular can have long runtimes. To deal with this issue, I typically only execute a single batch or use fewer samples when running inside tests. You can check whether this is the case by looking for the PYTEST_CURRENT_TEST environment variable. The next version of ipytest will include the function

import os

def running_as_test():
    return os.environ.get("PYTEST_CURRENT_TEST") is not None

that allows you to write a typical model fit as in

, y, epochs=20 if not running_as_test() else 1)

At the moment, I am also experimenting with improving my tool chain to use static tests for notebooks. The basic idea is to use a custom exporter to extract the source code of notebooks, while skipping magics and then executing static analysis tools, such as mypy and pyflakes. If these experiments pan out, I will detail the results in a separate post.
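The extraction step can be sketched roughly like this (the notebook content below is made up for illustration; a real implementation would use nbconvert's exporters and handle line magics more carefully):

```python
# a minimal, made-up notebook structure to illustrate the idea
nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Analysis\n"]},
        {"cell_type": "code", "source": [
            "%%run_pytest[clean] -qq\n",
            "def test_foo():\n",
            "    assert True\n",
        ]},
        {"cell_type": "code", "source": ["x = 1\n", "y = x + 1\n"]},
    ]
}

def extract_source(nb):
    """Join the code cells of a notebook, skipping cells that start with a magic."""
    chunks = []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = "".join(cell["source"])
        if source.lstrip().startswith(("%", "!")):
            continue
        chunks.append(source)
    return "\n".join(chunks)

print(extract_source(nb))
```

The extracted source can then be fed to tools such as pyflakes or mypy.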

6. Automating common tasks

By now, my workflow encompasses many steps that I execute regularly. Thankfully, pipenv also offers a way to include scripts inside the Pipfile and collect everything in one place. However, it is mostly geared toward executing simple commands. To create more complex scripts, I use invoke as pipenv suggests.

Typically, I include the following commands:

  • format: to reformat my code using black
  • test: to run linters and execute my unit test-suite using pytest
  • docs: to update the documentation, if applicable

Additionally, I define a precommit command that executes all tasks in turn, so I do not forget any step before I commit my changes. In principle, it could be used as part of a precommit hook. For now, I prefer to run it manually.
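Should you later decide to enforce it, a minimal .git/hooks/pre-commit could simply delegate to pipenv (a sketch; the file also needs to be made executable with chmod +x):

```sh
#!/bin/sh
# abort the commit if formatting or tests fail
exec pipenv run precommit
```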

Using invoke, the file will look like

from invoke import task

@task
def precommit(c):
    format(c)
    test(c)

@task
def format(c):"black src tests")

@task
def test(c):"pytest tests")"pytest --nbval-lax notebooks/*.ipynb")

and my Pipfile will just forward to the corresponding invoke tasks

# Pipfile
[scripts]
precommit = "invoke precommit"
format = "invoke format"
test = "invoke test"

Now running pipenv run precommit will format the code and run all tests.

7. Reproducing the setup

Finally, let's talk about reproducibility. Thanks to pipenv, reproducing the setup is as easy as

pipenv sync --dev

After cloning the repository, this command will set up a virtual environment and install all packages (including the local development package). This simplicity is also the main reason why I have taken a liking to pipenv. Note however, that this setup alone is not quite enough to ensure full reproducibility. Other things to look out for include:

  • random seeds that are not fixed. If you are interested in this topic, I can really recommend Valerio Maggio's PyCon.DE talk (video).
  • data that changes over time. My current approach is to treat data files as immutable, append-only, and to document which files I used. An alternative could be tools such as git-lfs.
  • specifics of the system setup, e.g., the cuda version, as Valerio mentions. The fact that the base system can have quite some influence is one argument for using conda. Depending on your needs, it may be worthwhile to check out.
  • ...
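For the first point, the idea is simply to fix all relevant seeds at the top of the notebook, here sketched with the standard library's random module (libraries such as numpy or tensorflow keep their own random state and need to be seeded separately):

```python
import random

SEED = 42  # any fixed value works; the point is to record it

random.seed(SEED)
first_run = [random.random() for _ in range(3)]

# re-seeding reproduces the exact same draws
random.seed(SEED)
second_run = [random.random() for _ in range(3)]

assert first_run == second_run
```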


In this blog post, I detailed how I set up my environment to effectively develop across notebooks and modules. Overall this setup works quite well for me, but I recognize that workflows are often a matter of personal preference and taste. The python ecosystem is rapidly evolving and can be hard to keep track of. If you have any tips or tricks on this issue, I would love to hear from you on twitter @c_prohm.

I also would be quite happy to hear from you, if you decide to try testing inside notebooks. In that case you may also want to give ipytest a try. Again, contact me on twitter or open an issue on github if you run into issues or want to share feedback.

Finally, as a reference, I would like to highlight the tools I used throughout this article: git, pipenv, jupyter notebook, pytest, ipytest, nbval, and nbdime. You can find an example of how everything fits together here on github.



NB: Using separate virtualenvs for kernel and server

Jupyter notebooks (and lab) have three components, which can roughly be visualized as

Frontend <-> Server <-> Kernel

The different components are

  • The frontend is the javascript running inside the browser.
  • The kernel is the python process that executes the code included in the notebooks.
  • The server is the component that serves the files for the frontend and ensures that the frontend can communicate with the kernel.

A detailed description can be found in the jupyter docs. Importantly, jupyter does not require that kernel and server run inside the same virtualenv. This fact can come in handy if, for example, the server is set up by somebody else or you are using nteract and there is no virtualenv to begin with. In this case, you can set up an independent virtualenv and make the kernel available via

python -m ipykernel install --user --name "misc-exp"

This command will install a short description into your home directory, such that any server / frontend tool can spawn kernels inside the newly created virtualenv. One detail to look out for: the web ui of jupyter needs to be reloaded for the new kernel to be visible.

NB: pipenv commands

Some useful pipenv commands:

# install the numpy package
pipenv install numpy

# install the pytest package as a dev requirement
pipenv install --dev pytest

# update the locked versions of all packages and install them
pipenv lock --dev
pipenv sync --dev

# install all versions as locked (e.g., after a checkout)
pipenv sync --dev

# run jupyter inside the virtualenv
pipenv run jupyter

# open a shell inside the virtualenv
pipenv shell

For a more detailed description, see the pipenv documentation.

NB: git commands

Some useful git commands:

# create the repository
git init

# create a list of files to ignore
vim .gitignore

# add files
git add .

# commit files
git commit

# push to remote server
git push origin

# pull from remote server
git pull origin