While working on my deep learning project, I hacked together a couple of simple
support methods for IPython that, at least for me, greatly increased its
usefulness for iterative optimization.
In this post I will concentrate on IPython, while I defer the discussion
about the model and its design to a future post.
At the end of this post you'll find a
teaser^{[1]} and I also uploaded the
HTML file and the notebook of the current state.

My goal is to train a deep autoencoder^{[2]}
on images of multiple digits.
This way I hope to generate features that can be used by a second model to
recognize the sequence of digits shown in the images.
One problem with these kinds of models, and with optimization problems in
general, is that the parameters need some tweaking:
pick the learning rate too high and the model diverges, pick it too
low and you'll wait forever.
Similar trade-offs can also be observed for the other parameters and, as usual,
there are a lot of them.
Therefore, being able to experiment with the parameter values and inspect the
model while it is being optimized is a great advantage.
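
The learning-rate trade-off can be illustrated on a toy problem. The following is only a minimal sketch, not the autoencoder from this post: it assumes plain gradient descent on f(x) = x**2, and the function name `gradient_descent` and the three rates are made up for illustration.

```python
def gradient_descent(learning_rate, steps=50, start=1.0):
    """Minimize f(x) = x**2 and return the final distance from the optimum at 0."""
    x = start
    for _ in range(steps):
        gradient = 2.0 * x  # derivative of x**2
        x -= learning_rate * gradient
    return abs(x)


# Hypothetical learning rates, chosen only to show the three regimes.
too_high = gradient_descent(1.1)    # every step multiplies x by -1.2, so |x| grows
too_low = gradient_descent(0.001)   # every step multiplies x by 0.998, so x barely moves
reasonable = gradient_descent(0.4)  # every step multiplies x by 0.2, so x shrinks quickly
```

After 50 steps the first run has moved far away from the optimum, the second is still close to its starting point, and only the third has converged. Real models are not this well behaved, which is why watching the loss while training runs is so useful.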

To keep IPython responsive while optimizing the model, I turned to threads. In contrast to multiprocessing, threading keeps the model and the notebook in the same process and thereby avoids constantly copying data between processes. Since Python does not provide an API to stop threads, additional logic is required, for example something along the lines of:

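```
import threading

class Optimizer(threading.Thread):
    def run(self):
        # construct_model and update_and_evaluate stand in for the actual training code
        model = construct_model()
        self.is_running = True
        for iteration in range(100):
            update_and_evaluate(model)
            # checked after every iteration so the loop can be aborted from the notebook
            if not self.is_running:
                return
```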

This way the training can easily be aborted by setting the `is_running` attribute to `False`. Stopping and later resuming the optimization requires an additional indirection. My idea was to use generators as cooperative coroutines: after each `yield` the `is_running` flag is checked, and a stopped optimization can easily be resumed by calling `next` on the generator again. The final code posted to GitHub implements all of these features, complete with shiny, clickable buttons to control the execution. It can be used as follows:

```
from parallel_coroutine import ParallelCoroutine

training = ParallelCoroutine()

@training.execute
def optimize(self):
    model = construct_model()
    for iteration in range(100):
        yield update_and_evaluate(model)

training.start()
```
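
The stop-and-resume idea itself fits in a few lines. The sketch below is not the code of `ParallelCoroutine`; it is a simplified stand-in, and the names `GeneratorRunner`, `results` and `count_steps` are assumptions made for illustration. A worker thread advances a generator one step at a time and checks the flag after every `yield`.

```python
import threading


class GeneratorRunner:
    """Drive a generator in a background thread (hypothetical helper)."""

    def __init__(self, generator):
        self._generator = generator
        self._thread = None
        self.is_running = False
        self.finished = False
        self.results = []

    def _loop(self):
        # Advance one step at a time; the flag is checked after every yield.
        while self.is_running:
            try:
                self.results.append(next(self._generator))
            except StopIteration:
                self.finished = True
                self.is_running = False

    def start(self):
        # Also used to resume: the generator keeps its state between calls.
        if self.is_running or self.finished:
            return
        self.is_running = True
        self._thread = threading.Thread(target=self._loop, daemon=True)
        self._thread.start()

    def stop(self):
        # Ask the worker to pause after the current step, then wait for it.
        self.is_running = False
        if self._thread is not None:
            self._thread.join()


def count_steps(n):
    # Stand-in for a training loop; each yield marks a safe stopping point.
    for iteration in range(n):
        yield iteration
```

Calling `start()`, then `stop()`, then `start()` again continues exactly where the generator left off, because all of the loop state lives inside the generator and not in the thread.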

To control the training with graphical buttons, one just has to evaluate the
`controls` attribute of the training object.

Finally, one problem remains that cannot be circumvented: the
global interpreter lock (GIL).
I implemented the autoencoder using the Theano package developed in the
group of Yoshua Bengio.
Unfortunately, Theano does *not* release the GIL.
Since only one Python thread can execute at a time, the notebook feels a bit
sluggish while the optimization is running.
Still, overall this technique greatly simplified my workflow and sped up
experimentation quite a lot.

[1] The model is a four-layer denoising autoencoder with tied weights and soft rectified linear units for all layers but the feature layer and the reconstruction layer. The model tries to learn an encoding of the inputs with 1568 units into a feature vector with 400 units. In between, two layers with 2500 units each are used as intermediate stages. The training proceeds over 15 epochs with a learning rate of 0.001 for the initial 10 epochs and of 0.0001 for the last 5 epochs. During training, the cross-entropy reconstruction loss is minimized via stochastic gradient descent and back-propagation. In each epoch, one million training examples are processed in batches of 20 samples at a time. Additionally, the input is corrupted with 50 % salt-and-pepper noise to train robust features, and a dropout regularizer with a rate of 25 % is used to prevent over-fitting.

[2] Well, as I learned, three layers are only weakly deep: in a recent blog post, Ilya Sutskever defined (large) deep neural networks as having 10 to 20 layers. Actually, I should consider a deeper model, since deeper models become exponentially more powerful. However, without a GPU, larger models are somewhat frightening from a runtime perspective.