I finally finished the RBM implementation and squashed the remaining bugs. Running the RBM gives reasonable samples, and the learned filters look OK. So everything works, albeit a bit slowly because the linear algebra is pure Java. Luckily, most of the algorithms are written against the BLAS interface, so it should be fairly simple to swap those parts out for a faster implementation.
I would really like the option of switching the underlying implementation when starting the program, which UJMP seems to support. However, it does not expose the full BLAS API, so I cannot multiply with the transpose or do a rank-one update. All the other libraries I found seem to focus on running with a fixed BLAS implementation, mostly on the CPU. Long-term, however, I want to switch to running on OpenCL to get the most performance even from Java. I would really hate to roll my own matrix library, especially considering how many are already out there, but that seems to be the best option at the moment.
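To make the two missing operations concrete, here is a rough sketch of what a swappable backend could look like. Everything in it is an assumption for illustration: the names `Blas`, `PureJavaBlas`, `BlasBackends`, and the method `gemmTransA` are made up, the matrices are plain row-major `double[]` arrays, and the signatures are simplified compared to real BLAS (no leading dimensions or strides). It is not UJMP's API and not the code from my project.

```java
/** Hypothetical minimal backend interface; all matrices are row-major double[]. */
interface Blas {
    /** C = alpha * A^T * B + beta * C, where A is k x m, B is k x n, C is m x n. */
    void gemmTransA(int m, int n, int k, double alpha,
                    double[] a, double[] b, double beta, double[] c);

    /** Rank-one update: A = A + alpha * x * y^T, where A is m x n. */
    void ger(int m, int n, double alpha, double[] x, double[] y, double[] a);
}

/** Slow reference backend in plain Java loops. */
final class PureJavaBlas implements Blas {
    @Override
    public void gemmTransA(int m, int n, int k, double alpha,
                           double[] a, double[] b, double beta, double[] c) {
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                double sum = 0.0;
                for (int p = 0; p < k; p++) {
                    // A^T[i][p] equals A[p][i]; A is stored as k rows of m entries.
                    sum += a[p * m + i] * b[p * n + j];
                }
                c[i * n + j] = alpha * sum + beta * c[i * n + j];
            }
        }
    }

    @Override
    public void ger(int m, int n, double alpha, double[] x, double[] y, double[] a) {
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < n; j++) {
                a[i * n + j] += alpha * x[i] * y[j];
            }
        }
    }
}

/** Picks a backend by name at startup; only the pure-Java one exists in this sketch. */
final class BlasBackends {
    private BlasBackends() {}

    static Blas select(String name) {
        if ("java".equals(name)) {
            return new PureJavaBlas();
        }
        // A native or OpenCL backend would be added here as another branch.
        throw new IllegalArgumentException("Unknown BLAS backend: " + name);
    }
}
```

The backend name could come from something like `System.getProperty("blas.backend", "java")` (the property name is again made up), so that the rest of the code only ever sees the `Blas` interface and a faster implementation can be dropped in later.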
On a happier note, I got Guice working just the way I wanted. I rolled my own little JSON configuration library, which lets me swap every little piece of my program using configuration files. After cleaning it up a bit more, I will upload it to GitHub and write a short post about it.