Over the last couple of days, I’ve been fettling the build scripts for the TM351 VM, which typically uses vagrant to build a VirtualBox VM from a set of shell scripts, so they can be used to build a single Docker container that runs all the TM351 services, specifically Jupyter notebooks, OpenRefine, PostgreSQL and MongoDB.
Docker containers are typically constructed to run a single service, with compositions of containers wired together using Docker Compose to create applications that deliver, or rely on, more than one running service. For example, in a previous post (Setting up a Containerised Desktop API server (MySQL + Apache / PHP 5) for the ergast Motor Racing Data API) I showed how to set up a couple of containers to work together, one running a MySQL database server, the other an http service that provided an API to the database.
So how to run multiple services in the same container? Docs on the Docker website suggest using supervisord to run multiple services in a single container, so here’s a fragment on how I’ve done that from my TM351 build.
To begin with, I’ve built the container up as a tiered set of containers, in a similar way to the way the stack of opinionated Jupyter notebook Docker containers are constructed:
#Define a stub to identify the images in this image stack
IMAGESTUB=psychemedia/tm361testm

# minimal
## Define a minimal container, eg a basic Linux container
## using whatever flavour of Linux we prefer
docker build --rm -t ${IMAGESTUB}-minimal-test ./minimal

# base
## The base container installs core packages
## The intention is to define a common build environment
## populated with packages likely to be common to many courses
docker build --rm --build-arg BASE=${IMAGESTUB}-minimal-test -t ${IMAGESTUB}-base-test ./base

#...
One of the things I’ve done to try to generalise the build steps is to allow a base container to be used to bootstrap a new one, by passing the name of the base image in via an optional variable (in the above case, --build-arg BASE=${IMAGESTUB}-minimal-test). Each Dockerfile in a build step directory uses the following construction to work out which image to use as the FROM basis:
#Set ARG values using --build-arg <varname>=<value>
#Each ARG value can also have a default value
ARG BASE=psychemedia/ou-tm351-base-test
FROM ${BASE}
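As a rough sketch of how the tiers chain together, the helper below prints one docker build command per tier, passing each freshly built image in as the BASE of the next. The function name tier_build_commands is made up for illustration, and it assumes each tier name matches its build directory, as with ./minimal and ./base above.

```shell
# Sketch only: tier_build_commands is a hypothetical helper, not part of the TM351 build.
# Assumes each tier name is also the name of its build directory.
IMAGESTUB=${IMAGESTUB:-psychemedia/tm361testm}

# Print the docker build commands for the tiers given as arguments, in order.
# The body runs in a subshell ( ... ) so its variables do not leak out.
tier_build_commands() (
  prev=""
  for tier in "$@"; do
    image="${IMAGESTUB}-${tier}-test"
    if [ -z "$prev" ]; then
      # First tier: no --build-arg, so the Dockerfile's default BASE applies
      echo "docker build --rm -t ${image} ./${tier}"
    else
      # Later tiers: bootstrap from the image built in the previous step
      echo "docker build --rm --build-arg BASE=${prev} -t ${image} ./${tier}"
    fi
    prev="$image"
  done
)

tier_build_commands minimal base
```

Printing the commands rather than running them makes it easy to eyeball the chain first; piping the output to sh -e would then run the builds and stop at the first failure.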
Using the same approach, I have used separate build tiers for the following components:
- jupyter base: minimal Jupyter notebook install;
- jupyter custom: add some customisation onto a pre-existing Jupyter notebook install;
- openrefine: add the OpenRefine application (note, we could just use BASE=ubuntu to create this as a simple, standalone OpenRefine container);
- postgres: create a seeded PostgreSQL database; note, this could be split into two: a base postgres tier and then a customisation that adds users, creates and seeds databases etc;
- mongodb: add in a seeded mongo database; again, the seeding could be added as an extra tier on a minimal database tier;
- topup: a tier to add in anything I’ve missed without having to go back to rebuild from an earlier step…
The intention behind splitting out these tiers is that we might want to have a battle hardened OU postgres tier, for example, that could be shared between different courses. Alternatively, we might want to have tiers offering customisations for specific presentations of a course, whilst reusing several other fixed tiers intended to last out the life of the course.
By the by, it can be quite handy to poke inside an image once you’ve created it to check that everything is in the right place:
#Explore inside an image by entering it with a shell command
docker run -it --entrypoint=/bin/bash psychemedia/ou-tm351-jupyter-base-test -i
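For a quick, non-interactive check, the same entrypoint trick can be used to run a single command in a throwaway container. This is a sketch only: image_check is a hypothetical helper name, and the command you pass in is whatever you want to verify.

```shell
# Sketch only: image_check is a hypothetical helper.
# Runs one command inside a temporary container started from the given image.
# --rm removes the container when the command exits.
# Usage: image_check <image> <command...>
image_check() {
  image=$1
  shift
  # Arguments after the image name are handed to the entrypoint,
  # so this ends up running: /bin/bash -c "<command>"
  docker run --rm --entrypoint=/bin/bash "$image" -c "$*"
}
```

For example, image_check psychemedia/ou-tm351-jupyter-base-test which jupyter should print the path to the jupyter executable if the install worked.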
Once the services are in place, I add a final layer to the container that ensures supervisord is available and set up with an appropriate supervisord.conf configuration file:
##Dockerfile
#Final tier Dockerfile
ARG BASE=psychemedia/testpieces
FROM ${BASE}

USER root
RUN apt-get update && apt-get install -y supervisor

RUN mkdir -p /openrefine_projects && chown oustudent:100 /openrefine_projects
VOLUME /openrefine_projects

RUN mkdir -p /notebooks && chown oustudent:100 /notebooks
VOLUME /notebooks

RUN mkdir -p /var/log/supervisor
COPY monolithic_container_supervisord.conf /etc/supervisor/conf.d/supervisord.conf

EXPOSE 3334
EXPOSE 8888

CMD ["/usr/bin/supervisord"]
The supervisord.conf file is defined as follows:
##supervisord.conf
##We can check running processes under supervisord with: supervisorctl

[supervisord]
nodaemon=true
logfile=/dev/stdout
loglevel=trace
logfile_maxbytes=0
#The HOME envt needs setting to the correct USER
#otherwise jupyter throws: [Errno 13] Permission denied: '/root/.local'
#https://github.com/jupyter/notebook/issues/1719
environment=HOME=/home/oustudent

[program:jupyternotebook]
#Note the auth is a bit ropey on this atm!
command=/usr/local/bin/jupyter notebook --port=8888 --ip=0.0.0.0 --y --log-level=WARN --no-browser --allow-root --NotebookApp.password= --NotebookApp.token=
#The directory we want to start in
#(replaces jupyter notebook parameter: --notebook-dir=/notebooks)
directory=/notebooks
autostart=true
autorestart=true
startsecs=5
user=oustudent
stdout_logfile=NONE
stderr_logfile=NONE

[program:postgresql]
command=/usr/lib/postgresql/9.5/bin/postgres -D /var/lib/postgresql/9.5/main -c config_file=/etc/postgresql/9.5/main/postgresql.conf
user=postgres
autostart=true
autorestart=true
startsecs=5

[program:mongodb]
command=/usr/bin/mongod --dbpath=/var/lib/mongodb --port=27351
user=mongodb
autostart=true
autorestart=true
startsecs=5

[program:openrefine]
command=/opt/openrefine-3.0-beta/refine -p 3334 -i 0.0.0.0 -d /vagrant/openrefine_projects
user=oustudent
autostart=true
autorestart=true
startsecs=5
stdout_logfile=NONE
stderr_logfile=NONE
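The supervisorctl check mentioned at the top of that file can also be run from the host against a running container. A minimal sketch, assuming the container's supervisord setup exposes the control socket that supervisorctl needs (service_status is a hypothetical helper name):

```shell
# Sketch only: service_status is a hypothetical helper.
# Asks supervisord inside a running container for the state of its programs.
# Usage: service_status <container-name-or-id>
service_status() {
  docker exec "$1" supervisorctl status
}
```

The container name or ID can be found with docker ps; each of the four programs defined above should be listed with its current state.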
One thing I need to do better is to find a way to stage the construction of the supervisord.conf file, bearing in mind that multiple tiers may relate to the same service; for example, I have a jupyter-base tier to create a minimal Jupyter notebook server and then a jupyter-base-custom tier that adds in specific customisations, such as branding and course related notebook extensions.
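One possible way of staging the configuration, sketched here rather than taken from the TM351 build, is for each tier to write its own [program:x] fragment into a shared directory, with a later tier for the same service simply overwriting the fragment of the same name. The helper name write_program_fragment and the temporary demo directory are assumptions; in a real build the directory would more likely be /etc/supervisor/conf.d, which the Ubuntu supervisor package's default configuration should pick up via its [include] section.

```shell
# Sketch only: write_program_fragment is a hypothetical helper.
# Writes a single [program:<name>] fragment to <dir>/<name>.conf.
# Usage: write_program_fragment <dir> <name> <user> <command...>
write_program_fragment() {
  frag_dir=$1
  frag_name=$2
  frag_user=$3
  shift 3
  mkdir -p "$frag_dir"
  # A later tier calling this with the same name replaces the earlier fragment
  cat > "$frag_dir/$frag_name.conf" <<EOF
[program:$frag_name]
command=$*
user=$frag_user
autostart=true
autorestart=true
startsecs=5
EOF
}

# Demo in a temporary directory, using the mongodb settings from the file above
demo_dir=$(mktemp -d)
write_program_fragment "$demo_dir" mongodb mongodb /usr/bin/mongod --dbpath=/var/lib/mongodb --port=27351
cat "$demo_dir/mongodb.conf"
```

The [supervisord] section itself would still need to live in a single file added by the final tier.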
When a container is started from the final image, the supervisord command is run and the multiple services are started.
One other thing to note: we’re hoping to run TM351 environments on an internal OpenStack cluster. The current cluster only allows students to expose a single port, and port 80 at that, from the VM (IP addresses are in scant supply, and network security lockdowns are in place all over the place). The current VM exposes at least two http services: Jupyter notebooks and OpenRefine, so we need a proxy in place if we are to expose them both via a single port. Helpfully, the nbserverproxy Jupyter extension (as described in Exposing Multiple Services Via a Single http Port Using Jupyter nbserverproxy) allows us to do just that. One thing to note, though: I had to enable it via the same user that launches the notebook server in the supervisord.conf settings:
##Dockerfile fragment
RUN $PIP install nbserverproxy

USER oustudent
RUN jupyter serverextension enable --py nbserverproxy
USER root
To run the VM, I can call something like:
docker run -p 8899:8888 -d psychemedia/tm351dockermonotest
and then to access the additional services, I can browse to e.g. localhost:8899/proxy/3334/ to see the OpenRefine application.
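The proxied address follows a simple pattern: the published notebook port on the host, then /proxy/, then the port the service listens on inside the container. A small sketch (proxied_url is a hypothetical helper name):

```shell
# Sketch only: proxied_url is a hypothetical helper.
# Builds the URL for a service reached through nbserverproxy.
# Usage: proxied_url <published-notebook-port> <service-port> [host]
proxied_url() {
  echo "http://${3:-localhost}:$1/proxy/$2/"
}

proxied_url 8899 3334
# prints: http://localhost:8899/proxy/3334/
```

Something like curl -sI "$(proxied_url 8899 3334)" can then be used to check that OpenRefine is answering behind the proxy.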
PS in case you’re wondering why I syndicated this through RBloggers too, the same recipe will work if you’re using Jupyter notebooks with an R kernel, rather than the default IPython one.