Docker/Pods/Deep Learning: Difference between revisions
From charlesreid1
Revision as of 21:23, 29 April 2017
Notes on a Docker Pod for deep learning.
Overview of Docker Deep Learning
We are looking for Docker images that can handle a couple of different deep learning technologies:
- Python 3
- Jupyter
- NumPy, SciPy, matplotlib, pandas
- scikit-learn/scikit-image
- TensorFlow
- OpenCV
- Keras
It would also be nice to be ready to use a GPU if it is available...
This may require a single Docker container, or it might require the use of multiple containers. Either way, we'll call it a Docker pod - a collection of related containers.
Setting Up the Docker Pod
To get various containers set up, we can use a container created by GitHub user waleedka: https://github.com/waleedka/modern-deep-learning-docker
This GitHub repo provides a Dockerfile that installs pretty much every item from the list above, plus a few other things (such as Java).
Using a CPU
If you're just using a CPU, you can install Docker on your platform of choice, and then use the docker run command to run the deep learning container image:
docker run -it -p 8888:8888 -p 6006:6006 -v ~/:/host waleedka/modern-deep-learning
Using a GPU
Using a GPU is a little more complicated, since Docker containers have no built-in way of accessing the host's GPU hardware.
Nvidia-docker provides a CUDA image and a docker command line wrapper to allow the GPUs to be accessed by a Docker container when it is launched.
Here's what running a hello world script looks like with nvidia-docker:
nvidia-docker run --rm hello-world
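Since we want to use a GPU only if one is available, a launch script could pick the wrapper at run time. This is a minimal sketch that assumes the presence of nvidia-docker on the PATH is a good enough signal; the function name docker_launcher is made up for illustration, and finding the wrapper does not prove that a working GPU or driver is present.

```python
import shutil


def docker_launcher(which=shutil.which):
    """Return "nvidia-docker" if that wrapper is on the PATH, else plain "docker".

    Assumption: having the wrapper installed is treated as "GPU available".
    `which` is injectable so the choice can be exercised without either tool.
    """
    return "nvidia-docker" if which("nvidia-docker") else "docker"


def hello_world_command():
    """Build the hello world command shown above with whichever launcher is found."""
    return [docker_launcher(), "run", "--rm", "hello-world"]
```

The resulting list can be handed to subprocess.run, so the same script works on a CPU-only machine and on a GPU machine.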
Here are the steps that Nvidia suggests for any nvidia-docker project [1]:
1. Set up and explore the development environment inside a container.
2. Build the application in the container.
3. Deploy the container in multiple environments.
To get nvidia-docker you will need to sign up with Nvidia, which you can do here: https://devblogs.nvidia.com/parallelforall/nvidia-docker-gpu-server-application-deployment-made-easy/
The Platform
CPU-based platform
We can definitely use CPUs to do deep learning - this is the easiest option, and can be done at home. The only problem is that training will take a while.
If you are going with CPUs, you can rent a node from Linode or DigitalOcean, instead of from a behemoth like Google or Amazon.
GPU-based platform
Once it comes time to use GPUs, you will probably want to provision machines from AWS or Google Cloud that have GPUs built in so that you can leverage that hardware.
Running
Locally
Let's start by covering how to get the Docker deep learning pod up and running locally, if we are using CPUs to do deep learning.
Start by installing Docker: Docker/Installing
Deep Learning Container: No Modifications/Extras
If we want to download/run the latest deep learning container image from waleedka without modifying the Dockerfile or adding any additional software, we can get the image from Dockerhub:
docker pull waleedka/modern-deep-learning
Now we can run it, but nothing will be saved: every new container starts fresh from the image, so notebooks created in an earlier container will not be there.
$ docker run -it -p 8888:8888 waleedka/modern-deep-learning
Basics
Running the Deep Learning Container
Let's start with how we get this deep learning docker container up and running.
Start by installing Docker: Docker/Installing
Next, this deep learning container can run a Jupyter notebook server, which runs on port 8888 by default, so we'll pass the container's port 8888 through to the host machine's port 8888:
$ docker run -it -p 8888:8888 waleedka/modern-deep-learning
This is great, but unfortunately any changes we make or notebooks we create will disappear with our container, so we'll need to figure out data volumes.
For the time being, let's start by testing out the container and making sure the software components work.
Then we'll figure out a schema for data volumes, and how we get data into and out of our deep learning container.
Testing it out
To take this for a test drive, run the above command. This will give you a bash terminal on the docker container, where we can run a Jupyter notebook:
$ docker run -it -p 8888:8888 waleedka/modern-deep-learning
root@a944863bc1e6:~# jupyter notebook
Now on the host machine, we can navigate to localhost:8888 and see a Jupyter notebook server up and running. The server exposes the container's file system and any notebooks stored in the container. This container runs Python 3 only.
Create a new Python notebook, and try importing a few libraries:
import numpy
import scipy
import sklearn
import theano
import tensorflow
import pandas
import matplotlib
import keras
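If one of those imports fails, the whole cell stops at the first error. A slightly more forgiving check is sketched below; it tries each library in turn and reports a version where it can find one. The helper name check_imports is made up for this example, and it assumes a library exposes its version as `__version__`, which is common but not guaranteed.

```python
import importlib

# Module names taken from the import list above.
LIBRARIES = ["numpy", "scipy", "sklearn", "theano", "tensorflow",
             "pandas", "matplotlib", "keras"]


def check_imports(names):
    """Try to import each module; map name -> version string, or None on failure."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # A broken install can fail with errors other than ImportError,
            # so anything raised during import counts as "not usable".
            report[name] = None
        else:
            # Not every module defines __version__, so fall back to "unknown".
            report[name] = getattr(module, "__version__", "unknown")
    return report
```

In a notebook cell, `check_imports(LIBRARIES)` returns a dictionary in which any entry set to None points at a library that needs attention.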
Data Volumes Strategy
Let's walk through a volumes strategy for deep learning models using Docker. The strategy we use depends on whether we're training deep learning models using data, or running deep learning models to make predictions.
Training
If you are training deep learning models using docker containers, you want to be able to train one or more models on a given data set. That's the point of creating your data volume container - you can try different neural networks by spinning up different containers.
Note that training data may come from a variety of sources, but here we'll treat the training data as some files on disk.
Workflow
Here are a few things we know about the workflow of training deep learning models in parallel:
- Each container will be loading up the same data set for training; if containers load up different training sets, they should be getting a data volume from a different container.
- Each container will be creating a unique model and will need to dump this model somewhere. (These models may be unique because they focus on different chunks of data, or because they are creating models using different X's and Y's, or because they are trying different strategies or architectures or model parameters.)
Input
The input is the training data.
The training data volume should be a single volume mounted read-only from the data volume container.
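As a rough sketch of what the read-only mount could look like, the helper below builds a docker run argument list that borrows the volumes of a data volume container. The container name training-data and the function name are hypothetical; `--volumes-from` with a `:ro` suffix is the Docker CLI option for mounting another container's volumes read-only.

```python
def training_run_command(image, data_container):
    """Build a `docker run` argument list for one training container.

    `data_container` is the name of an existing data volume container
    (hypothetical here); its volumes are mounted read-only via ":ro".
    """
    return [
        "docker", "run", "-it",
        "--volumes-from", data_container + ":ro",
        image,
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without Docker.
    print(" ".join(training_run_command("waleedka/modern-deep-learning",
                                        "training-data")))
```

Because every training container is started with the same argument, they all see the same training set and none of them can modify it.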
Output
The output is the neural network or resulting model, which can be handled a few ways.
The easiest way is to mount a host directory in the container, and dump completed models into that directory.
Another method is to run a database, possibly in another container, that will store the resulting models (in whatever format they are exported; the format is still an open question).
Yet another possibility is to keep a persistent drive in a data volume container, and have every other container mount and share that single volume. This seems complicated and inefficient, though, so mounting a host directory is probably easiest.
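For the host-directory option, each container needs to write its model under a name that will not clash with the others. The sketch below is one way to do that, assuming the model object can be pickled and that the host directory is mounted somewhere like /host/models (the path, the function name and the file naming scheme are all assumptions).

```python
import os
import pickle
import time
import uuid


def dump_model(model, output_dir, prefix="model"):
    """Pickle `model` into `output_dir` under a unique file name; return the path."""
    os.makedirs(output_dir, exist_ok=True)
    # Timestamp plus a random suffix, so containers writing in parallel
    # are very unlikely to pick the same file name.
    name = "{}-{}-{}.pkl".format(
        prefix, time.strftime("%Y%m%dT%H%M%S"), uuid.uuid4().hex[:8])
    path = os.path.join(output_dir, name)
    with open(path, "wb") as handle:
        pickle.dump(model, handle)
    return path
```

A call such as `dump_model(trained, "/host/models")` at the end of a training run leaves the file on the host after the container exits. Keras models are normally written with their own `model.save(path)` method rather than pickle, so the pickle call here is only a stand-in.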
Running/Predicting
Once you've used your training procedure to test out a whole bunch of configurations, you will decide on one or a few, and will now want to create a different workflow for putting those machines/models into production.
The Workflow
Here's what we know about the workflow:
- The outcome of the training process is a big pile of model files. The outcome of the expert review process is a slightly smaller pile of model files. These should be put into a data volume container to be loaded up.
- We use a data volume container to get model files into the Docker container, where they are loaded into Python (or whatever runtime is used). (Details depend heavily on implementation.)
- When running in prediction mode, each Docker container loads trained models from the data volume container. (Multiple instances can/will share models. Similar to above, different models go in different containers.)
- When running in prediction mode, each Docker container will need to accept data coming in (X) and send predictions out (Y). Each container will be seeing different data sets (X) and generating different predictions (Y).
The unique X's and Y's coming from and going to the containers may happen via an API and a networking protocol, or they may be coming from and going to files on disk.
Trained Models Input
The input is the ready-to-go model that took hours and hours to train.
(Once the models are trained, following the training step above, there is probably an expert review step in which the final trained models are selected. They should be loaded into a trained model data volume container at that time.)
This should consist of a model file, or a pile of model files. Each docker container is provided exactly the same trained model in the (read-only) data volume container.
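Loading "a pile of model files" from the read-only volume might look roughly like this. The sketch assumes the models were stored as pickle files with a .pkl extension and that the volume is mounted at a path such as /models; both are assumptions, as is the function name.

```python
import glob
import os
import pickle


def load_models(model_dir, pattern="*.pkl"):
    """Load every pickled model in `model_dir`; map file name -> model object.

    Files are only opened for reading, so a read-only mount is enough.
    """
    models = {}
    for path in sorted(glob.glob(os.path.join(model_dir, pattern))):
        with open(path, "rb") as handle:
            models[os.path.basename(path)] = pickle.load(handle)
    return models
```

Unpickling runs arbitrary code from the file, so this should only be pointed at model files we produced ourselves.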
Input Data Input
The input data may arrive via file, or via an API (e.g., JSON over HTTP).
Predictions Output
The predictions may be sent via file, or via an API (e.g., JSON over HTTP).
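To make the API option concrete, here is a small sketch using only the Python standard library: the container accepts a POST with a JSON body of the form {"X": [...]} and answers with {"Y": [...]}. The JSON field names, the port and the placeholder predict function are all invented for illustration; a real container would call its loaded model inside predict.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(rows):
    """Placeholder model (an assumption): the prediction is the sum of each row."""
    return [sum(row) for row in rows]


class PredictionHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length).decode("utf-8"))
            body, status = {"Y": predict(payload["X"])}, 200
        except (ValueError, KeyError, TypeError) as err:
            # Malformed JSON, a missing "X" field, or rows that cannot be summed.
            body, status = {"error": str(err)}, 400
        data = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)


def serve(port=8000):
    """Listen on all interfaces so the port can be published with `docker run -p`."""
    HTTPServer(("0.0.0.0", port), PredictionHandler).serve_forever()
```

Calling `serve()` inside the container, with the chosen port published to the host, would let each container receive its own X and return its own Y without touching the shared model volume.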