GCDEC/Deploying Tensorflow/Notes: Difference between revisions
From charlesreid1
| Line 352: | Line 352: | ||
* Deploy the model to the cloud | * Deploy the model to the cloud | ||
* Call the model to make predictions | * Call the model to make predictions | ||
=References= | =References= | ||
Revision as of 07:16, 18 October 2017
Deploying TensorFlow Models
Module 3: Scaling Machine Learning Models with Cloud ML Engine
Effective machine learning requires:
- Larger data sets
- More feature engineering
- More complicated model architectures
Refactor the current taxi cab fare prediction machine learning model:
- Read out of memory data
- Make it easy to add new input features
- Make the model evaluate as part of training
Scaling TensorFlow Models
Once you have a working TensorFlow model, you can scale it up to more machines and more data
Taking the written model and scaling it out to more machines is essentially just scripting via gcloud commands
Scaling the Training Process
Most machine learning frameworks can handle toy problems and in-memory data sets
But if data size becomes much larger, need to be able to split data into batches and run model on many machines (batching and distribution are important)
Also doing transformations:
- Pre-processing (transformation, cropping, de-colorize, etc.)
- Feature creation (combine features, eliminate features, transform features)
- Train model (also, hyper-parameter tuning)
The need for the cloud again - if data set is large, need to do these transformations in the cloud, across many machines. Same with hyperparameter tuning - want to explore different model architectures, at scale.
Scaling the Prediction Process
When using the trained model, you still need scaling. To make predictions, you turn a model into a microservice (web application). TensorFlow Model - fit your estimator - then, to predict, take your estimator and call predict() on it (via Python).
- Are all clients ("customers") in the code able to run in Python?
- Will they all have access to the directory needed to construct the estimator object?
- Will they know the feature columns you used to train the model?
- The answer to all of these is, NO!
- Deploy model as a microservice to serve as a layer between your client and the details of your machine learning model
Microservice architecture:
- Need to shield clients from the details of the machine learning prediction details (including programming language, features used, etc)
- If clients need a prediction from the model, they bundle everything into a REST API call (with all input variables needed by model)
- Web service will take all input variables, convert them into tensors, send them to TensorFlow model, get results back, and convert them back to an API response (HTTP)
- If you have millions of clients, and lots of requests coming in simultaneously, need to have a web service that can support this throughput
- Weak link is the model evaluation step - this also needs to scale
Problems in training and problems in prediction are different.
Training problems: scaling out data and training process to more machines.
Prediction problems: scaling up prediction engine to handle high throughput and lots of clients
First generation TPU - primarily around prediction (inference) and doing prediction at scale - predicting/evaluating as fast as possible to handle user requests
Cloud ML Engine Workflow
Cloud ML Engine does both the prediction and training scaling. Focused on helping TensorFlow models scale up.
- Start with CSV files
- Explore datasets in Datalab using Pandas, matplotlib, etc.
- Do transformations (preprocessing, feature creation, etc.) in Apache Beam (can handle batch or streaming data - that's the intent - convert everything to a Dataflow pipeline so that you can seamlessly switch from batch to streaming without changing your transformation pipeline into ML Engine)
Dataflow workflow:
- Work on transformations using a local Apache Beam runner, ensure everything is working
- Scale it up to larger data sets by using a Dataflow runner
Cloud ML workflow:
- Work on neural network locally using TensorFlow/notebooks/etc., ensure everything is working
- Scale it up to execute TF code on GCP using Cloud ML Engine
Packaging TensorFlow Models as Python Modules for Training
To scale up a TensorFlow model to run on Cloud ML Engine, need to package the model up as a Python module.
We then submit a TensorFlow code by submitting this Python module. The task.py and model.py parts are the key here.
taxifare/ taxifare/PKG-INFO taxifare/setup.cfg taxifare/setup.py taxifare/trainer/ taxifare/trainer/__init__.py taxifare/trainer/task.py taxifare/trainer/model.py taxifare/trainer.egg-info/ taxifare/trainer.egg-info/dependency_links.txt taxifare/trainer.egg-info/PKG-INFO taxifare/trainer.egg-info/SOURCES.txt taxifare/trainer.egg-info/top_level.txt
The TensorFlow code we wrote goes into task.py and model.py (mostly model.py). When we tar up the directory structure above, we get a Python module.
What is in task.py
task.py:
- contains a main method
- parses command-line parameters
- uses command line parameters to run the model
Example task.py:
Experiment(
model.build_estimator(
output_dir,
embedding_size = embedding_size,
hidden_units = hidden_units
),
train_input_fn = model.generate_csv_input_fn( train_data_paths, ... ),
eval_input_fn = model.generate_csv_input_fn( eval_data_paths, ... ),
eval_metrics = model.get_eval_metrics(),
)
(Note that these refer to functions that must be defined in model.py, which we'll cover in a moment)
Then, use argument parsing to get train_data_paths, for example:
parser.add_argument( '--train_data_paths', required=True ) parser.add_argument( '--num_epochs', ... ) # etc...
This makes code executable as a program, and enables passing information into the program via command line arguments.
What is in model.py
model.py:
- All code from previous chapter (estimator API, etc.) goes into model.py
We need to have a function that returns a function. This will take a filename as an argument (passed in via task.py), and then extract from it the TensorFlow stuff that's needed.
def generate_csv_input_fn( filename, num_epochs = None, ... ):
def _input_fn():
input_file_names = tf.train.match_filenames_once(filename)
filename_queue = tf.train.string_input_producer(
input_file_names,
num_epochs = num_epochs,
shuffle = True
)
reader = tf.TextLineReader()
_, value = reader.read_up_to(filename_queue, num_records = batch_size)
value_column = tf.expand_dims(value, -1)
columns = tf.decode_csv( value_column, record_defaults = DEFAULTS)
features = dict(zip(CSV_COLUMNS, columns))
label = features.pop(LABEL_COLUMN)
return features, label
return _input_fn
Verifying the Package
To verify that the model package runs as expected, you can run the following test:
export $PYTHONPATH=${PYTHONPATH}:/path/to/taxifare
python -m trainer.task \
--train_data_paths="/path/to/dataset/taxi-train*" \
--eval_data_paths=/path/to/dataset/taxi-valid.csv \
--output_dir=/path/to/outputdir \
--num_epochs=10 \
--job-dir=/tmp
This simulates the way that the model is run in the cloud.
- Python path variable tells python where to look for modules
- The -m flag runs a module called trainer.task
- The argparse settings pass the path information from the command line on to the program
Now that you know it works, how do you scale it up? Use gcloud command.
Running Packaged Model in the Cloud
Now you can use the gcloud command to submit the model - either locally, or in the cloud.
To run it locally, use "local train":
gcloud ml-engine local train \
--module-name=trainer.task \
--package-path=/path/to/taxifare/trainer \
-- \
--train_data_paths ... <the rest looks like it did above>
We are running this locally, passing it local directories to the package path, and local directories for the training data, &c.
To run the training task in the cloud, use "jobs submit":
gcloud ml-engine jobs submit \
training $JOBNAME \
region $REGION \
--module-name=trainer.task \
--job-dir=$OUTDIR \
--staging-bucket=gs://$BUCKET \
--scale-tier=BASIC \
--train_data_paths ... <the rest looks like it did above>
Does the following:
- Submits a training job in the cloud
- Specifies the region (same region as where your data lives)
- Specify module name for job/model
- Specify bucket location to put temporary files
- Scale tier specifies the scale of the resources used (BASIC/STANDARD/PREMIUM/GPU/etc...)
The scale tier determines the cost.
The workflow, again, is:
- Try out the job locally, and pass it local module name/location
- Then submit it to the cloud
We covered training, but what about prediction?
Cloud ML Engine for Prediction
For the training task, we had the following task.py:
Experiment(
model.build_estimator(
output_dir,
embedding_size = embedding_size,
hidden_units = hidden_units
),
train_input_fn = model.generate_csv_input_fn( train_data_paths, ... ),
eval_input_fn = model.generate_csv_input_fn( eval_data_paths, ... ),
eval_metrics = model.get_eval_metrics(),
)
For prediction, we want to make slight modifications:
Experiment(
model.build_estimator(
output_dir,
embedding_size = embedding_size,
hidden_units = hidden_units
),
train_input_fn = model.generate_csv_input_fn( train_data_paths, ... ),
eval_input_fn = model.generate_csv_input_fn( eval_data_paths, ... ),
export_strategies = [saved_model_export_utils.make_export_strategy(
model.serving_input_fn,
default_output_alternative_key = None,
exports_to_keep = 1
)],
eval_metrics = model.get_eval_metrics(),
)
This keeps 1 export (the best one). This also requires us to define a model_serving_input_fn(), which is the function that parses the JSON file that the client is sending when it requests the model be evaluated. It creates all of the input features that the model expects.
Example: this creates placeholders for each input column, and each column is a float32:
def serving_input_fn():
feature_placeholders = {
column.name : tf.placeholder(tf.float32, [None]) for column in INPUT_COLUMNS
}
(This is just an example, could have virtually any kind of types for your input data.)
Once you've done that, it's time to deploy the trained model to Google Cloud Platform:
- Can deploy a locally-trained, locally-built model
- can deploy a trained model that is somewhere on a Google Cloud Storage bucket
Here is an example of submitting a model that is located in a Cloud Storage bucket:
MODEL_NAME="taxifare"
MODEL_VERSION="v1"
MODEL_LOCATION="gs://${BUCKET}/taxifare/smallinput/taxi_trained/export/Servo/..."
# Create a model
gcloud ml-engine models create ${MODEL_NAME} \
--regions $REGION
# Create a new version of this model and where it lives
gcloud ml-engine versions create ${MODEL_VERSION} \
--model ${MODEL_NAME} \
--origin ${MODEL_LOCATION}
Creating multiple versions allows you to do A/B testing... send 80% of your traffic to version 1, 20% of your traffic to version 2, and gradually scale up one model version or the other.
Client interaction with model predictions
Now we cover how the client interacts with the model. Recall from above, client is sending JSON requests that go to the model. These calls are made via REST calls.
JSON request containing inputs (in a structure called request_data):
# Get credentials for user to make API calls
credentials = GoogleCredentials.get_application_default()
api = discover.build('ml', 'v1beta1',
credentials = credentials,
discoveryServiceUrl = 'https://storage.googleapis.com/cloud-ml/discover/ml_v1beta1_discovery.json'
)
# Set the JSON file with model inputs
request_data = [ { 'pickup_longitude' : -73.800001,
'pickup_latitude' : 40.700001,
'dropoff_longitude': -73.980001,
'dropoff_latitude' : 40.730001,
'passenger_count' : 2
}]
# Now assemble the URL to which to send the model inputs:
# Set the following information:
# - name of the project
# - name of the model
# - name of the version
parent = 'projects/%s/models/%s/versions/%s' % ( 'cloud-training-demos', 'taxifare', 'v1' )
# Make the API request (call the predict function)
response = api.projects().predict( body = { 'instances' : request_data,
name = parent
}).execute()
Recall that we specified the model and version number when we ran gcloud ml-engine models create and gcloud ml-engine versions create:
MODEL_NAME="taxifare"
MODEL_VERSION="v1"
MODEL_LOCATION="gs://${BUCKET}/taxifare/smallinput/taxi_trained/export/Servo/..."
Scaling with Cloud Machine Learning Laboratory
Use a single-region bucket for machine learning training inputs and outputs - enables consistency (fast reading/writing from multiple threads)
The lab will accomplish the following:
- Package a TensorFlow model
- Run the training locally
- Run the training on the cloud
- Deploy the model to the cloud
- Call the model to make predictions