From charlesreid1

No edit summary
Line 1: Line 1:
=Overview of Methods=
=Passing Command Line Arguments=
 
==Why==


First, why would you want to pass arguments into a TensorFlow model from the command line?
First, why would you want to pass arguments into a TensorFlow model from the command line?
Line 5: Line 7:
* Training models in Cloud ML Engine requires passing arguments via command line
* Training models in Cloud ML Engine requires passing arguments via command line
* Parameterizing models using command line arguments make them easier to run from other scripts and do hyperparameter optimization
* Parameterizing models using command line arguments make them easier to run from other scripts and do hyperparameter optimization
==How==


Couple of methods to pass arguments into a TensorFlow model:
Couple of methods to pass arguments into a TensorFlow model:
* gflags - this is the Google command line argument parsing library
* argparse - this is the most straightforward way; can parse arguments using argparse.ArgumentParser() and then pass all of the arguments into a dictionary. That can then be passed into the model run/train function call using the double splat operator
* tf.app.flags - built-in and convenient, but mainly used for keeping demos pithy
 
* gflags - this is the Google command line argument parsing library; more full-featured than argparse
* tf.app.flags - built-in and convenient, but mainly used for keeping demos pithy; mimics gflats behavior
 
* tf.app.run - combines argument parser with app runner, passes arguments on to main()
* tf.app.run and tf.app.flags in combination - parses arguments with tf.app.flags, and passes un-parsed arguments to app.run (and thus on to main())
* sys.argv - get raw command line arguments, split at spaces, manually parse
* sys.argv - get raw command line arguments, split at spaces, manually parse



Revision as of 05:18, 26 October 2017

Passing Command Line Arguments

Why

First, why would you want to pass arguments into a TensorFlow model from the command line?

  • While it's possible to use a notebook to train models, eventually the models will probably go into production
  • Training models in Cloud ML Engine requires passing arguments via command line
  • Parameterizing models using command line arguments make them easier to run from other scripts and do hyperparameter optimization

How

Couple of methods to pass arguments into a TensorFlow model:

  • argparse - this is the most straightforward way; can parse arguments using argparse.ArgumentParser() and then pass all of the arguments into a dictionary. That can then be passed into the model run/train function call using the double splat operator
  • gflags - this is the Google command line argument parsing library; more full-featured than argparse
  • tf.app.flags - built-in and convenient, but mainly used for keeping demos pithy; mimics gflats behavior
  • tf.app.run - combines argument parser with app runner, passes arguments on to main()
  • tf.app.run and tf.app.flags in combination - parses arguments with tf.app.flags, and passes un-parsed arguments to app.run (and thus on to main())
  • sys.argv - get raw command line arguments, split at spaces, manually parse

Cloud ML Engine

Using command line arguments is particularly important when bundling a TensorFlow model for Cloud ML Engine.

For an example of how to do this, see the GCP training-data-analyst repository, the taxicab fare prediction model:

Here's the relevant code in the task.py file:

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  # Input Arguments
  parser.add_argument(
      '--train_data_paths',
      help='GCS or local path to training data',
      required=True
  )
  parser.add_argument(
      '--num_epochs',
      help="""\
      Maximum number of training data epochs on which to train.
      If both --max-steps and --num-epochs are specified,
      the training job will run for --max-steps or --num-epochs,
      whichever occurs first. If unspecified will run for --max-steps.\
      """,
      type=int,
)

etc... Several other arguments are read from the command line. Next, these are added to a dict:

    args = parser.parse_args()
    arguments = args.__dict__

Finally, these are passed to the model's training run using the double-splat:

    # Run the training job
    learn_runner.run(generate_experiment_fn(**arguments), output_dir)

These arguments are passed to the generate_experiment_fn, which in turn passes them to the model object that is created:

def generate_experiment_fn(train_data_paths,
                           eval_data_paths,
                           format,
                           num_epochs=None,
                           train_batch_size=512,
                           eval_batch_size=512,
                           hidden_units=None,
                           **experiment_args):


  def _experiment_fn(output_dir):
    input_fn = model.generate_csv_input_fn
    train_input = input_fn(
        train_data_paths, num_epochs=num_epochs, batch_size=train_batch_size)
    eval_input = input_fn(
        eval_data_paths, batch_size=eval_batch_size, mode=tf.contrib.learn.ModeKeys.EVAL)
    return Experiment(
        model.build_estimator(
            output_dir,
            hidden_units=hidden_units
        ),
        train_input_fn=train_input,
        eval_input_fn=eval_input,
        export_strategies=[saved_model_export_utils.make_export_strategy(
            model.serving_input_fn,
            default_output_alternative_key=None,
            exports_to_keep=1
        )],
        eval_metrics=model.get_eval_metrics(),
        #min_eval_frequency = 1000,  # change this to speed up training on large datasets
        **experiment_args
    )
  return _experiment_fn

Note that this example is pretty complicated - as are many of the TensorFlow examples - because there are so many nested function calls and so many parameters being passed around, from one function call to another.


Flags