Part 4: Defining a Model

Relevant reading:
  - TensorFlow Programmer's Guide - Estimators
  - TensorFlow Extend - Estimators

Now for the part you’ve all been waiting for - defining a model.

Note that this is a very basic baseline model working with an extremely limited amount of training data, so don't expect to see state-of-the-art results.

To build the model we are going to use TensorFlow's Estimator API, new as of version 1.1. Estimators accept as arguments the things we do want to handle ourselves:

  - model_fn: the computational logic of the model
  - params: the model's hyperparameters
  - config: runtime settings such as how often to save summaries and checkpoints
  - model_dir: the directory where outputs are saved

and abstract away the things that should be automatic:

  - the training loop and session management
  - saving and restoring checkpoints
  - writing summaries for TensorBoard

To create an Estimator we follow these steps:

  1. Define the Estimator object
  2. Construct the model’s logic in a function model_fn
  3. Set parameters such as model_dir, config, and params

Tip: I like to keep this model code in a separate Python file, let's call it model.py, that is imported from the main training script. Since research usually involves trying several architecture variants, separate model files help keep all the subtle differences in order. Later, we will see how to copy the exact code used for each training run so that results are completely reproducible.

To follow along with the running example, create a new file called model.py.

Defining the Estimator object

Defining the Estimator object is simple once the model_fn, model_dir, config, and params are defined - we will do this below.

Tip: When an Estimator is initialized, it looks in model_dir and uses the latest saved checkpoint if one exists. If there are no saved checkpoints in model_dir, a new model will be instantiated. If model_dir is None and also not set in config, a temporary directory will be used. Re-loading a trained model is as simple as passing model_dir as the path to your saved model. The most confusing part is how to retrieve the matching model_fn later when loading the model back in; we will see how to do this below.

################################
###   Inside code/train.py   ###
################################

estimator = tf.estimator.Estimator(
    model_fn=model_fn, model_dir=model_dir, config=config, params=params)
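
As the tip above notes, re-loading a trained model only requires pointing model_dir at an existing run. A minimal sketch (the path below is hypothetical, and model_fn must be the same function the checkpoint was trained with):

# Hypothetical path to an earlier run's output directory.
trained_estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    model_dir='../results/2017-08-01_12-00-00',
    params=params)
# The next call to train/evaluate/predict restores the latest checkpoint.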

Constructing the Model Function

Now, let’s define the core logic of the model, model_fn.

This function is called whenever one of the Estimator's entry points is invoked:

  - estimator.train() calls it with mode = tf.estimator.ModeKeys.TRAIN
  - estimator.evaluate() calls it with mode = tf.estimator.ModeKeys.EVAL
  - estimator.predict() calls it with mode = tf.estimator.ModeKeys.PREDICT

For all projects we do this in a function with the following signature:

def model_fn(features, labels, mode, params):
    '''
    Defines the model function passed into tf.estimator.
    This function defines the computational logic for the model.

    Implementation:
        1. Define the model's computations with TensorFlow operations
        2. Generate predictions and return a prediction EstimatorSpec
        3. Define the loss function for training and evaluation
        4. Define the training operation and optimizer
        5. Return loss, train_op, eval_metric_ops in an EstimatorSpec

    Inputs:
        features: A dict containing the features passed to the model via input_fn
        labels: A Tensor containing the labels passed to the model via input_fn
        mode: One of the following tf.estimator.ModeKeys string values indicating
              the context in which the model_fn was invoked:
                  - tf.estimator.ModeKeys.TRAIN ---> estimator.train()
                  - tf.estimator.ModeKeys.EVAL ----> estimator.evaluate()
                  - tf.estimator.ModeKeys.PREDICT -> estimator.predict()

    Outputs:
        A tf.estimator.EstimatorSpec that defines the model in different modes.
    '''
    # 1. Define model structure
    
    # ...
    # convolutions, denses, and batch norms, oh my!
    # ...

    # 2. Generate predictions
    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {'output_str': output_var} # alter this dictionary for your model
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
    
    # 3. Define the loss function
    loss = ...

    # 3.1 Additional metrics for monitoring
    eval_metric_ops = {"rmse": tf.metrics.root_mean_squared_error(
          tf.cast(labels, tf.float64), tf.cast(output_var, tf.float64))}

    # 4. Define optimizer
    optimizer = tf.train.AdamOptimizer(learning_rate=params['learning_rate'])
    train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())

    # 5. Return training/evaluation EstimatorSpec
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op,
                                      eval_metric_ops=eval_metric_ops)
    

Warning: The main gotcha here is making sure to put your logic in the correct order. In predict mode you don't have access to the labels, so you should make predictions first and return. Only after the predict-mode return can you define the loss function and optimizer.

Here is a simple convolutional neural network for our running example:

################################
###   Inside code/model.py   ###
################################

def model_fn(features, labels, mode, params):
    # 1. Define model structure
    for l in range(params.num_layers):
        lparams = params.layers[l]
        if l == 0:
            h = features['image']
        elif lparams['type'] == 'fc' and len(h.get_shape().as_list()) != 2:
            # Flatten conv/pool outputs before the first fully-connected layer
            h = tf.contrib.layers.flatten(h)
        if lparams['type'] == 'conv':
            h = tf.contrib.layers.conv2d(
                h, lparams['num_outputs'], lparams['kernel_size'], lparams['stride'],
                activation_fn=lparams['activation'],
                weights_regularizer=lparams['regularizer'])
        elif lparams['type'] == 'pool':
            h = tf.contrib.layers.max_pool2d(h, lparams['kernel_size'], lparams['stride'])
        elif lparams['type'] == 'fc':
            h = tf.contrib.layers.fully_connected(
                h, lparams['num_outputs'], activation_fn=lparams['activation'])

    # 2. Generate predictions
    if mode == tf.estimator.ModeKeys.PREDICT:
        predictions = {'output': h}
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)
    
    # 3. Define the loss function, including the weight regularization terms
    #    that tf.contrib.layers registered via weights_regularizer
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
        labels=labels, logits=h))
    loss += tf.losses.get_regularization_loss()
    
    # 3.1 Additional metrics for monitoring
    eval_metric_ops = {"accuracy": tf.metrics.accuracy(
          labels=labels, predictions=tf.argmax(h, axis=-1))}
    
    # 4. Define optimizer
    optimizer = tf.train.AdamOptimizer(learning_rate=params.learning_rate)
    train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
    
    # 5. Return training/evaluation EstimatorSpec
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op, eval_metric_ops=eval_metric_ops)
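
Note that in predict mode this returns the raw logits under 'output'. If you also want class IDs or probabilities, one option (a hypothetical extension, not part of the running example) is to enrich the predictions dictionary inside the PREDICT branch:

        predictions = {
            'logits': h,
            'classes': tf.argmax(h, axis=-1),   # predicted class IDs
            'probabilities': tf.nn.softmax(h),  # per-class probabilities
        }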

Setting Additional Parameters

Before we’re done, we need to set a few last variables.

params

You may have noticed in the running example's model_fn that several values are pulled from the params object. This is a perfect place to store things like the learning rate for your optimizer, the number of layers in part of your network, the number of units in a layer, etc.

I also like defining params in model.py, since the parameters are logically connected to the model and keeping them there keeps the main training script clean.

For the running example let’s closely follow AlexNet:

################################
###   Inside code/model.py   ###
################################

params = tf.contrib.training.HParams(
    layers = [
        {'type': 'conv', 'num_outputs' : 96, 'kernel_size' : 11, 'stride' : 4, 'activation' : tf.nn.relu, 'regularizer' : tf.nn.l2_loss}, 
        {'type': 'pool', 'kernel_size' : 3, 'stride' : 2},
        {'type': 'conv', 'num_outputs' : 256, 'kernel_size' : 5, 'stride' : 1, 'activation' : tf.nn.relu, 'regularizer' : tf.nn.l2_loss},
        {'type': 'pool', 'kernel_size' : 3, 'stride' : 2},
        {'type': 'conv', 'num_outputs' : 384, 'kernel_size' : 3, 'stride' : 1, 'activation' : tf.nn.relu, 'regularizer' : tf.nn.l2_loss},
        {'type': 'conv', 'num_outputs' : 384, 'kernel_size' : 3, 'stride' : 1, 'activation' : tf.nn.relu, 'regularizer' : tf.nn.l2_loss},
        {'type': 'conv', 'num_outputs' : 256, 'kernel_size' : 3, 'stride' : 1, 'activation' : tf.nn.relu, 'regularizer' : tf.nn.l2_loss},
        {'type': 'pool', 'kernel_size' : 3, 'stride' : 2},
        {'type': 'fc', 'num_outputs' : 4096, 'activation' : tf.nn.relu},
        {'type': 'fc', 'num_outputs' : 2048, 'activation' : tf.nn.relu},
        {'type': 'fc', 'num_outputs' : 50, 'activation' : None}
    ],
    learning_rate = 0.001,
    train_epochs = 30,
    batch_size = 32,
    image_height = 300,
    image_width = 200,
    image_depth = 3
)
params.add_hparam('num_layers', len(params.layers))
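
One nicety of HParams over a plain dict is that scalar values can be overridden from a string, which is handy for hyperparameter sweeps. A small sketch (the override values are just examples):

# Override scalar hyperparameters without touching model.py.
params.parse('learning_rate=0.0005,batch_size=64')
print(params.learning_rate)  # 0.0005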

config

Not to be confused with params, config is a tf.estimator.RunConfig object that contains parameters that affect the Estimator while it is running such as tf_random_seed, save_summary_steps, keep_checkpoint_max, etc. Passing config to the Estimator is optional but particularly helpful for monitoring training progress.

For the running example, let's use the following to get more frequent summary statistics written to TensorBoard:

################################
###   Inside code/model.py   ###
################################
config = tf.estimator.RunConfig(
    tf_random_seed=0,
    save_checkpoints_steps=250,
    save_checkpoints_secs=None,  # checkpointing by steps, so disable the secs default
    save_summary_steps=10,
)

model_dir

This one is easy - pick some output directory where you want data related to training this Estimator to be saved.

For me, these are all in a results directory <project>/results/. Make sure to add results/ to your .gitignore.

I simply name the output directory by a timestamp of when the script is run. You might modify this when testing different versions of a model (<project>/results/v1/…, <project>/results/v2/…, etc).

################################
###   Inside code/train.py   ###
################################

import time, datetime

ts = time.time()
timestamp = datetime.datetime.fromtimestamp(ts).strftime('%Y-%m-%d_%H-%M-%S')

model_dir_base = '../results/'
model_dir = model_dir_base + timestamp

Putting it All Together

That’s it! Importing model.py in train.py picks up model_fn, config, and params; together with the model_dir we just defined, that is everything needed to construct the estimator.
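
Concretely, the top of train.py might look like the following sketch (assuming train.py and model.py sit side by side in code/):

# Pull in the model definition from model.py
from model import model_fn, config, params

# ... define model_dir from the timestamp as shown above ...

estimator = tf.estimator.Estimator(
    model_fn=model_fn, model_dir=model_dir, config=config, params=params)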

Improving Reproducibility

For better reproducibility I use the following lines to copy over the main training script and the model file that defines the Estimator. This makes everything much more straightforward when comparing several slightly varying architectures. Technically you can look up the exact architecture of a model you ran in the Graph tab of TensorBoard, but I'll take the bet you'd rather take a quick peek at the Python file you wrote than dig four levels into the TensorBoard graph visualization.

################################
###   Inside code/train.py   ###
################################

import os, shutil

# Find the path of the current running file (train script)
curr_path = os.path.realpath(__file__)
model_path = curr_path.replace('train.py', 'model.py')

# Now copy the training script and the model file to 
#   model_dir -- the same directory specified when creating the Estimator
# Note: copy over more files if there are other important dependencies.
if not os.path.exists(model_dir_base):
    os.makedirs(model_dir_base)

os.mkdir(model_dir)
shutil.copy(curr_path, model_dir)
shutil.copy(model_path, model_dir)

Tip: If you are using Jupyter Notebooks (which you should be!), calling tf.reset_default_graph() before initializing your model is good practice. Doing this avoids creating extra variables and naming conflicts. One of your cells may look like:

tf.reset_default_graph()
estimator = tf.estimator.Estimator(...)

And that’s it! The data is ready, the model is fully defined, and we are ready to start training.

Running Example: here are the complete (up to this point) train.py file and model.py file.


Continue Reading

In Part 5 we will train and evaluate the Estimator.