A while ago I posted an updated version of TensorFlow’s example on how to read TFRecords. Today I want to share another version of that file, created to show how to optimize the data pipeline further.
Before delving into it, let me quickly recap TFRecords and Datasets.
TFRecords have long been TensorFlow’s recommended input format (though I find that people usually prefer folders of images). A TFRecords file is a sequence of serialized Google Protocol Buffers stored on disk in a single file. This is advantageous because the data sits in one big chunk on the hard drive, which means faster reads on HDDs and (I believe) faster average reads than classical image formats like .jpg when reading actual image data.
The Dataset API, on the other hand, is the new preferred way of reading data. It comes from the observation that feeding data into TF is the steepest part of the learning curve for beginners. It also unifies the various existing methods (such as feed_dict or queues) in one approach. Finally, it lets us think about input on a high(er) level, which is always convenient.
Now, let’s get to the meat. The idea is simple. Before, the pipeline was:
- read a single record / example / image
- decode the record / example / image
- augment the image (not necessary in a MWE, but really important for images)
- normalize the image (again some NN wizardry that people assume you “know”)
- shuffle the examples
- batch them up for training
- use the batches for the interesting stuff
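To make the shuffle step a little more concrete: a shuffle queue holds a fixed number of examples and hands out a random one each time a new example arrives. Below is a minimal pure-Python sketch of that idea. It is not TensorFlow's implementation; shuffle_buffer and its parameters are hypothetical names chosen for illustration.

```python
import random


def shuffle_buffer(examples, buffer_size, seed=None):
    """Yield examples in pseudo-random order using a fixed-size buffer.

    Hypothetical helper for illustration only, not a TensorFlow API.
    """
    rng = random.Random(seed)
    buffer = []
    for example in examples:
        if len(buffer) < buffer_size:
            # Still filling the buffer, nothing is emitted yet.
            buffer.append(example)
            continue
        # Buffer is full: emit a random element and put the new one in its slot.
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = example
    # Input is exhausted: drain the remaining elements in random order.
    rng.shuffle(buffer)
    for example in buffer:
        yield example


shuffled = list(shuffle_buffer(range(10), buffer_size=4, seed=0))
print(shuffled)
```

A small buffer only shuffles locally: the first example that comes out is always one of the first buffer_size examples that went in, so the buffer should be reasonably large compared to the dataset.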
Shuffle creates a queue of single examples. This works, but is slower than it could be. If we can write the augmentation and normalization to process batches instead of single images, we can do this:
- read a single example
- shuffle the examples
- batch them up for training
- decode the batch
- augment the batch
- normalize the batch
- use batches for the interesting stuff
As you can see, the trick is to batch them up as soon as possible and then decode / augment in batches. I didn’t dig deeply into this, but for some reason it makes the training A LOT faster.
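One plausible explanation (a guess, not something measured here) is that every stage that runs per example pays a fixed cost per call, and batching first means far fewer calls that each do more work. The toy sketch below counts how often a normalization stage is invoked in each ordering. It is plain Python without TensorFlow, all names are hypothetical, and shuffling is left out to keep the comparison deterministic.

```python
import random

# Call counters for the two variants of the normalization stage.
calls = {"per_example": 0, "per_batch": 0}


def normalize_one(pixels):
    """Map one example's pixels from [0, 255] to [-0.5, 0.5]."""
    calls["per_example"] += 1
    return [p / 255.0 - 0.5 for p in pixels]


def normalize_batch(batch):
    """Same arithmetic, but a single call handles a whole batch."""
    calls["per_batch"] += 1
    return [[p / 255.0 - 0.5 for p in pixels] for pixels in batch]


def batched(items, batch_size):
    """Group a list into consecutive batches of batch_size items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Toy data standing in for decoded records: 1000 "images" of 4 pixels each.
rng = random.Random(0)
examples = [[rng.randrange(256) for _ in range(4)] for _ in range(1000)]
batch_size = 100

# Old order: normalize every example, then batch.
old = batched([normalize_one(e) for e in examples], batch_size)
# New order: batch first, then normalize every batch.
new = [normalize_batch(b) for b in batched(examples, batch_size)]

print(old == new)  # True: both orderings produce the same batches
print(calls)       # {'per_example': 1000, 'per_batch': 10}
```

Both orderings give identical results, but the batched one invokes the stage 10 times instead of 1000. Whether this fully accounts for the speedup in TensorFlow is an assumption.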
This is a run after the change:
u\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 650 Ti BOOST, pci bus id: 0000:01:00.0, compute capability: 3.0)
Step 0: loss = 2.32 (0.253 sec)
Step 100: loss = 2.13 (0.003 sec)
Step 200: loss = 1.90 (0.004 sec)
Step 300: loss = 1.59 (0.006 sec)
Step 400: loss = 1.16 (0.003 sec)
Step 500: loss = 0.95 (0.003 sec)
Step 600: loss = 0.84 (0.006 sec)
Step 700: loss = 0.66 (0.006 sec)
Step 800: loss = 0.79 (0.005 sec)
Step 900: loss = 0.62 (0.004 sec)
Step 1000: loss = 0.66 (0.003 sec)
Done training for 2 epochs, 1100 steps.
And for comparison, here is a run where the batch is created at the end of the input pipeline and decoding is done first. This is what is currently implemented in the example:
2018-02-19 22:52:21.999019: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 650 Ti BOOST, pci bus id: 0000:01:00.0, compute capability: 3.0)
Step 0: loss = 2.32 (0.572 sec)
Step 100: loss = 2.13 (0.029 sec)
Step 200: loss = 1.93 (0.029 sec)
Step 300: loss = 1.65 (0.030 sec)
Step 400: loss = 1.34 (0.030 sec)
Step 500: loss = 0.93 (0.030 sec)
Step 600: loss = 0.73 (0.030 sec)
Step 700: loss = 0.68 (0.030 sec)
Step 800: loss = 0.67 (0.030 sec)
Step 900: loss = 0.56 (0.030 sec)
Step 1000: loss = 0.44 (0.029 sec)
Done training for 2 epochs, 1100 steps.
If we estimate the batched version at 0.0043 s/step and the single-example version at 0.03 s/step (each printed duration is the time of one training step), we get a ~7x speedup. Pretty nice for just swapping around 2 lines of code.
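The estimate can be reproduced from the logged numbers. The sketch below averages the durations printed in the two logs above for steps 100 to 1000. Step 0 is skipped on the assumption that it includes one-off startup work.

```python
# Step durations in seconds, copied from the two logs above (steps 100..1000).
# Step 0 is left out; presumably it includes one-off startup cost.
batched_times = [0.003, 0.004, 0.006, 0.003, 0.003,
                 0.006, 0.006, 0.005, 0.004, 0.003]
single_times = [0.029, 0.029, 0.030, 0.030, 0.030,
                0.030, 0.030, 0.030, 0.030, 0.029]

mean_batched = sum(batched_times) / len(batched_times)
mean_single = sum(single_times) / len(single_times)
speedup = mean_single / mean_batched

print("batched: %.4f s/step" % mean_batched)
print("single:  %.4f s/step" % mean_single)
print("speedup: %.1fx" % speedup)
```

With only ten samples per run and timings at millisecond resolution, this is a rough estimate rather than a benchmark.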
Here is the code for the batched version:
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Train and Eval the MNIST network.

This version is like fully_connected_feed.py but uses data converted
to a TFRecords file containing tf.train.Example protocol buffers.
See:
https://www.tensorflow.org/programmers_guide/reading_data#reading_from_files
for context.

YOU MUST run convert_to_records before running this (but you only need to
run it once).
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os.path
import sys
import time

import tensorflow as tf

from tensorflow.examples.tutorials.mnist import mnist

# Basic model parameters as external flags.
FLAGS = None

# Constants used for dealing with the files, matches convert_to_records.
TRAIN_FILE = 'train.tfrecords'
VALIDATION_FILE = 'validation.tfrecords'


def decode(serialized_example):
  # This runs after batch(), so serialized_example is a whole batch of
  # serialized examples. That is why tf.parse_example is used here.
  features = tf.parse_example(
      serialized_example,
      # Defaults are not specified since both keys are required.
      features={
          'image_raw': tf.FixedLenFeature([], tf.string),
          'label': tf.FixedLenFeature([], tf.int64),
      })

  # Convert from a vector of strings (each of length mnist.IMAGE_PIXELS)
  # to a uint8 tensor with shape [batch_size, mnist.IMAGE_PIXELS].
  image = tf.decode_raw(features['image_raw'], tf.uint8)
  #image.set_shape((mnist.IMAGE_PIXELS))

  # Convert the labels from int64 to int32.
  label = tf.cast(features['label'], tf.int32)

  return image, label


def augment(image, label):
  # OPTIONAL: Could reshape into 28×28 images and apply distortions
  # here. Since we are not applying any distortions in this
  # example, and the next step expects the images to be flattened
  # into vectors, we don't bother.
  return image, label


def normalize(image, label):
  # Convert from [0, 255] -> [-0.5, 0.5] floats.
  image = tf.cast(image, tf.float32) * (1. / 255) - 0.5

  return image, label


def inputs(train, batch_size, num_epochs):
  """Reads input data num_epochs times.

  Args:
    train: Selects between the training (True) and validation (False) data.
    batch_size: Number of examples per returned batch.
    num_epochs: Number of times to read the input data, or 0/None to
      train forever.

  Returns:
    A tuple (images, labels), where:
    * images is a float tensor with shape [batch_size, mnist.IMAGE_PIXELS]
      in the range [-0.5, 0.5].
    * labels is an int32 tensor with shape [batch_size] with the true label,
      a number in the range [0, mnist.NUM_CLASSES).

    This function creates a one_shot_iterator, meaning that it will only iterate
    over the dataset once. On the other hand there is no special initialization
    required.
  """
  if not num_epochs: num_epochs = None
  filename = os.path.join(FLAGS.train_dir,
                          TRAIN_FILE if train else VALIDATION_FILE)

  with tf.name_scope('input'):
    # TFRecordDataset opens a TFRecords file and reads its records one
    # after another. The argument could also be [list, of, filenames].
    dataset = tf.data.TFRecordDataset(filename)
    dataset = dataset.repeat(num_epochs)

    # The argument is the size of the shuffle queue (buffer).
    dataset = dataset.shuffle(1000 + 3 * batch_size)
    dataset = dataset.batch(batch_size)

    # map takes a Python function and applies it to every element of the
    # dataset. Since we batched first, every element is a whole batch.
    dataset = dataset.map(decode)
    dataset = dataset.map(augment)
    dataset = dataset.map(normalize)

    iterator = dataset.make_one_shot_iterator()
  return iterator.get_next()


def run_training():
  """Train MNIST for a number of steps."""

  # Tell TensorFlow that the model will be built into the default Graph.
  with tf.Graph().as_default():
    # Input images and labels.
    image_batch, label_batch = inputs(train=True, batch_size=FLAGS.batch_size,
                                      num_epochs=FLAGS.num_epochs)

    # Build a Graph that computes predictions from the inference model.
    logits = mnist.inference(image_batch,
                             FLAGS.hidden1,
                             FLAGS.hidden2)

    # Add to the Graph the loss calculation.
    loss = mnist.loss(logits, label_batch)

    # Add to the Graph operations that train the model.
    train_op = mnist.training(loss, FLAGS.learning_rate)

    # The op for initializing the variables.
    init_op = tf.group(tf.global_variables_initializer(),
                       tf.local_variables_initializer())

    # Create a session for running operations in the Graph.
    with tf.Session() as sess:
      # Initialize the variables (the trained variables and the
      # epoch counter).
      sess.run(init_op)
      try:
        step = 0
        while True:  # train until OutOfRangeError
          start_time = time.time()

          # Run one step of the model. The return values are
          # the activations from the `train_op` (which is
          # discarded) and the `loss` op. To inspect the values
          # of your ops or variables, you may include them in
          # the list passed to sess.run() and the value tensors
          # will be returned in the tuple from the call.
          _, loss_value = sess.run([train_op, loss])

          duration = time.time() - start_time

          # Print an overview fairly often.
          if step % 100 == 0:
            print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value,
                                                       duration))
          step += 1
      except tf.errors.OutOfRangeError:
        print('Done training for %d epochs, %d steps.' % (FLAGS.num_epochs,
                                                          step))


def main(_):
  run_training()


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--learning_rate',
      type=float,
      default=0.01,
      help='Initial learning rate.'
  )
  parser.add_argument(
      '--num_epochs',
      type=int,
      default=2,
      help='Number of epochs to run trainer.'
  )
  parser.add_argument(
      '--hidden1',
      type=int,
      default=128,
      help='Number of units in hidden layer 1.'
  )
  parser.add_argument(
      '--hidden2',
      type=int,
      default=32,
      help='Number of units in hidden layer 2.'
  )
  parser.add_argument(
      '--batch_size',
      type=int,
      default=100,
      help='Batch size.'
  )
  parser.add_argument(
      '--train_dir',
      type=str,
      default='/tmp/data',
      help='Directory with the training data.'
  )
  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
If you test it on your machine, let me know your batch times and what machine you are using =) I’d love to hear from you.
Happy coding!