7x speedup with an optimized TensorFlow Input pipeline: TFRecords + Dataset API

A while ago I posted an updated version of tensorflow’s how to read TFRecords. Today I want to share another version of this file that was created to show how to further optimize the data pipeline.

Before delving into it let me quickly reflect on TFRecords and Datasets.

TFRecords have long been tensorflow’s recommended input method (though I find that folders with images are usually preferred by people). They are made of Google Protocol Buffers stored on disk in a single file. This is advantageous, because this will store the file in one big chunk on the hard drive, meaning faster reading time on HDDs and (I believe) faster average reading time compared to classical image formats like .jpg when reading actual image data.

The Dataset API on the other hand is the new preferred format of reading data. It comes from the observation that feeding data into TF is the steepest part of the learning curve for beginners. It also unifies all the various existing methods in one approach (aka feed_dict or queues). Finally, it allows us to worry about input on a high(er) level which is always convenient.

Now, let’s get to the meat. The idea is simple: Before the pipeline was

  1. read a single record / example / image
  2. decode the record / example / image
  3. augment the image (not necessary in a MWE, but really important for images)
  4. normalize the image (again some NN wizardry that people assume you “know”)
  5. shuffle the examples
  6. batch them up for training
  7. use the batches for the interesting stuff

Shuffle creates a queue of single examples. This works, but is slower then it could be. If we can write the augmentation and normalization to process batches instead of images we can do this:

  1. read a single example
  2. shuffle the examples
  3. batch them up for training
  4. decode the batch
  5. augment the batch
  6. normalize the batch
  7. use batches for the interesting stuff

As you can see, the trick is to batch them up as soon as possible and then decode / augment in batches. I didn’t dig deeply into this, but for some reason it makes the training A LOT faster.

This is a run after the change:

u\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 650 Ti BOOST, pci bus id: 0000:01:00.0, compute capability: 3.0)
Step 0: loss = 2.32 (0.253 sec)
Step 100: loss = 2.13 (0.003 sec)
Step 200: loss = 1.90 (0.004 sec)
Step 300: loss = 1.59 (0.006 sec)
Step 400: loss = 1.16 (0.003 sec)
Step 500: loss = 0.95 (0.003 sec)
Step 600: loss = 0.84 (0.006 sec)
Step 700: loss = 0.66 (0.006 sec)
Step 800: loss = 0.79 (0.005 sec)
Step 900: loss = 0.62 (0.004 sec)
Step 1000: loss = 0.66 (0.003 sec)
Done training for 2 epochs, 1100 steps.

and for comparison a run where the batch is created at the end of the input pipeline and decoding is done first. This is what is currently implemented in the example:

2018-02-19 22:52:21.999019: I C:\tf_jenkins\home\workspace\rel-win\M\windows-gpu\PY\36\tensorflow\core\common_runtime\gpu\gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 650 Ti BOOST, pci bus id: 0000:01:00.0, compute capability: 3.0)
Step 0: loss = 2.32 (0.572 sec)
Step 100: loss = 2.13 (0.029 sec)
Step 200: loss = 1.93 (0.029 sec)
Step 300: loss = 1.65 (0.030 sec)
Step 400: loss = 1.34 (0.030 sec)
Step 500: loss = 0.93 (0.030 sec)
Step 600: loss = 0.73 (0.030 sec)
Step 700: loss = 0.68 (0.030 sec)
Step 800: loss = 0.67 (0.030 sec)
Step 900: loss = 0.56 (0.030 sec)
Step 1000: loss = 0.44 (0.029 sec)
Done training for 2 epochs, 1100 steps.

If we estimate the batched version to 0.0043 s/100ep and the single example speed to 0.03 s/100ep then we get a ~7x speedup. Pretty nice for just swapping around 2 lines of code.

Here is the code for the batched version

If you test it with your machine, let me know your batch times and what machine you are using =) I’d love to hear from you.

Happy coding!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s