Layered Layers: Residual Blocks in the Sequential Keras API

I’ve been looking at the AlphaGo Zero network architecture [1] and was searching for existing implementations. I’ve found quite a few (here, here and here) with varying degrees of completeness. The cleanest is probably this one, but it depends on Jupyter.

What surprised me was that I couldn’t find one that used Keras’ Sequential API. While residual blocks aren’t exactly sequential, from a high-level view the architecture itself is: it simply stacks (a lot of) residual blocks. So it should be possible to build the whole network with Sequential, adding one residual block at a time, right?

The answer is, of course, yes: there isn’t much that you can’t do in Python. Keras itself already uses this strategy. Sequential inherits from Layer, and Container (a class sitting between Sequential and Layer in the inheritance hierarchy) describes itself that way: “A Container is a directed acyclic graph of layers. It is the topological form of a ‘model’. A Model is simply a Container with added training routines.” (source)
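The Container class no longer exists under that name in current Keras, but the point still holds and is easy to check: a Sequential model can be nested inside another Sequential model like any other layer. A minimal sketch, assuming TensorFlow 2.x with its bundled Keras; the layer sizes are arbitrary:

```python
import numpy as np
from tensorflow import keras

# Sequential is a Model, and a Model is a Layer...
assert issubclass(keras.Sequential, keras.layers.Layer)

# ...so a whole Sequential model can be added to another one as a single layer.
inner = keras.Sequential([
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(16, activation="relu"),
])

outer = keras.Sequential([
    keras.Input(shape=(8,)),
    inner,  # the nested model behaves like any other layer
    keras.layers.Dense(1),
])

outer.summary()  # the nested model shows up as one row, weights included
print(outer(np.zeros((2, 8), dtype="float32")).shape)
```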

This works by defining the residual block as a new Keras layer. Depending on how tightly integrated you want it, this can be quite short:
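A minimal sketch of such a layer, assuming TensorFlow 2.x with its bundled Keras (not necessarily the exact block used in this post). The name Residual and its (filters, kernel_size) arguments mirror the usage described here; the conv, ReLU, conv, skip-add, ReLU layout is an assumption. The sublayers are created once in __init__ and wired together functionally in call():

```python
import numpy as np
from tensorflow import keras


class Residual(keras.layers.Layer):
    """conv -> ReLU -> conv, add the input back in (skip connection), ReLU.

    Assumes the input already has `filters` channels; otherwise the
    element-wise addition of the shortcut would fail.
    """

    def __init__(self, filters, kernel_size, **kwargs):
        super().__init__(**kwargs)
        self.filters = filters
        self.kernel_size = kernel_size
        # Sublayers stored as attributes are tracked automatically, so their
        # weights are trained and counted by model.summary().
        self.conv1 = keras.layers.Conv2D(filters, kernel_size, padding="same")
        self.conv2 = keras.layers.Conv2D(filters, kernel_size, padding="same")
        self.add_skip = keras.layers.Add()

    def call(self, inputs):
        # Inside the block, layers are chained in the functional style.
        x = self.conv1(inputs)
        x = keras.activations.relu(x)
        x = self.conv2(x)
        x = self.add_skip([x, inputs])  # identity shortcut
        return keras.activations.relu(x)

    def get_config(self):
        # Lets a saved model re-create this layer from its constructor args.
        config = super().get_config()
        config.update({"filters": self.filters, "kernel_size": self.kernel_size})
        return config


# Usage: the block maps a (batch, H, W, 32) tensor to a tensor of the same shape.
block = Residual(32, (3, 3))
print(block(np.zeros((1, 8, 8, 32), dtype="float32")).shape)
```

Creating the sublayers in __init__ rather than inside call() matters in TF 2.x: layers instantiated inside call() would be re-created on every call and not tracked as the block's weights. AlphaGo Zero’s blocks also apply batch normalization after each convolution [1]; that is left out here to keep the sketch short.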

Inside the block, we fall back to the functional way of stacking layers. If you want better integration, e.g. model.summary() showing the number of trainable weights, some additional plumbing is needed. The above just shows the gist... (gosh! That pun was bad).

Once that is written, we can use model.add(Residual(32, (3, 3))) just as we would any other layer. Nice!

To close with an example, I modified the Keras CNN example on CIFAR10 and replaced the hidden convolutional layers with residual ones. I haven’t optimized performance, but you can see how it works. If you are familiar with the example, you might appreciate how similar it looks.
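A rough sketch of that kind of model, assuming TensorFlow 2.x. The layer sizes loosely follow the CIFAR-10 CNN example that shipped with Keras; which convolutions get swapped for residual blocks is an assumption, not the post’s exact script. Because each residual block needs matching input and output channels, a plain Conv2D changes the channel count before each block:

```python
import numpy as np
from tensorflow import keras


class Residual(keras.layers.Layer):
    """Compact residual block: conv -> ReLU -> conv, plus input, then ReLU."""

    def __init__(self, filters, kernel_size, **kwargs):
        super().__init__(**kwargs)
        self.conv1 = keras.layers.Conv2D(filters, kernel_size, padding="same")
        self.conv2 = keras.layers.Conv2D(filters, kernel_size, padding="same")

    def call(self, inputs):
        x = keras.activations.relu(self.conv1(inputs))
        x = self.conv2(x)
        return keras.activations.relu(x + inputs)  # identity shortcut


num_classes = 10  # CIFAR-10

model = keras.Sequential()
model.add(keras.Input(shape=(32, 32, 3)))
# A plain convolution lifts the 3 RGB channels to 32 so the shortcut matches.
model.add(keras.layers.Conv2D(32, (3, 3), padding="same", activation="relu"))
model.add(Residual(32, (3, 3)))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Dropout(0.25))

# Same pattern for the 64-channel stage.
model.add(keras.layers.Conv2D(64, (3, 3), padding="same", activation="relu"))
model.add(Residual(64, (3, 3)))
model.add(keras.layers.MaxPooling2D(pool_size=(2, 2)))
model.add(keras.layers.Dropout(0.25))

model.add(keras.layers.Flatten())
model.add(keras.layers.Dense(512, activation="relu"))
model.add(keras.layers.Dropout(0.5))
model.add(keras.layers.Dense(num_classes, activation="softmax"))

model.compile(
    loss="categorical_crossentropy",
    optimizer=keras.optimizers.RMSprop(learning_rate=1e-4),
    metrics=["accuracy"],
)
model.summary()

# Training would then proceed as in the original example, e.g.:
# (x_train, y_train), _ = keras.datasets.cifar10.load_data()
# model.fit(x_train / 255.0,
#           keras.utils.to_categorical(y_train, num_classes), epochs=...)
```

The data loading is left commented out so the sketch runs without downloading CIFAR-10.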

References

[1] Silver, David, et al. “Mastering the game of Go without human knowledge.” Nature 550.7676 (2017): 354.


Parsing TFRecords with the TensorFlow Dataset API

Update: The Dataset version is now part of the example in the TensorFlow library.

The Dataset API has become the new standard for feeding data into TensorFlow. Moreover, there seem to be plans to deprecate queues and other input methods, unifying the way data is fed into models. The idea now is to (1) create a Dataset object (in this case a TFRecordDataset) and then (2) create an Iterator that extracts elements and feeds them into the model.
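The two steps can be sketched in TF 2.x, where eager execution lets Python’s built-in iter() play the role of the Iterator. To keep it self-contained, this uses Dataset.from_tensor_slices on toy values instead of a TFRecordDataset; the file name in the comment is hypothetical:

```python
import tensorflow as tf

# (1) Create a Dataset object. With TFRecords this would be, e.g.,
#     tf.data.TFRecordDataset(["train.tfrecords"])  # hypothetical file name
# Toy in-memory values keep this sketch self-contained.
dataset = tf.data.Dataset.from_tensor_slices(([1.0, 2.0, 3.0], [0, 1, 0]))

# (2) Create an Iterator that extracts elements one at a time. In TF 2.x the
# built-in iter() does this; TF 1.x used dataset.make_one_shot_iterator()
# and iterator.get_next() inside a tf.Session.
iterator = iter(dataset)
feature, label = next(iterator)
print(feature.numpy(), label.numpy())
```

In practice you rarely call next() yourself: a plain for loop or model.fit(dataset) drives the iterator.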

I’ve modified TensorFlow’s example on “how to read data” to reflect that change. I’ve submitted a PR to the TensorFlow repository; until it gets merged, take a look at the new code below. It is a lot easier to read; see for yourself:
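As a hedged stand-in for that code (not the PR itself), here is a minimal TF 2.x round trip: write a few records, then parse them back with a TFRecordDataset. The feature names image_raw and label are assumed, loosely modelled on the MNIST records the old example read; the toy data is made up for illustration:

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Assumed record layout: raw image bytes plus an integer label.
FEATURES = {
    "image_raw": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}


def write_records(path, images, labels):
    """Serialize (image, label) pairs as tf.train.Example protos."""
    with tf.io.TFRecordWriter(path) as writer:
        for image, label in zip(images, labels):
            feature = {
                "image_raw": tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[image.tobytes()])),
                "label": tf.train.Feature(
                    int64_list=tf.train.Int64List(value=[int(label)])),
            }
            example = tf.train.Example(
                features=tf.train.Features(feature=feature))
            writer.write(example.SerializeToString())


def decode(serialized):
    """Parse one serialized Example into a float image and an int label."""
    parsed = tf.io.parse_single_example(serialized, FEATURES)
    image = tf.io.decode_raw(parsed["image_raw"], tf.uint8)
    image = tf.cast(image, tf.float32) / 255.0  # scale pixels to [0, 1]
    label = tf.cast(parsed["label"], tf.int32)
    return image, label


def make_dataset(path, batch_size):
    """(1) Build the input pipeline: read, parse, shuffle, batch."""
    dataset = tf.data.TFRecordDataset(path)
    return dataset.map(decode).shuffle(buffer_size=100).batch(batch_size)


# Toy data: six 4-pixel "images" with labels 0..5.
path = os.path.join(tempfile.mkdtemp(), "toy.tfrecords")
images = np.arange(24, dtype=np.uint8).reshape(6, 4)
labels = np.arange(6)
write_records(path, images, labels)

# (2) Iterate; in TF 2.x the for loop creates the iterator.
for batch_images, batch_labels in make_dataset(path, batch_size=4):
    print(batch_images.shape, batch_labels.numpy())
```

Applying decode through map() keeps parsing inside the tf.data pipeline instead of in Python-side loops.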

Further Reading: