I’ve been looking at the AlphaGo Zero network architecture¹ and was searching for existing implementations. I found quite a few (here, here, and here) with varying degrees of completeness. The cleanest is probably this one, but it depends on Jupyter.
What surprised me was that I couldn’t find one that used Keras’ Sequential API. While residual blocks aren’t exactly sequential, from a high-level view the architecture itself is: it simply stacks (a lot of) residual blocks. So it should be possible to create something like this, right?
The answer is, of course: yes; there isn’t much that you can’t do in Python. In fact, Keras already uses this strategy itself: Sequential inherits from Layer, and Container (a class sitting between Sequential and Layer in the inheritance hierarchy) states as much in its docstring: “A Container is a directed acyclic graph of layers. It is the topological form of a ‘model’. A Model is simply a Container with added training routines.” (source)
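You can verify that hierarchy directly; this quick check assumes standalone Keras 2.x, where Container still exists:

```python
from keras.models import Sequential

# In Keras 2.x the chain is Sequential -> Model -> Container -> Layer.
print([cls.__name__ for cls in Sequential.__mro__])
# ['Sequential', 'Model', 'Container', 'Layer', 'object']
```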
It works by defining the residual block as a new Keras layer. Depending on how tightly integrated you want it to be, this can be quite short:
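Something along these lines (a minimal sketch in modern tf.keras, standing in for the gist embedded in the original post; the `Residual` name and its `(filters, kernel_size)` signature simply mirror the usage below):

```python
import tensorflow as tf
from tensorflow.keras import layers

class Residual(layers.Layer):
    """A residual block as a Keras layer (a sketch, not the original gist).

    Two same-padded convolutions plus a skip connection. If the input's
    channel count differs from `filters`, a 1x1 convolution projects the
    input so the addition is well defined.
    """

    def __init__(self, filters, kernel_size, **kwargs):
        super().__init__(**kwargs)
        self.filters = filters
        self.conv1 = layers.Conv2D(filters, kernel_size, padding="same",
                                   activation="relu")
        self.conv2 = layers.Conv2D(filters, kernel_size, padding="same")
        self.proj = None  # created in build() only if needed

    def build(self, input_shape):
        if input_shape[-1] != self.filters:
            self.proj = layers.Conv2D(self.filters, 1, padding="same")
        super().build(input_shape)

    def call(self, inputs):
        # Inside the block, layers are stacked the functional way.
        x = self.conv1(inputs)
        x = self.conv2(x)
        shortcut = self.proj(inputs) if self.proj is not None else inputs
        return tf.nn.relu(x + shortcut)  # the skip connection
```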
Inside the block we fall back to the functional way of stacking layers. If you want better integration, e.g. model.summary() showing the number of trainable weights, there is some additional plumbing. The above just shows the gist . . . (gosh! That pun was bad).
Once that is written, we can use model.add(Residual(32, (3, 3))) as we would any other layer. Nice!
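As a quick sanity check (a toy model, just to show the call):

```python
from tensorflow.keras import layers, models

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                        input_shape=(32, 32, 3)))
model.add(Residual(32, (3, 3)))  # drops in like any built-in layer
model.add(Residual(32, (3, 3)))
model.summary()  # the blocks' weights are counted like any other layer's
```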
To close with an example, I modified the Keras CNN example on CIFAR-10 and replaced the hidden convolutional layers with residual ones. I haven’t optimized for performance, but you can see how it works. If you are familiar with the example, you might appreciate how similar it looks.
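The full script was linked from the post; as a stand-in, here is a plausible reconstruction (not the author’s exact modification) that keeps the overall shape of the Keras CIFAR-10 CNN example but swaps the hidden convolutions for the Residual blocks defined above:

```python
from tensorflow.keras import datasets, layers, models, utils

(x_train, y_train), (x_test, y_test) = datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
y_train = utils.to_categorical(y_train, 10)
y_test = utils.to_categorical(y_test, 10)

model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), padding="same", activation="relu",
                        input_shape=x_train.shape[1:]))
model.add(Residual(32, (3, 3)))      # was: a second 32-filter Conv2D
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.25))
model.add(Residual(64, (3, 3)))      # was: two 64-filter Conv2Ds
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Dropout(0.25))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation="relu"))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(10, activation="softmax"))

model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=64, epochs=5,
          validation_data=(x_test, y_test))
```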
¹ Silver, David, et al. “Mastering the game of Go without human knowledge.” Nature 550.7676 (2017): 354–359.