Easily write custom Tensorflow/Keras layers

At some point when working on deep learning models with Tensorflow/Keras for Python, you will inevitably encounter a need to use a layer type in your models that doesn't exist in the core Tensorflow/Keras for Python (from here on just simply Tensorflow) library.

I have encountered this need several times, and rather than e.g. subclassing tf.keras.Model, there's a much easier way - and if you just have a simple sequential model, you can even keep using tf.keras.Sequential with custom layers!

Background

First, some brief background on how Tensorflow is put together. The most important thing to remember is Tensorflow likes very much to compile things into native code using what we can think of as an execution graph.

In this case, by execution graph I mean a directed graph that defines the flow of information through a model or some other data processing pipeline. This is best explained with a diagram:

A simple stack of Keras layers illustrated as a directed graph

Here, we define a simple Keras AI model for classifying images which you might define with the functional API. I haven't tested this model - it's just to illustrate an example (use something e.g. like MobileNet if you want a relatively small model for image classification).

The layer stack starts at the top and works it's way downwards.

When you call model.compile(), Tensorflow complies this graph into native code for faster execution. This is important, because when you define a custom layer, you may only use Tensorflow functions to operate on the data, not Python/Numpy/etc ones.

You may have already encountered this limitation if you have defined a Tensorflow function with tf.function(some_function).

The reason for this is the specifics of how Tensorflow compiles your model. Now consider this graph:

A graph of Tensorflow functions

Basic arithmetic operations on tensors as well as more complex operators such as tf.stack, tf.linalg.matmul, etc operate on tensors as you'd expect in a REPL, but in the context of a custom layer or tf.function they operate on not a real tensor, but symbolic ones instead.

It is for this reason that when you implement a tf.function to use with tf.data.Dataset.map() for example, it only gets executed once.

Custom layers for the win!

With this in mind, we can relatively easily put together a custom layer. It's perhaps easiest to show a trivial example and then explain it bit by bit.

I recommend declaring your custom layers each in their own file.

import tensorflow as tf

class LayerMultiplier(tf.keras.layers.Layer):
    def __init__(self, multiplier=2, **kwargs):
        super(LayerMultiplier, self).__init__(**kwargs)

        self.param_multiplier = multiplier
        self.tensor_multiplier = tf.constant(multiplier, dtype=tf.float32)

    def get_config(self):
        config = super(LayerMultiplier, self).get_config()

        config.update({
            "multiplier": self.param_multiplier
        })

        return config

    def call(self, input_thing, training, **kwargs):
        return input_thing * self.tensor_multiplier

Custom layers are subclassed from tf.keras.layers.Layer. There are a few parts to a custom layer:

The constructor (__init__) works as you'd expect. You can take in custom (hyper)parameters (which should not be tensors) here and use then to control the operation of your custom layer.

get_config() must ultimately return a dictionary of arguments to pass to instantiate a new instance of your layer. This information is saved along with the model when you save a model e.g. with tf.keras.callbacks.ModelCheckpoint in .hdf5 mode, and then used when you load a model with tf.keras.models.load_model (more on loading a model with custom layers later).

A paradigm I usually adopt here is setting self.param_ARG_NAME_HERE fields in the constructor to the value of the parameters I've taken in, and then spitting them back out again in get_config().

call() is where the magic happens. This is called when you call model.compile() with a symbolic tensor which stands in for the shape of the real tensor to build an execution graph as explained above.

The first argument is always the output of the previous layer. If your layer expects multiple inputs, then this will be an array of (potentially symbolic) tensors rather then a (potentially symbolic) tensor directly.

The second argument is whether you are in training mode or not. You might not be in training mode if:

You are spinning over the validation dataset
You are making a prediction / doing inference
Your layer is frozen for some reason

Sometimes you may want to do something differently if you are in training mode vs not training mode (e.g. dataset augmentation), and Tensorflow is smart enough to ensure this is handled as you'd expect.

Note also here that I use a native multiplication with the asterisk * operator. This works because Tensorflow tensors (whether symbolic or otherwise) overload this and other operators so you don't need to call tf.math.multiply, tf.math.divide, etc explicitly yourself, which makes your code neater.

That's it, that's all you need to do to define a custom layer!

Using and saving

You can use a custom layer just like a normal one. For example, using tf.keras.Sequential:

import tensorflow as tf

from .components.LayerMultiply import LayerMultiply

def make_model(batch_size, multiplier):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(96),
        tf.keras.layers.Dense(32),
        LayerMultiply(multiplier=5)
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.build(input_shape=tf.TensorShape([ batch_size, 32 ]))
    model.compile(
        optimizer="Adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=[
            tf.keras.losses.SparseCategoricalAccuracy()
        ]
    )
    return model

The same goes here for the functional API. I like to put my custom layers in a components directory, but you can put them wherever you like. Again here, I haven't tested the model at all, it's just for illustrative purposes.

Saving works as normal, but for loading a saved model that uses a custom layer, you need to provide a dictionary of custom objects:

loaded_model = tf.keras.models.load_model(filepath_checkpoint, custom_objects={
    "LayerMultiply": LayerMultiply,
})

If you have multiple custom layers, define all the ones you use here. It doesn't matter if you define extra it seems, it'll just ignore the ones that aren't used.

Going further

This is far from all you can do. In custom layers, you can also:

Instantiate sublayers or models (tf.keras.Model inherits from tf.keras.layers.Layer)
Define custom trainable weights (tf.Variable)

Instantiating sublayers is very easy. Here's another example layer:

import tensorflow as tf

class LayerSimpleBlock(tf.keras.layers.Layer):
    def __init__(self, units, **kwargs):
        super(LayerSimpleBlock, self).__init__(**kwargs)

        self.param_units = units

        self.block = tf.keras.Sequential([
            tf.keras.layers.Dense(self.param_units),
            tf.keras.layers.Activation("gelu")
            tf.keras.layers.Dense(self.param_units),
            tf.keras.layers.LayerNormalization()
        ])

    def get_config(self):
        config = super(LayerSimpleBlock, self).get_config()

        config.update({
            "units": self.param_units
        })

        return config

    def call(self, input_thing, training, **kwargs):
        return self.block(input_thing, training=training)

This would work with a single sublayer too.

Custom trainable weights are also easy, but require a bit of extra background. If you're reading this post, you have probably heard of gradient descent. The specifics of how it works are out of scope of this blog post, but in short it's the underlying core algorithm deep learning models use to reduce error by stepping bit by bit towards lower error.

Tensorflow goes looking for all the weights in a model during the compilation process (see the explanation on execution graphs above) for you, and this includes custom weights.

You do, however, need to mark a tensor as a weight - otherwise Tensorflow will assume it's a static value. This is done through the use of tf.Variable:

tf.Variable(name="some_unique_name", initial_value=tf.random.uniform([64, 32]))

As far as I've seen so far, tf.Variable()s need to be defined in the constructor of a tf.keras.layers.Layer, for example:

import tensorflow as tf

class LayerSimpleBlock(tf.keras.layers.Layer):
    def __init__(self, **kwargs):
        super(LayerSimpleBlock, self).__init__(**kwargs)

        self.weight = tf.Variable(name="some_unique_name", initial_value=tf.random.uniform([64, 32]))

    def get_config(self):
        config = super(LayerSimpleBlock, self).get_config()
        return config

    def call(self, input_thing, training, **kwargs):
        return input_thing * weight

After you define a variable in the constructor, you can use it like a normal tensor - after all, in Tensorflow (and probably other deep learning frameworks too), tensors don't always have to hold an actual value at the time of execution as I explained above (I call tensors that don't contain an actual value like this symbolic tensors, since they are like stand-ins for the actual value that gets passed after the execution graph is compiled).

Conclusion

We've looked at defining custom Tensorflow/Keras layers that you can use without giving tf.keras.Sequential() or the functional API. I've shown how by compiling Python function calls into native code using an execution graph, many orders of magnitude of performance gains can be obtained, fully saturating GPU usage.

We've also touched on defining custom weights in custom layers, which can be useful depending on what you're implementing. As a side note, should you need a weight in a custom loss function, you'll need to define it in the constructor of a tf.keras.layers.Layer and then pull it out and pass it to your subclass of tf.keras.losses.Loss.

By defining custom Tensorflow/Keras layers, we can implement new cutting-edge deep learning logic that are easy to use. For example, I have implemented a Transformer with a trio of custom layers, and CBAM: Convolutional Block Attention Module also looks very cool - I might implement it soon too.

I haven't posted a huge amount about AI / deep learning on here yet, but if there's any topic (machine learning or otherwise) that you'd like me to cover, I'm happy to consider it - just leave a comment below.

Type this	To get this	Notes
`bold text`	bold text	-
`_italics text_`	italics text	-
`~~deleted text~~`	~~deleted~~	-
`code text`	`code text`	Inserts some monospaced code. It is preferred that large blocks of code are linked to using a service such as Pastebin, Github Gists or Ideone.
`> Quote`	Quote	-
`[display text](//google.com)`	display text	Inserts a hyperlink. Please use responsibly. `[rel=nofollow]` is in use and spam will be deleted.
`---`		Inserts a horizontal line. The previous line must be blank.

Stardust
Blog