The Feasibility of Building Complex Models with Keras

Keras starts with ease and stops at simplicity.

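What does that mean? Many people pick up Keras because its modeling workflow is so friendly, and many of them later fall out of love with it because its heavy encapsulation costs flexibility.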

This post introduces Keras's Lambda layer, in the hope that once you have picked up this trick you will regain at least some confidence in using Keras.

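To the point: Lambda, as the name suggests, is similar in meaning to Python's lambda. Here it refers to a layer that performs some function of your choosing.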

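The Keras source documents Lambda as follows: "Wraps arbitrary expression as a `Layer` object". I think this sentence states its meaning accurately and concisely. The expression here can be any function you write with the backend (TensorFlow, Theano); you could also call it a functional module.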

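First, I want to explain the word "flexibility". How should we understand flexibility in deep learning modeling? My understanding is that flexibility means being able to define your own network layers, and Lambda exists to make exactly that possible.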

The usual Keras modeling workflow looks like this:

from keras.layers import Input, Dense
from keras.models import Model
# This returns a tensor
inputs = Input(shape=(784,))
# a layer instance is callable on a tensor, and returns a tensor
x = Dense(64, activation='relu')(inputs)
x = Dense(64, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)
# This creates a model that includes
# the Input layer and three Dense layers
model = Model(inputs=inputs, outputs=predictions)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
# data: array of shape (N, 784); labels: one-hot array of shape (N, 10)
model.fit(data, labels)  # starts training

In this workflow we only use predefined layers such as Dense and Conv2D. If you want to write your own layer, you can use Lambda. Here are its initialization parameters:

Lambda(function, output_shape=None, mask=None, arguments=None, **kwargs)

Let's look at two examples. 1. Add a x -> x^2 layer:

model.add(Lambda(lambda x: x ** 2))

2. Add a layer that returns the concatenation of the positive part of the input and the opposite of the negative part:

from keras import backend as K

def antirectifier(x):
    x -= K.mean(x, axis=1, keepdims=True)
    x = K.l2_normalize(x, axis=1)
    pos = K.relu(x)
    neg = K.relu(-x)
    return K.concatenate([pos, neg], axis=1)

def antirectifier_output_shape(input_shape):
    shape = list(input_shape)
    assert len(shape) == 2  # only valid for 2D tensors
    shape[-1] *= 2
    return tuple(shape)

model.add(Lambda(antirectifier, output_shape=antirectifier_output_shape))

Note that the function argument of Lambda can be either an anonymous function or a named function. Lambda also takes an output_shape argument, which can be a tuple given directly or a function that takes input_shape as its input. I have also used the arguments parameter: it is a dict whose keys are the names of the function's parameters, which lets you enrich what your layer can do. Solving a real problem is the best way to get a feel for it.
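To make the arguments parameter concrete, here is a minimal sketch. It assumes TensorFlow 2.x with its bundled tf.keras (rather than the standalone keras package used elsewhere in this post); the function scale_and_shift and its parameters scale and shift are hypothetical names made up for this illustration.

```python
import numpy as np
from tensorflow.keras.layers import Input, Lambda
from tensorflow.keras.models import Model

# Hypothetical layer function: the extra parameters are supplied by `arguments`.
def scale_and_shift(x, scale, shift):
    return x * scale + shift

inputs = Input(shape=(3,))
# The keys of `arguments` must match the parameter names of the function.
outputs = Lambda(scale_and_shift, arguments={'scale': 2.0, 'shift': 1.0})(inputs)
model = Model(inputs=inputs, outputs=outputs)

result = model.predict(np.array([[0.0, 1.0, 2.0]], dtype='float32'), verbose=0)
print(result)  # each input is doubled and shifted by one: [[1. 3. 5.]]
```

The same function can be reused with different settings simply by passing a different arguments dict, which is often enough to avoid writing a full custom Layer subclass.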

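At this point we know the basics of using Lambda. But knowing how to call it is far from enough; you have only really mastered it once you can use it well. So let me share the scenarios where I use Lambda and what I have learned.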

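The first scenario: compared with TensorFlow, Keras does not have good support for some newly released layer functions (meaning layers like Conv2D). But we can build the layer with TensorFlow and then wrap it with Lambda into a Keras layer. Below is the space_to_depth code used by YOLOv2: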

def space_to_depth_x2(x):
    """Thin wrapper for Tensorflow space_to_depth with block_size=2."""
    # Import currently required to make Lambda work.
    # See: https://github.com/fchollet/keras/issues/5088#issuecomment-273851273
    import tensorflow as tf
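    # tf.nn.space_to_depth works in both TF 1.x and 2.x;
    # the bare tf.space_to_depth alias exists only in TF 1.x.
    return tf.nn.space_to_depth(x, block_size=2)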

def space_to_depth_x2_output_shape(input_shape):
    """Determine space_to_depth output shape for block_size=2."""
    # The outer parentheses let the conditional expression span two lines.
    return ((input_shape[0], input_shape[1] // 2, input_shape[2] // 2, 4 * input_shape[3])
            if input_shape[1] else (input_shape[0], None, None, 4 * input_shape[3]))

Lambda(space_to_depth_x2,
       output_shape=space_to_depth_x2_output_shape,
       name='space_to_depth')(conv21)
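To see what this layer does to the data and why the output shape function returns those values, here is a small NumPy sketch of space_to_depth for block_size=2 in NHWC layout. It is only an illustration under that layout assumption, not TensorFlow's implementation; space_to_depth_x2_numpy is a hypothetical helper name, and the shape function is repeated so that the block is self-contained.

```python
import numpy as np

def space_to_depth_x2_numpy(x):
    """NumPy illustration of space_to_depth with block_size=2 (NHWC layout)."""
    n, h, w, c = x.shape
    # Split height and width into (blocks, offset-within-block) pairs.
    x = x.reshape(n, h // 2, 2, w // 2, 2, c)
    # Move the two within-block offsets next to the channel axis.
    x = x.transpose(0, 1, 3, 2, 4, 5)
    # Fold each 2x2 spatial block into the channel dimension.
    return x.reshape(n, h // 2, w // 2, 4 * c)

def space_to_depth_x2_output_shape(input_shape):
    """Output shape for block_size=2; unknown spatial dims stay None."""
    return ((input_shape[0], input_shape[1] // 2, input_shape[2] // 2, 4 * input_shape[3])
            if input_shape[1] else (input_shape[0], None, None, 4 * input_shape[3]))

x = np.arange(16).reshape(1, 4, 4, 1)
y = space_to_depth_x2_numpy(x)
print(y.shape)     # (1, 2, 2, 4)
print(y[0, 0, 0])  # [0 1 4 5], the top-left 2x2 block of the input
```

Height and width are halved while the channel count is multiplied by four, which is exactly what the output shape function reports.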

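The other scenario I use a lot is building my own custom loss layer with Lambda. Look back at the usual Keras workflow above: once the model is built, model.compile() specifies the loss function. What if you want to write your own loss function? Here is one approach: after the model's last output layer, add one more layer with Lambda, and compute whatever loss you like inside it. Note that once this loss layer is added, the model's output is no longer the prediction y_pred but the loss value itself, so during training the optimizer simply has to minimize that output. The compile call therefore changes to model.compile(optimizer='adam', loss={'custom_loss': lambda y_true, y_pred: y_pred}), where the key 'custom_loss' must match the name given to the loss layer. The loss here just says "use the model's output (y_pred here) as my loss", which matches the idea of adding the loss layer when building the model. This construction is used in object detection algorithms such as YOLOv2, whose loss function is fairly complex because it covers regression as well as classification, so it is worth trying there.

One more point: the form loss={'custom_loss': lambda y_true, y_pred: y_pred} touches on another concept, the custom loss function, which is also a good way to define your own loss. For details see https://github.com/keras-team/keras/issues/4126. The concrete code is as follows: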

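import numpy as np

# The layer name must match the key used in the loss dict below.
model_loss = Lambda(custom_loss, output_shape=(), name='custom_loss')(model_body.output)
model = Model(model_body.input, model_loss)
model.compile(optimizer='adam', loss={'custom_loss': lambda y_true, y_pred: y_pred})
model.fit(x, y=np.zeros(len(x)))  # y is not actually used; it is only a placeholder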
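For readers who want something runnable end to end, here is a minimal sketch of the loss-as-output pattern. It assumes TensorFlow 2.x with tf.keras, uses a plain squared error as a stand-in for a real loss such as YOLOv2's, and feeds the targets in as a second Input so the loss layer can see them. All names (squared_error_loss, identity_loss, features, targets) are hypothetical, and for simplicity the identity loss is passed directly instead of through a dict keyed by the layer name.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, Lambda
from tensorflow.keras.models import Model

def squared_error_loss(tensors):
    """Stand-in loss layer: per-sample squared error, shape (batch, 1)."""
    y_pred, y_true = tensors
    return tf.reduce_mean(tf.square(y_pred - y_true), axis=-1, keepdims=True)

def identity_loss(y_true, y_pred):
    """The model output already is the loss, so just pass it through."""
    return y_pred

features = Input(shape=(4,), name='features')
targets = Input(shape=(1,), name='targets')   # real targets enter as an input
prediction = Dense(1, name='prediction')(features)
model_body = Model(features, prediction)      # reusable for inference later

loss_out = Lambda(squared_error_loss, name='custom_loss')([prediction, targets])
model = Model([features, targets], loss_out)
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01), loss=identity_loss)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4)).astype('float32')
y = x.sum(axis=1, keepdims=True).astype('float32')
dummy = np.zeros((len(x), 1), dtype='float32')  # placeholder, ignored by identity_loss

history = model.fit([x, y], dummy, epochs=10, batch_size=16, verbose=0)
losses = history.history['loss']
per_sample_loss = model.predict([x, y], verbose=0)
```

After training, model_body shares its weights with the training model, so it can be used on its own for prediction without the targets input.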

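To sum up, Keras is still a fairly flexible framework, provided that you also learn TensorFlow, because when you customize your network you are in effect using TensorFlow. Building your own deep learning model with Keras and TensorFlow together is a good choice. When you combine the two libraries, you are not only using the APIs they provide but also combining two programming styles: the modular style of Keras and the dataflow graph of TensorFlow.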

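The above is my personal understanding, and some parts may be unclear since my knowledge is limited. My understanding will deepen with experience, so please bear with me. Let us encourage each other.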

References:
allanzelener/YAD2K: https://github.com/allanzelener/YAD2K
keras-team/keras: https://github.com/keras-team/keras/blob/master/keras/layers/core.py

stoner