Google Inception Net: The Grouped Convolution Unit (Inception Module)

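This note records the grouped convolution operation of Inception Net, along with the role and implementation of the Inception Module.

Google Inception Net V1 first appeared in 2014. The network is 22 layers deep, deeper than the 19 layers of VGGNet, which appeared the same year. Inception Net has the following characteristics:

Multiple kernel sizes are used within the same layer
Kernels are small, usually only 1x1, 3x3, and 5x5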

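Inception Module structure

Its core is grouped convolution: kernels of several sizes are applied in the same layer, and each kernel size extracts features at a different scale from that layer's input. Features from different groups are not computed across one another, which reduces the amount of computation.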
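One rough way to see the saving is to count weights. The sketch below compares three parallel branches (1x1, 3x3, 5x5) against a single 5x5 layer that produces the same total number of output channels. The single 5x5 baseline and the channel counts (64 in, 32 per branch) are my own assumptions for illustration, not numbers from the Inception paper.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights (biases ignored) in a square 2-D convolution."""
    return kernel_size * kernel_size * in_channels * out_channels


in_channels = 64       # hypothetical depth of the input feature map
branch_channels = 32   # hypothetical output depth of each branch

# Three parallel branches, one per kernel size; each sees the same input.
multi_branch = sum(conv_weights(k, in_channels, branch_channels)
                   for k in (1, 3, 5))

# Baseline: one 5x5 convolution with the same total output depth.
single_5x5 = conv_weights(5, in_channels, 3 * branch_channels)

print("parallel 1x1/3x3/5x5 branches:", multi_branch)
print("single 5x5 layer:             ", single_5x5)
print("ratio: %.2f" % (multi_branch / single_5x5))
```

Under these assumptions the parallel branches need fewer than half the weights of the single large-kernel layer, because most of the output channels come from the cheaper 1x1 and 3x3 kernels.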

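Small kernels are used heavily, 1x1 in particular, which appears many times. A 1x1 kernel is very cost-effective: it adds another layer of nonlinear transformation for very little computation.
The figure below shows the structure; real networks are more flexible:

The input is sampled with 1x1, 3x3, and 5x5 kernels respectively.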
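To make the "cheap nonlinearity" point about 1x1 kernels concrete, here is a minimal pure-Python sketch. A 1x1 convolution mixes the channels of each pixel independently, and the ReLU after it supplies the nonlinearity. The helper names `conv1x1_relu` and `conv_macs`, the tiny feature map, and the 28x28x64 sizes are all hypothetical and chosen only for illustration.

```python
def conv1x1_relu(feature_map, weights):
    """Apply a 1x1 convolution (no bias) followed by ReLU.

    feature_map: nested list shaped [height][width][in_channels]
    weights:     nested list shaped [in_channels][out_channels]
    """
    out_channels = len(weights[0])
    output = []
    for row in feature_map:
        out_row = []
        for pixel in row:
            # Each output channel is a weighted sum over the input channels
            # of this one pixel; neighbouring pixels are never touched.
            mixed = [sum(pixel[c] * weights[c][k] for c in range(len(pixel)))
                     for k in range(out_channels)]
            out_row.append([max(0.0, v) for v in mixed])  # ReLU
        output.append(out_row)
    return output


def conv_macs(kernel_size, height, width, in_channels, out_channels):
    """Multiply-accumulates of a stride-1, 'same'-padded square convolution."""
    return kernel_size ** 2 * height * width * in_channels * out_channels


feature_map = [[[1.0, 2.0], [0.0, -1.0]]]   # 1 x 2 pixels, 2 channels
weights = [[1.0, -1.0], [0.5, 1.0]]         # 2 input -> 2 output channels
result = conv1x1_relu(feature_map, weights)
print(result)

for k in (1, 3, 5):
    print("%dx%d MACs:" % (k, k), conv_macs(k, 28, 28, 64, 32))
```

For the same input and output depth, the 3x3 and 5x5 kernels cost 9 and 25 times as many multiply-accumulates as the 1x1 kernel.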

Implementation

The following Python code implements the grouped convolution structure shown in the figure above:

import tensorflow as tf  # TensorFlow 1.x API (tf.layers, tf.variable_scope)

def inception_block(x, output_channel_for_each_path, name):
    with tf.variable_scope(name): # avoid name conflict
        conv1_1 = tf.layers.conv2d(x,
                                   output_channel_for_each_path[0],
                                   (1, 1),            # 1x1 kernel
                                   strides = (1,1),
                                   padding = 'same',
                                   activation = tf.nn.relu,
                                   name = 'conv1_1')
        conv3_3 = tf.layers.conv2d(x,
                                   output_channel_for_each_path[1],
                                   (3, 3),            # 3x3 kernel
                                   strides = (1,1),
                                   padding = 'same',
                                   activation = tf.nn.relu,
                                   name = 'conv3_3')
        conv5_5 = tf.layers.conv2d(x,
                                   output_channel_for_each_path[2],
                                   (5, 5),            # 5x5 kernel
                                   strides = (1,1),
                                   padding = 'same',
                                   activation = tf.nn.relu,
                                   name = 'conv5_5')
        max_pooling = tf.layers.max_pooling2d(x,
                                              (2, 2),
                                              (2, 2),
                                              name = 'max_pooling')
    
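    # max_pooling halves the height and width (2x2 window, stride 2), so
    # zero-pad it back to the input size before concatenating.
    max_pooling_shape = max_pooling.get_shape().as_list()[1:]  # [height, width, channels] of the pooled output
    input_shape = x.get_shape().as_list()[1:]                  # [height, width, channels] of the input

    # NHWC layout: index 0 is height, index 1 is width.
    height_padding = input_shape[0] - max_pooling_shape[0]  # total rows to add
    width_padding = input_shape[1] - max_pooling_shape[1]   # total columns to add

    # Split the padding between both sides; for an odd total the extra
    # row/column goes on the trailing side so the sizes match exactly.
    padded_pooling = tf.pad(max_pooling,
                            [[0, 0],
                             [height_padding // 2, height_padding - height_padding // 2],
                             [width_padding // 2, width_padding - width_padding // 2],
                             [0, 0]])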
    
    # concatenate all branches along the channel axis
    concat_layer = tf.concat(
                            [conv1_1, conv3_3, conv5_5, padded_pooling],
                            axis = 3)
    return concat_layer
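
The shape bookkeeping in this block can be checked without TensorFlow. The sketch below is my own helper code, not part of the original implementation. It assumes the pooling uses 'valid' padding (the default of `tf.layers.max_pooling2d`) and that, when the size difference is odd, the extra zero goes on the trailing side. The names `pooled_size`, `pad_amounts`, and `block_output_shape` are hypothetical.

```python
def pooled_size(size, pool=2, stride=2):
    """Output length of 'valid' pooling along one spatial dimension."""
    return (size - pool) // stride + 1


def pad_amounts(size, pool=2, stride=2):
    """Zeros to add (before, after) so the pooled length equals `size` again."""
    total = size - pooled_size(size, pool, stride)
    before = total // 2
    return before, total - before


def block_output_shape(height, width, in_channels, path_channels):
    """Shape (height, width, channels) of the concatenated block output.

    The padded pooling branch keeps the input depth, so the output depth is
    the sum of the three convolution branches plus `in_channels`.
    """
    return height, width, sum(path_channels) + in_channels


for size in (32, 33):
    before, after = pad_amounts(size)
    restored = before + pooled_size(size) + after
    print(size, pooled_size(size), (before, after), restored)

print(block_output_shape(32, 32, 3, [16, 16, 16]))
```

Padding the same amount on both sides only restores the input size when the difference is even; with an odd input such as 33 the two sides must differ by one, otherwise the concatenation fails on mismatched shapes.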

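The snippet above implements an Inception V1-style block; real networks are more flexible. Inception V2 replaces the 5x5 kernel with two 3x3 kernels. V3 factorizes larger two-dimensional convolutions into one-dimensional ones, for example splitting a 7x7 convolution into a 1x7 and a 7x1 convolution. By V4, the model was combined with the residual learning blocks of ResNet (the Inception-ResNet variants).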
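
The V2 and V3 factorizations can be sanity-checked with a little arithmetic. The sketch below confirms that the stacked small kernels cover the same receptive field as the large kernel they replace, and counts weights per input/output channel pair. It assumes stride 1 and that the intermediate layer has the same depth as the input and output, which is a simplification of mine. The helper names are hypothetical.

```python
def receptive_field(kernel_shapes):
    """Receptive field (height, width) of stride-1 convolutions applied in sequence."""
    field_h, field_w = 1, 1
    for kernel_h, kernel_w in kernel_shapes:
        field_h += kernel_h - 1
        field_w += kernel_w - 1
    return field_h, field_w


def weights_per_channel_pair(kernel_shapes):
    """Weights per (input channel, output channel) pair, summed over the layers."""
    return sum(kernel_h * kernel_w for kernel_h, kernel_w in kernel_shapes)


# V2: two stacked 3x3 kernels in place of one 5x5 kernel.
print(receptive_field([(3, 3), (3, 3)]),
      weights_per_channel_pair([(3, 3), (3, 3)]),
      "vs", weights_per_channel_pair([(5, 5)]))

# V3: a 1x7 kernel followed by a 7x1 kernel in place of one 7x7 kernel.
print(receptive_field([(1, 7), (7, 1)]),
      weights_per_channel_pair([(1, 7), (7, 1)]),
      "vs", weights_per_channel_pair([(7, 7)]))
```

Two 3x3 kernels see a 5x5 window with 18 weights instead of 25, and the 1x7 plus 7x1 pair sees a 7x7 window with 14 weights instead of 49. Each extra layer also brings its own activation, so the factorized form adds nonlinearity as well.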

This note has covered the role and implementation of grouped convolution; for a complete Inception Net implementation, see here.